RECOVER Long COVID

Expanding knowledge of Long COVID using N3C data

NIH RECOVER

The electronic health record (EHR) data in the N3C Enclave has been a valuable source of information for understanding COVID-19. This same data is now also playing a role in the NIH initiative Researching COVID to Enhance Recovery (RECOVER), which addresses the urgent need to understand post-acute sequelae of SARS-CoV-2 infection (PASC), otherwise known as Long COVID.

Understanding Long COVID using N3C data

This page displays data from the manuscript “Who has long-COVID? A big data approach.” The study applies machine learning techniques to N3C data to identify features of the EHR data that are predictive of Long COVID. Figures from the manuscript have been adapted to a dashboard format so that viewers can follow the findings based on the latest data.

This is a living page, so please return for the latest Long COVID information.

Limitations of this data and further information about how the models were developed can be found in the N3C Long COVID manuscript.

Most important model features for predicting visit to a Long-COVID clinic.

[Figure: three panels (Qualifying Non-hospitalized Patients, Qualifying Hospitalized Patients, All Qualifying Patients), each showing Importance and Odds Ratio (95% CI) for every feature.]

Features shown: post-COVID outpatient utilization; difficulty breathing (dx); age (odds ratio: see discussion); dyspnea (dx); male sex; COVID vaccine (med); post-COVID inpatient utilization; oxycodone (med); cough (dx); prednisone (med); arthralgia of pelvis/thigh (dx); deficiency of micronutrients (dx); polyethylene glycol 3350 (med); albuterol (med); dyssomnia (dx); preexisting chronic pulm. dis.; ketorolac (med); flumazenil (med); vitamin d deficiency (dx); metabolic disease (dx); dexamethasone (med); hyperlipidemia (dx); pain of truncal structure (dx); preexisting diabetes; metoprolol (med); preexisting chronic kidney dis.; pain (dx); naloxone (med); backache (dx); guaifenesin (med); ondansetron (med); melatonin (med); hospitalized for COVID; propofol (med).

Shown are the relative feature importance and univariate odds ratios for the top features (union of the 20 most important features) in each model. Odds ratios exclude age, which has a non-linear relationship with Long COVID. Regardless of importance, some features are significantly more prominent in the Long COVID clinic population, while others are more prominent in the non-Long COVID clinic population. A tilde (~) denotes that the feature was not in the top 20 features for the model in that column. Conditions labelled “chronic” were associated with patients prior to their COVID-19 index.
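As a rough illustration of what a univariate odds ratio with a 95% confidence interval means, the sketch below computes one from a 2x2 table. It uses the pre-COVID diabetes counts for hospitalized patients in the cohort table on this page (86 of 428 Long COVID clinic patients, 2,412 of 15,193 other patients). The function name `odds_ratio_ci` and the choice of a Wald interval are assumptions made for illustration. The manuscript may compute its intervals differently, so the result is not expected to match the dashboard values.

```python
import math


def odds_ratio_ci(exposed_cases, total_cases, exposed_controls, total_controls, z=1.96):
    """Univariate odds ratio with a Wald confidence interval from a 2x2 table.

    Hypothetical helper for illustration; not the method confirmed by the manuscript.
    """
    a = exposed_cases                      # feature present, Long COVID clinic
    b = total_cases - exposed_cases        # feature absent, Long COVID clinic
    c = exposed_controls                   # feature present, not Long COVID clinic
    d = total_controls - exposed_controls  # feature absent, not Long COVID clinic
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell of the 2x2 table must be positive")

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Diabetes among hospitalized patients, counts taken from the cohort table.
diabetes_or, diabetes_lo, diabetes_hi = odds_ratio_ci(86, 428, 2412, 15193)
print(f"OR {diabetes_or:.2f} (95% CI {diabetes_lo:.2f}-{diabetes_hi:.2f})")
```

An interval that lies entirely above 1 suggests the feature is more common among Long COVID clinic patients, while one entirely below 1 suggests the opposite. Rows reported as suppressed counts (such as `<20`) cannot be used this way.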

	Long COVID clinic		Not Long COVID clinic
	Hospitalized	Not Hospitalized	Hospitalized	Not Hospitalized
	n = 428 (%)	n = 169 (%)	n = 15,193 (%)	n = 58,182 (%)
Age in years, mean (SD)	58.29 (15.03)	48.14 (14.02)	56.50 (18.60)	45.92 (17.32)
Sex
Female	237 (55.4%)	127 (75.1%)	8,465 +/-5 (55.7%)	34,771 (59.8%)
Male	191 (44.6%)	42 (24.9%)	6,716 +/-5 (44.2%)	23,258 (40.0%)
Unknown	0 (0.0%)	0 (0.0%)	<20	153 (0.3%)
Race
Asian	<20	<20	571 (3.8%)	1,361 (2.3%)
Black	190 (44.4%)	31 (18.3%)	3,207 (21.1%)	6,370 (10.9%)
Native Haw./Pac. Islander	<20	<20	43 (0.3%)	138 (0.2%)
Other	<20	<20	85 (0.6%)	254 (0.4%)
Unknown	81 (18.9%)	27 (16.0%)	2,695 (17.7%)	9,842 (16.9%)
White	142 (33.2%)	107 (63.3%)	8,592 (56.6%)	40,217 (69.1%)
Ethnicity
Hispanic/Latino	69 +/-5 (16.1%)	26 +/-5 (15.4%)	3,064 (20.2%)	11,416 (19.6%)
Not Hispanic/Latino	354 +/-5 (82.7%)	128 +/-5 (75.7%)	11,869 (78.1%)	45,119 (77.5%)
Unknown	<20	<20	260 (1.7%)	1,647 (2.8%)
Age Group
18-25	<20	<20	790 (5.2%)	7,573 (13.0%)
26-45	86 +/-5 (20.1%)	75 (44.4%)	3,824 (25.2%)	22,732 (39.1%)
46-65	188 +/-5 (43.9%)	69 (40.8%)	5,249 (34.5%)	19,015 (32.7%)
66+	147 +/-5 (34.3%)	<20	5,330 (35.1%)	8,862 (15.2%)
Pre-COVID comorbidities
Diabetes	86 (20.1%)	<20	2,412 (15.9%)	4,842 (8.3%)
Chronic kidney disease	70 (16.4%)	<20	1,721 (11.3%)	2,272 (3.9%)
Congestive heart failure	48 (11.2%)	<20	960 (6.3%)	1,133 (1.9%)
Chronic pulmonary disease	45 (10.5%)	29 (17.2%)	1,415 (9.3%)	3,698 (6.4%)

	Potential Long COVID	Not Potential Long COVID	All Patients
	n = 78,990	n = 767,991	n = 846,981
Sex
Female	51,891 (65.7%)	448,640 (58.4%)	500,531 (59.1%)
Male	26,513 (33.6%)	314,471 (40.9%)	340,984 (40.3%)
Unknown	586 (0.6%)	4,880 (0.6%)	5,466 (0.7%)
Race
Asian	1,906 (2.4%)	20,453 (2.7%)	22,359 (2.6%)
Black	13,575 (17.2%)	121,341 (15.8%)	134,916 (15.9%)
Native Haw./Pac. Islander	194 (0.2%)	2,267 (0.3%)	2,461 (0.3%)
Other	1,240 (1.6%)	12,939 (1.7%)	14,179 (1.7%)
Unknown	10,183 (12.9%)	112,605 (14.7%)	122,788 (14.5%)
White	51,892 (65.7%)	498,386 (64.9%)	550,278 (65.0%)
Ethnicity
Hispanic/Latino	9,733 (12.3%)	115,321 (15%)	125,054 (14.8%)
Not Hispanic/Latino	63,960 (81%)	595,119 (77.5%)	659,079 (77.8%)
Unknown	5,297 (6.7%)	57,551 (7.5%)	62,848 (7.4%)
Age Group
18-25	659 (0.8%)	99,573 (13%)	100,232 (11.8%)
26-45	13,577 (17.2%)	265,948 (34.6%)	279,525 (33.0%)
46-65	37,028 (46.9%)	255,417 (33.3%)	292,445 (34.5%)
66+	27,726 (35.1%)	147,053 (19.1%)	174,779 (20.6%)

The first table above shows characteristics of the three-site cohort used for model training and testing.

The second table above shows the demographic breakdown of potential Long-COVID patients in the N3C cohort.