NIH RECOVER
The electronic health record (EHR) data in the N3C Enclave has been a valuable source of information for understanding COVID-19. The same data now also supports the NIH initiative Researching COVID to Enhance Recovery (RECOVER), which addresses the urgent need to understand post-acute sequelae of SARS-CoV-2 infection (PASC), otherwise known as Long COVID.
Understanding Long COVID using N3C data
This page displays data from the manuscript "Who has long-COVID? A big data approach". The study applies machine learning techniques to N3C data to identify features of the EHR data that are predictive of Long COVID. Figures from the manuscript have been adapted to a dashboard format so that viewers can follow the findings based on the latest data.
This is a living page, so please return for the latest Long COVID information.
Limitations of this data and further information about how the models were developed can be found in the N3C Long COVID manuscript.
Most important model features for predicting a visit to a Long COVID clinic.
Shown are the relative feature importance and univariate odds ratios for the top features (union of the 20 most important features) in each model. Odds ratios exclude age, which has a non-linear relationship with Long COVID. Regardless of importance, some features are significantly more prominent in the Long COVID clinic population, while others are more prominent in the non-Long COVID clinic population. A tilde (~) denotes that the feature was not in the top 20 features for the model in that column. Conditions labelled “chronic” were associated with patients prior to their COVID-19 index.
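As a rough illustration of what a univariate odds ratio expresses, the sketch below computes one from a 2x2 table of counts. It uses the pre-COVID diabetes counts for hospitalized patients from the cohort table on this page. The function name `univariate_odds_ratio` is hypothetical, and this is a textbook calculation rather than the manuscript's own procedure, which may differ (for example, in which patients are pooled).

```python
def univariate_odds_ratio(exposed_cases, total_cases, exposed_controls, total_controls):
    """Odds ratio for one binary feature, from a 2x2 table of counts.

    "Cases" are Long COVID clinic patients and "controls" are patients
    who did not visit a Long COVID clinic (labels chosen for this sketch).
    """
    unexposed_cases = total_cases - exposed_cases
    unexposed_controls = total_controls - exposed_controls
    if unexposed_cases == 0 or exposed_controls == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    # (odds of the feature among cases) / (odds of the feature among controls)
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hospitalized patients with pre-COVID diabetes, counts from the cohort table:
# 86 of 428 Long COVID clinic patients, 2,412 of 15,193 other patients.
diabetes_or = univariate_odds_ratio(86, 428, 2412, 15193)
print(f"Diabetes odds ratio (hospitalized): {diabetes_or:.2f}")
```

A value above 1 means the feature is more common among Long COVID clinic patients; a value below 1 means it is more common in the comparison group.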
Characteristics of the three-site cohort used for model training and testing.
| | Long COVID clinic, hospitalized | Long COVID clinic, not hospitalized | Not Long COVID clinic, hospitalized | Not Long COVID clinic, not hospitalized |
|---|---|---|---|---|
| Age in years, mean (SD) | 58.29 (15.03) | 48.14 (14.02) | 56.50 (18.60) | 45.92 (17.32) |
| Cohort size (rows below show n (%)) | n = 428 | n = 169 | n = 15,193 | n = 58,182 |
| Sex | | | | |
| Female | 237 (55.4%) | 127 (75.1%) | 8,465 +/-5 (55.7%) | 34,771 (59.8%) |
| Male | 191 (44.6%) | 42 (24.9%) | 6,716 +/-5 (44.2%) | 23,258 (40.0%) |
| Unknown | 0 (0.0%) | 0 (0.0%) | <20 | 153 (0.3%) |
| Race | | | | |
| Asian | <20 | <20 | 571 (3.8%) | 1,361 (2.3%) |
| Black | 190 (44.4%) | 31 (18.3%) | 3,207 (21.1%) | 6,370 (10.9%) |
| Native Haw./Pac. Islander | <20 | <20 | 43 (0.3%) | 138 (0.2%) |
| Other | <20 | <20 | 85 (0.6%) | 254 (0.4%) |
| Unknown | 81 (18.9%) | 27 (16.0%) | 2,695 (17.7%) | 9,842 (16.9%) |
| White | 142 (33.2%) | 107 (63.3%) | 8,592 (56.6%) | 40,217 (69.1%) |
| Ethnicity | | | | |
| Hispanic/Latino | 69 +/-5 (16.1%) | 26 +/-5 (15.4%) | 3,064 (20.2%) | 11,416 (19.6%) |
| Not Hispanic/Latino | 354 +/-5 (82.7%) | 128 +/-5 (75.7%) | 11,869 (78.1%) | 45,119 (77.5%) |
| Unknown | <20 | <20 | 260 (1.7%) | 1,647 (2.8%) |
| Age Group | | | | |
| 18-25 | <20 | <20 | 790 (5.2%) | 7,573 (13.0%) |
| 26-45 | 86 +/-5 (20.1%) | 75 (44.4%) | 3,824 (25.2%) | 22,732 (39.1%) |
| 46-65 | 188 +/-5 (43.9%) | 69 (40.8%) | 5,249 (34.5%) | 19,015 (32.7%) |
| 66+ | 147 +/-5 (34.3%) | <20 | 5,330 (35.1%) | 8,862 (15.2%) |
| Pre-COVID comorbidities | | | | |
| Diabetes | 86 (20.1%) | <20 | 2,412 (15.9%) | 4,842 (8.3%) |
| Chronic kidney disease | 70 (16.4%) | <20 | 1,721 (11.3%) | 2,272 (3.9%) |
| Congestive heart failure | 48 (11.2%) | <20 | 960 (6.3%) | 1,133 (1.9%) |
| Chronic pulmonary disease | 45 (10.5%) | 29 (17.2%) | 1,415 (9.3%) | 3,698 (6.4%) |
All patients shown had acute COVID-19. In accordance with the N3C download policy, for demographics where small cell sizes (<20 patients) could be derived from context, we have shifted the counts up or down by a random number between 1 and 5. The accompanying percentages reflect the shifted number. All shifted counts are labelled as such, e.g. "+/-5".
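The masking and shifting described in the note above can be sketched as follows. This is only an illustration of the stated rule, not the N3C implementation: the names `shift_count` and `format_cell` are hypothetical, and the choice of which cells to shift (here an explicit `shift` flag) is an assumption, since the note says only that shifting is applied where small cells could otherwise be derived from context.

```python
import random

SMALL_CELL_LIMIT = 20  # counts below this are displayed as "<20"
MAX_SHIFT = 5          # shifted counts move by 1 to 5 in either direction


def shift_count(count, rng):
    """Move a count up or down by a random whole number between 1 and 5."""
    offset = rng.randint(1, MAX_SHIFT)  # inclusive on both ends
    direction = rng.choice((-1, 1))
    return count + direction * offset


def format_cell(count, column_total, rng, shift=False):
    """Render one table cell as text, masking or shifting where required."""
    if count < SMALL_CELL_LIMIT:
        return f"<{SMALL_CELL_LIMIT}"
    if shift:
        shown = shift_count(count, rng)
        # The percentage is computed from the shifted number, as the note states.
        return f"{shown:,} +/-{MAX_SHIFT} ({100 * shown / column_total:.1f}%)"
    return f"{count:,} ({100 * count / column_total:.1f}%)"


rng = random.Random()  # unseeded, so the shift is not reproducible
print(format_cell(12, 428, rng))               # masked small cell
print(format_cell(237, 428, rng))              # unshifted count
print(format_cell(350, 428, rng, shift=True))  # hypothetical count, shifted
```

Shifting a neighbouring cell matters because a masked "<20" value could otherwise be recovered by subtracting the visible cells from the column total.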
Demographic breakdown of potential Long COVID patients in the N3C cohort.
| | Potential Long COVID | Not Potential Long COVID | All Patients |
|---|---|---|---|
| Cohort size | n = 78,990 | n = 767,991 | n = 846,981 |
| Sex | | | |
| Female | 51,891 (65.7%)* | 448,640 (58.4%)* | 500,531 (59.1%)* |
| Male | 26,513 (33.6%) | 314,471 (40.9%) | 340,984 (40.3%) |
| Unknown | 586 (0.6%) | 4,880 (0.6%) | 5,466 (0.7%) |
| Race | | | |
| Asian | 1,906 (2.4%) | 20,453 (2.7%) | 22,359 (2.6%)* |
| Black | 13,575 (17.2%) | 121,341 (15.8%) | 134,916 (15.9%) |
| Native Haw./Pac. Islander | 194 (0.2%) | 2,267 (0.3%) | 2,461 (0.3%) |
| Other | 1,240 (1.6%) | 12,939 (1.7%) | 14,179 (1.7%) |
| Unknown | 10,183 (12.9%) | 112,605 (14.7%) | 122,788 (14.5%) |
| White | 51,892 (65.7%) | 498,386 (64.9%) | 550,278 (65.0%) |
| Ethnicity | | | |
| Hispanic/Latino | 9,733 (12.3%) | 115,321 (15%) | 125,054 (14.8%) |
| Not Hispanic/Latino | 63,960 (81%) | 595,119 (77.5%) | 659,079 (77.8%) |
| Unknown | 5,297 (6.7%) | 57,551 (7.5%) | 62,848 (7.4%) |
| Age Group | | | |
| 18-25 | 659 (0.8%) | 99,573 (13%) | 100,232 (11.8%)* |
| 26-45 | 13,577 (17.2%) | 265,948 (34.6%) | 279,525 (33.0%) |
| 46-65 | 37,028 (46.9%) | 255,417 (33.3%) | 292,445 (34.5%) |
| 66+ | 27,726 (35.1%) | 147,053 (19.1%) | 174,779 (20.6%) |

* Derived from the column totals and the other categories in the same group.
The trained "all patients" model was run on the base population of COVID-19 patients within the N3C Enclave (n = 846,981) with the predicted probability threshold set at 0.45 to emphasize recall. Age, sex, race, and ethnicity breakdowns are shown here for all patients and separately for patients with and without potential Long COVID.
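To show why a lower probability threshold emphasizes recall, the sketch below scores a small set of made-up predictions at two cutoffs. The probabilities, the labels, and the 0.50 comparison cutoff are hypothetical values chosen for illustration; only the 0.45 threshold comes from this page.

```python
def recall_and_precision(threshold, probabilities, labels):
    """Recall and precision when patients at or above the threshold are flagged."""
    flagged = [p >= threshold for p in probabilities]
    true_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    false_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    actual_pos = sum(labels)
    recall = true_pos / actual_pos
    precision = true_pos / (true_pos + false_pos)
    return recall, precision


# Hypothetical model outputs: 1 = Long COVID clinic patient, 0 = not.
probabilities = [0.91, 0.62, 0.48, 0.46, 0.40, 0.55, 0.47, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

for threshold in (0.50, 0.45):
    recall, precision = recall_and_precision(threshold, probabilities, labels)
    print(f"threshold {threshold:.2f}: recall {recall:.2f}, precision {precision:.2f}")
```

In this toy data, lowering the cutoff from 0.50 to 0.45 flags one more true patient, at the cost of also flagging more patients who are not true positives. That is the usual trade-off when a threshold is chosen to favour recall.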