Expanding knowledge of Long COVID using N3C data


The electronic health record (EHR) data in the N3C Enclave has been a valuable source of information for understanding COVID-19. The same data now also supports the NIH initiative Researching COVID to Enhance Recovery (RECOVER), which addresses the urgent need to understand post-acute sequelae of SARS-CoV-2 infection (PASC), also known as Long COVID.

Understanding Long COVID using N3C data

This page presents data from the manuscript "Who has long-COVID? A big data approach." The study applies machine learning techniques to N3C data to identify features of the EHR data that are predictive of Long COVID. Figures from the manuscript have been adapted into a dashboard so that viewers can follow the findings as the underlying data are updated.

This is a living page; please return for the latest Long COVID information.

Limitations of this data and further information about how the models were developed can be found in the N3C Long COVID manuscript.

Most important model features for predicting a visit to a Long COVID clinic

How to read these plots

Shown are the relative feature importance and the univariate odds ratio for the top features, defined as the union of the 20 most important features from each model. Odds ratios are not shown for age, which has a non-linear relationship with Long COVID. Independent of their importance, some features are significantly more common in the Long COVID clinic population, while others are more common in the non-Long COVID clinic population. A tilde (~) indicates that the feature was not among the top 20 features for the model in that column. Conditions labelled "chronic" were recorded for patients before their COVID-19 index date.
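To make the layout of these plots concrete, here is a minimal sketch of how such a table could be assembled. It is not the manuscript's code: it assumes each model yields one importance score per feature, that the rows are the union of each model's top-k features (k = 20 on this page), and that the univariate odds ratio for a binary feature comes from a 2x2 table of feature presence against Long COVID clinic attendance. The function names and the toy data are hypothetical.

```python
from typing import Dict, List, Mapping

def top_k(importances: Mapping[str, float], k: int = 20) -> List[str]:
    """Return the k feature names with the highest importance, most important first."""
    return sorted(importances, key=lambda name: importances[name], reverse=True)[:k]

def union_of_top_features(models: Mapping[str, Mapping[str, float]], k: int = 20) -> List[str]:
    """Union of every model's top-k features, kept in first-seen order."""
    seen: Dict[str, None] = {}
    for importances in models.values():
        for name in top_k(importances, k):
            seen.setdefault(name, None)
    return list(seen)

def importance_table(models: Mapping[str, Mapping[str, float]], k: int = 20) -> Dict[str, Dict[str, str]]:
    """One row per union feature; a cell shows the importance, or '~' when the
    feature is not in that model's top k."""
    top_sets = {model: set(top_k(imps, k)) for model, imps in models.items()}
    table: Dict[str, Dict[str, str]] = {}
    for feature in union_of_top_features(models, k):
        table[feature] = {
            model: f"{imps[feature]:.3f}" if feature in top_sets[model] else "~"
            for model, imps in models.items()
        }
    return table

def univariate_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a binary feature from a 2x2 table of patient counts:
    a = feature present, Long COVID clinic     b = feature present, non-clinic
    c = feature absent,  Long COVID clinic     d = feature absent,  non-clinic
    OR = (a/c) / (b/d) = (a*d) / (b*c); OR > 1 means the feature is more common
    in the Long COVID clinic population."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)

# Toy, made-up importances for two hypothetical models (not N3C results).
example_models = {
    "model_A": {"fatigue": 0.30, "dyspnea": 0.25, "age": 0.20},
    "model_B": {"fatigue": 0.10, "age": 0.40, "cough": 0.05},
}

if __name__ == "__main__":
    for feature, row in importance_table(example_models, k=2).items():
        print(feature, row)
    # fatigue {'model_A': '0.300', 'model_B': '0.100'}
    # dyspnea {'model_A': '0.250', 'model_B': '~'}
    # age {'model_A': '~', 'model_B': '0.400'}
    print(univariate_odds_ratio(40, 10, 60, 90))  # 6.0
```

With k = 2, "age" is only the third most important feature for `model_A`, so its cell in that column is a tilde even though it appears in the table because `model_B` ranks it first. Age itself would be left out of the odds-ratio calculation, since it is not a binary feature.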