N3C Cohort Exploration

Ready to dive into the data? View and analyze data in our secure N3C Data Enclave. The data include harmonized de-identified information from electronic health records. The Data Enclave is open to academic researchers, clinicians, and citizen scientists. Register for an account now!

You are encouraged to submit suggestions for enhancements/additions to this tracking issue.


N3C Data Enclave Statistics

Release Set:

Production version:  



COVID+ Cases:  

Total Number of Rows:  

Clinical Observations:  

Lab Results:  

Medication Records:  



For more information about the release set, please visit the N3C Data Ingestion & Harmonization GitHub Repository.

N3C Contributing Sites

Age and Sex Distributions of N3C Cohort

Lab-confirmed Negative
Lab-confirmed Positive
Suspected COVID

Race and Ethnicity Distributions of N3C Cohort

Lab-confirmed Negative
Lab-confirmed Positive
Suspected COVID
(NHPI - Native Hawaiian or Other Pacific Islander)
Comorbidity Distribution of COVID+ in N3C Cohort
Vaccine Data

This table shows the number of patients who have received a COVID-19 vaccination within the N3C Data Enclave by the vaccine manufacturer. For more information on the vaccines themselves, please visit the FDA's COVID-19 Vaccine Information Page.

Medications Table

Predicting Clinical Severity

N3C researchers applied data science and statistical methods to N3C data on patients' tests and vitals from the first day they visited the hospital. The patterns and relationships uncovered by this analysis are intended to help doctors identify which patients are at highest risk of severe outcomes from COVID-19. The most important factors identified are listed below (0 = most important, 63 = least important). This information can help doctors and caregivers adjust their care decisions and may lead to better patient outcomes.

To demonstrate the utility of the N3C cohort for analytics, several machine learning (ML) models were developed to predict a severe clinical course. All models were developed using data associated with the first calendar day of a patient's hospital encounter, aggregated from 34 medical centers nationwide. To ensure that outcomes would be represented in the data, only laboratory-confirmed positive patients with at least one overnight hospital stay were included (n ≈ 32,000). Training (70%) and testing (30%) sets were randomly selected and stratified by outcome proportions, and potential predictors were present for at least 15% of the training set. Each input variable is the most abnormal value recorded on the first calendar day of the hospital encounter. When a patient had no laboratory test value on the first calendar day, a normal value was imputed for specialized labs (e.g. ferritin, procalcitonin) and the median cohort value was imputed for common labs (e.g. sodium, albumin).
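The data preparation steps described above (a stratified 70/30 split, a 15% coverage threshold for predictors, and imputation of missing first-day labs) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the N3C team's actual pipeline. The function names, the row layout (one dict per patient), and the "normal" values passed in are all hypothetical. The sketch also assumes that the 15% figure was applied as a filter on the training set.

```python
import random
from statistics import median


def stratified_split(rows, outcome_key, train_frac=0.7, seed=0):
    """Randomly split rows so each outcome class keeps roughly the same proportion."""
    rng = random.Random(seed)
    by_outcome = {}
    for row in rows:
        by_outcome.setdefault(row[outcome_key], []).append(row)
    train, test = [], []
    for group in by_outcome.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_frac)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test


def filter_predictors(train_rows, predictors, min_coverage=0.15):
    """Keep predictors that are non-missing in at least min_coverage of the training rows."""
    n = len(train_rows)
    return [
        p for p in predictors
        if sum(row.get(p) is not None for row in train_rows) / n >= min_coverage
    ]


def impute(rows, common_labs, specialized_normals):
    """Fill missing labs: a supplied normal value for specialized labs,
    the median of observed values in `rows` for common labs."""
    medians = {}
    for lab in common_labs:
        observed = [row[lab] for row in rows if row.get(lab) is not None]
        if observed:
            medians[lab] = median(observed)
    filled = []
    for row in rows:
        new_row = dict(row)
        for lab, normal in specialized_normals.items():
            if new_row.get(lab) is None:
                new_row[lab] = normal
        for lab, med in medians.items():
            if new_row.get(lab) is None:
                new_row[lab] = med
        filled.append(new_row)
    return filled
```

In this sketch the caller supplies the normal values for specialized labs. The source does not state which reference values were used, so any number passed in is a placeholder.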

"Severe Clinical Course" was coded in the presence of any of the following outcomes:

Hospitalization with death
Discharge to hospice
Invasive mechanical ventilation
Extracorporeal membrane oxygenation (ECMO)
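The outcome definition above amounts to a logical OR over four events. A minimal sketch, assuming each encounter is a dict of boolean flags (the flag names here are hypothetical and do not come from the N3C data model):

```python
# Hypothetical flag names, one per outcome in the list above.
SEVERE_OUTCOME_FLAGS = (
    "hospitalization_with_death",
    "discharge_to_hospice",
    "invasive_mechanical_ventilation",
    "ecmo",
)


def is_severe_clinical_course(encounter):
    """Return True if any of the four severe outcomes is flagged for the encounter."""
    return any(encounter.get(flag, False) for flag in SEVERE_OUTCOME_FLAGS)
```

For example, `is_severe_clinical_course({"ecmo": True})` evaluates to `True`, while an encounter with none of the flags set evaluates to `False`.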

Models Include:

Random Forest (AUROC 0.86)
XGBoost (AUROC 0.87)
Logistic Regression
Logistic Regression w/L1 Penalty
Logistic Regression w/L2 Penalty
Logistic Regression w/Elastic Net Regularization
Ridge Classifier

For more information, please see The National COVID Cohort Collaborative: Clinical Characterization and Early Severity Prediction

Variable Importance Table: The table below shows 64 metrics and their importance rank in predicting whether a patient would have a severe response to COVID-19 (0 = most important, 63 = least important). Each column represents a different model built to predict severity. The color of each cell corresponds to the metric's rank, with the darkest (blue) representing the highest importance and the lightest (teal) representing lower importance.
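The 0-based ranking convention used in the table can be illustrated with a short sketch. It assumes each model yields one numeric importance score per metric and that a higher score means more important. The source does not say how importance was computed for each model, so the function and the input format are hypothetical.

```python
def rank_importances(scores):
    """Map each metric to its rank, where 0 is the most important.

    `scores` is a dict of metric name -> importance score (higher = more important).
    Ties are broken alphabetically by metric name so the result is deterministic.
    """
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: rank for rank, name in enumerate(ordered)}
```

With 64 metrics, the ranks produced this way run from 0 to 63, matching the scale described above.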

Engagement and Registration Statistics
Institutions with Executed DUAs: 253
DUA Institutions with Registered Users: 220
Institutions with Executed DTAs: 89
Total User Registrations: 2953
Users Requesting Enclave Access: 2038
Registered Enclave Users From DUA Institutions: 1883
Data Transfer Agreements
Data Use Agreements
Domain Team Roster
Project Roster
Journal Articles
bioRxiv/medRxiv Preprints
Other Public Venues (Podium Presentations, Posters, etc.)