Ready to dive into the data? View and analyze data in our secure N3C Data Enclave. The data include harmonized de-identified information from electronic health records. The Data Enclave is open to academic researchers, clinicians, and citizen scientists. Register for an account now!
You are encouraged to submit suggestions for enhancements/additions to this tracking issue.
Predicting Clinical Severity
N3C researchers used state-of-the-art data science and statistical methods to examine N3C data on patients' tests and vitals from the first day they visited the hospital. The patterns and relationships uncovered by this analysis are intended to help doctors identify which patients are at highest risk of severe outcomes from COVID-19. The most important factors identified are shown below (0 = most important, 63 = least important). Having this information can help doctors and caregivers adjust their care decisions and lead to better patient outcomes.
To demonstrate the utility of the N3C cohort for analytics, several machine learning (ML) models were developed that accurately predict a severe clinical course. All models were developed using data associated with the first calendar day of a patient's hospital encounter, aggregated from 34 medical centers nationwide. To ensure that outcomes would be represented in the data, only patients with at least one overnight hospital stay were included; all were laboratory-confirmed positive (n ≈ 32,000). Patients were randomly assigned to training (70%) and testing (30%) sets, stratified by outcome proportions, and potential predictors were retained only if they were present for at least 15% of the training set. Each input variable is the most abnormal value recorded on the first calendar day of the hospital encounter. When a patient did not have a laboratory test value on the first calendar day, a normal value was imputed for specialized labs (e.g., ferritin, procalcitonin) and the median cohort value was imputed for common labs (e.g., sodium, albumin).
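The data preparation described above (stratified 70/30 split, the 15% presence filter, and imputation) can be sketched in plain Python. This is only an illustrative sketch, not the N3C pipeline: the record layout, the function names, and the numbers in NORMAL_VALUES are hypothetical placeholders, and which set of patients supplies the median is left to the caller.

```python
import random
from statistics import median

# Hypothetical "normal" values for specialized labs (placeholder numbers,
# not clinical reference ranges and not taken from the N3C analysis).
NORMAL_VALUES = {"ferritin": 100.0, "procalcitonin": 0.05}


def stratified_split(records, outcome_key="severe", train_frac=0.7, seed=0):
    """Split records into train/test sets while preserving outcome proportions."""
    rng = random.Random(seed)
    by_outcome = {}
    for rec in records:
        by_outcome.setdefault(rec[outcome_key], []).append(rec)
    train, test = [], []
    for group in by_outcome.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_frac)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test


def select_predictors(train, candidates, min_presence=0.15):
    """Keep candidates that have a value in at least min_presence of the training set."""
    kept = []
    for name in candidates:
        present = sum(1 for rec in train if rec.get(name) is not None)
        if present / len(train) >= min_presence:
            kept.append(name)
    return kept


def impute(records, predictors, cohort):
    """Fill missing labs: a normal value for specialized labs, the cohort median otherwise."""
    fill = {}
    for name in predictors:
        if name in NORMAL_VALUES:
            fill[name] = NORMAL_VALUES[name]
        else:
            fill[name] = median(r[name] for r in cohort if r.get(name) is not None)
    return [
        {**rec, **{n: (rec[n] if rec.get(n) is not None else fill[n]) for n in predictors}}
        for rec in records
    ]
```

Each record here is a dictionary of first-day values with None marking a missing lab. In practice a library routine for stratified splitting would likely replace the hand-written version.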
"Severe Clinical Course" was coded in the presence
of any of the following outcomes:
Hospitalization with death
Discharge to hospice
Invasive mechanical ventilation
Extracorporeal membrane oxygenation (ECMO)
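The outcome definition above is a logical OR over four events. A minimal sketch, assuming each hospital encounter is a dictionary of boolean flags (the flag names are hypothetical, not N3C field names):

```python
# Hypothetical flag names, one per outcome in the list above.
SEVERE_OUTCOMES = (
    "death_in_hospital",
    "discharge_to_hospice",
    "invasive_ventilation",
    "ecmo",
)


def severe_clinical_course(encounter):
    """Return True if any of the four severe outcomes is flagged for this encounter."""
    return any(encounter.get(flag, False) for flag in SEVERE_OUTCOMES)
```

A missing flag is treated as "did not occur", which is an assumption of this sketch rather than something the source specifies.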
Random Forest (AUROC 0.86)
XGBoost (AUROC 0.87)
Logistic Regression
Logistic Regression w/L1 Penalty
Logistic Regression w/L2 Penalty
Logistic Regression w/Elastic Net Regularization
Ridge Classifier
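The AUROC figures quoted for the models above are the probability that a randomly chosen severe case receives a higher predicted score than a randomly chosen non-severe case. A small illustrative sketch of that definition follows; the function name is hypothetical, and this pairwise version is meant for reading rather than for a cohort of this size.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect separation of the two groups.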
For more information, please see "The National COVID Cohort Collaborative: Clinical Characterization and Early Severity Prediction."
Variable Importance Table: 64 metrics and their importance rank in predicting whether a patient would have a severe response to COVID-19 (0 = most important, 63 = least important) are shown below. Each column represents a different model built to predict severity. The color of each cell corresponds to that metric's rank, with the darkest (blue) representing the highest importance and the lightest (teal) representing the lowest importance.
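The ranking scheme used in the table (0 = most important) can be illustrated with a short sketch. It assumes each model exposes one non-negative importance score per metric, with larger meaning more important; how each model's scores were actually derived is not stated here, and the function names are hypothetical.

```python
def importance_ranks(importances):
    """Map each metric to its rank, where 0 is the most important metric."""
    # Sort by descending score; break ties alphabetically so ranks are deterministic.
    ordered = sorted(importances, key=lambda name: (-importances[name], name))
    return {name: rank for rank, name in enumerate(ordered)}


def rank_table(importances_by_model):
    """Build rows of (metric, {model: rank}) with one rank column per model."""
    ranks = {model: importance_ranks(imp) for model, imp in importances_by_model.items()}
    metrics = sorted({m for imp in importances_by_model.values() for m in imp})
    return [(m, {model: r.get(m) for model, r in ranks.items()}) for m in metrics]
```

With 64 metrics, the ranks produced for a single model run from 0 to 63, matching the scale described above.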