COVID-19 poses societal challenges that require expeditious data and knowledge sharing. Though medical records are abundant, they are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with datasets larger than any single organization can provide. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many clinical centers to reveal patterns in COVID-19 patients. To create N3C, the community had to overcome technical, regulatory, policy, and governance barriers to sharing patient-level clinical data. In less than 2 months, we developed solutions to acquire and harmonize data across organizations and created a secure data environment that enables transparent and reproducible collaborative research.
We expect the N3C to help save lives by enabling collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care needs and thereby reduce the immediate and long-term impacts of COVID-19.
The onset of the COVID-19 pandemic has provided a unique opportunity to leverage CD2H resources and expertise already well established within the informatics and medical communities. Most CD2H projects have been incorporated into N3C efforts. Text Analytics/Natural Language Processing, led by Hongfang Liu, is the most active of these in N3C. Machine Learning, led by Heidi Spratt and Peter Robinson, is now applied to N3C on the Palantir platform. Data Quality is now focused entirely on N3C, both as a component of the harmonization pipeline, in partnership with OHDSI, and as post-processing in the analytics environment. CD2H leaders have an integral role in creating the N3C infrastructure, which is intended to bring discoveries together and meet the emerging needs brought to light by the novel coronavirus.