Phenotype & Data Acquisition Workstream


Identify patients and controls by establishing a common COVID-19 phenotype that will define the data pull for the limited data set, create a "white glove" service to obtain data from each site by building easily adaptable scripts for each clinical data model, and ingest data into a secure location, per approved institutional agreements.

View the latest phenotype on our GitHub Wiki.

View our documentation and submission instructions for all four data models.

Dual-purpose workstream:

  1. Work with the community to write and maintain a computable phenotype for COVID-19.
  2. Write and maintain a series of scripts to execute the computable phenotype in each of four common data models (CDMs): OMOP, ACT, PCORnet, and TriNetX. Provide support for sites as they execute these scripts and transmit data.

Office hours are held every other Thursday from 9:00-10:00 am PT/12:00-1:00 pm ET starting June 10, 2021 to answer data sharing questions and address issues. (Regulatory paperwork does not have to be complete to attend.) Register Here.

Connect with Us:

  1. Onboard to N3C using the link below.
    • There you will provide your email address, which we will add to the CD2H workspace.
  2. Go to our workstream Slack channel directly using the link provided below.
    • Log in with your Slack credentials.


It is preferable and highly encouraged for sites to use the N3C COVID-19 phenotype to identify their population and ensure a consistent cohort definition across the N3C dataset. Due to the importance of consistency, the Phenotype Workstream has developed “plug and play” code to define the phenotype. However, if a site chooses to use a locally defined phenotype for COVID-19, the site will need to provide its cohort definition in a machine-readable format and provide updates of the custom cohort definition if and when it changes. Likewise, any issues in format, either syntactic or semantic, incurred during the ingestion process will be the site’s responsibility to resolve. Sites that plan to use a local COVID-19 phenotype should notify the Phenotype Workstream via a GitHub issue or during Harmonization/Phenotype Office Hours for assistance with the process.

No; sites do not need to convert their data. N3C can accept data in each site's native common data model (CDM) format, whether that is OMOP, PCORnet, ACT, or TriNetX. Code is available in the Phenotype GitHub repository and documentation is available in the Phenotype GitHub Wiki. Staff from sites using the OMOP, PCORnet, or ACT data models will run N3C’s code to define their cohort and extract data elements from their datamart. Sites using TriNetX will work with their TriNetX representative to support the data extraction.

Data ingestion and harmonization will be an iterative process between the Phenotype & Data Acquisition workstream, the Data Ingestion & Harmonization workstream, and contributing sites. First and foremost, the best way to support the data ingestion and harmonization process is to ensure the data payload follows the specifications outlined in the GitHub Wiki documentation. Use of the R or Python Exporter will support consistency while lowering site burden.

When the code changes in a way that will affect sites, an email will be sent to the n3c-tech-team listserv. Click here to apply for membership to this Google group.

The phenotype for COVID-19 continues to evolve as more is learned about the disease and new codes (particularly LOINC) become available. The phenotype is updated approximately every two weeks on Fridays, with the corresponding code updated by the following Tuesday. The latest version of the phenotype can be found in the Phenotype GitHub repository.
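For sites that want to plan code refreshes around this cadence, the dates can be worked out with a few lines of Python. This is only a sketch: the anchor date below is a hypothetical Friday chosen for illustration, not an announced N3C release date, and the two-week interval is approximate, so the n3c-tech-team listserv remains the authoritative signal. The function names are likewise illustrative.

```python
from datetime import date, timedelta

FRIDAY = 4   # date.weekday() counts Monday as 0, so Friday is 4
TUESDAY = 1


def code_ready_by(phenotype_update: date) -> date:
    """Return the first Tuesday after a Friday phenotype update."""
    if phenotype_update.weekday() != FRIDAY:
        raise ValueError("phenotype updates are expected on a Friday")
    days_ahead = (TUESDAY - phenotype_update.weekday()) % 7  # 4 days
    return phenotype_update + timedelta(days=days_ahead)


def upcoming_updates(anchor: date, count: int, interval_weeks: int = 2):
    """List (phenotype_update, code_ready_by) pairs starting at an anchor Friday.

    The anchor and the fixed interval are assumptions; the real schedule
    is only approximately biweekly.
    """
    fridays = (anchor + timedelta(weeks=interval_weeks * i) for i in range(count))
    return [(friday, code_ready_by(friday)) for friday in fridays]


if __name__ == "__main__":
    hypothetical_anchor = date(2021, 6, 11)  # a Friday, for illustration only
    for update, ready in upcoming_updates(hypothetical_anchor, 3):
        print(f"phenotype updated {update}, code expected by {ready}")
```

A site could run this after each listserv announcement, using the announced Friday as the anchor, to estimate when to pull the updated scripts.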

Data will be transferred to NCATS via SFTP. To obtain SFTP credentials after IRB approval has been granted and the DTA has been signed, please contact Emily Pfaff at

The Wiki documentation outlines the expected data output format, naming conventions, and directory structure. If a site is using the R or Python exporter, all of these steps will be included in those exporter packages. If a site is using the raw SQL, the Wiki should be reviewed for specifications and examples for packaging the data (OMOP, PCORnet, ACT).
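Sites packaging data from raw SQL may find it useful to check an export directory against the Wiki specification before transmitting. The sketch below assumes the site has copied the list of expected file names from the Wiki for its data model; the file names and the `missing_files` helper shown here are hypothetical placeholders, not the actual N3C specification.

```python
from pathlib import Path


def missing_files(export_dir, expected_names):
    """Return the expected file names that are absent from export_dir, sorted."""
    present = {p.name for p in Path(export_dir).iterdir() if p.is_file()}
    return sorted(set(expected_names) - present)


if __name__ == "__main__":
    import sys

    # Hypothetical names for illustration; take the real list from the Wiki.
    expected = ["MANIFEST.csv", "PERSON.csv"]
    export_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    absent = missing_files(export_dir, expected)
    if absent:
        print("Missing from export:", ", ".join(absent))
    else:
        print("All expected files are present.")
```

This checks only that the files exist. It does not validate column layout, delimiters, or naming conventions, which the R and Python exporters handle for sites that use them.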

Ideally, sites will transmit data to N3C 1-2 times per week. If this refresh rate is not feasible for a site, the Phenotype workstream will provide assistance.

If a site uses its common data model for other purposes (particularly network studies) and follows the guidelines for populating it, it will most likely have the data it needs for the initial phases of N3C. As time goes on, the N3C consortium may agree on additional data elements to add that are not generally found in CDMs, or are infrequently populated.

All sites are strongly encouraged to use the R and Python Exporters that the Phenotype & Data Acquisition workstream has created. This will ensure uniform data exports, which is critical to the data harmonization process. Moreover, the R and Python scripts automate many tasks that would otherwise be manual.

Alternatively, sites may choose to use raw SQL scripts. A site that chooses this option should closely review and follow the export specifications outlined in the documentation (OMOP, PCORnet, ACT).

If you are working with TriNetX, TriNetX will export your site’s data on your site’s behalf; you will not need to use any of the code (exporters or phenotype).

The Phenotype workstream can provide assistance. If support is needed during any part of the phenotyping, data extraction, or data submission process, communication through the Phenotype & Data Acquisition GitHub repository is the preferred method.

The Phenotype workstream hosts twice-weekly Zoom office hours on Mondays and Thursdays from 12:00-1:00 pm ET. Registration details are available on the N3C calendar.

Leadership and Administration

Emily Pfaff, MSIS, PhD
University of North Carolina Chapel Hill