N3C Frequently Asked Questions

The NCATS N3C resources webpage provides high-level regulatory and governance Frequently Asked Questions (FAQs), the Data Transfer Agreement (DTA), and other important information to get started with N3C. Below are frequently asked questions specifically related to the access and flow of data in the N3C Enclave, grouped by the workstream that addresses them.


Phenotype & Data Acquisition

What does a site need to do to support data ingestion and harmonization of its data?

Data ingestion and harmonization will be an iterative process among the Phenotype & Data Acquisition workstream, the Data Ingestion & Harmonization workstream, and contributing sites. The best way for a site to support this process is to ensure that its data payload follows the specifications outlined in the GitHub Wiki documentation. Using the R or Python Exporter supports consistency while lowering site burden.

How will I know about changes to the code?

When the code changes in a way that will affect sites, an email will be sent to the n3c-tech-team listserv. To receive these emails, apply for membership in the n3c-tech-team Google group.

How often do you update the phenotype?

The phenotype for COVID-19 continues to evolve as more is learned about the disease and new codes (particularly LOINC codes) become available. The phenotype is updated approximately every two weeks on Fridays, and the corresponding code is updated by the following Tuesday. The latest version of the phenotype can be found in the Phenotype GitHub repository.
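
For sites that schedule their own code refreshes around this cadence, the date arithmetic can be sketched as follows. This is only an illustration: it assumes "the following Tuesday" means the first Tuesday after the Friday phenotype update, and the function name code_due_date is hypothetical, not part of any N3C tooling.

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday() numbering: Monday is 0, Tuesday is 1


def code_due_date(phenotype_update: date) -> date:
    """Return the first Tuesday strictly after the given date.

    Assumption: the code is expected by the first Tuesday that follows
    the phenotype update, which per the FAQ lands on a Friday.
    """
    days_ahead = (TUESDAY - phenotype_update.weekday() - 1) % 7 + 1
    return phenotype_update + timedelta(days=days_ahead)


if __name__ == "__main__":
    friday = date(2020, 6, 5)  # an example Friday
    print(code_due_date(friday))
```

For a Friday input the function returns the date four days later. Because the update schedule is approximate, the n3c-tech-team emails remain the authoritative signal that new code is available.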

How do I transfer the data? How do I get sFTP credentials?

Data will be transferred to NCATS via sFTP. To obtain sFTP credentials after IRB approval has been granted and the DTA has been signed, please contact Emily Pfaff at epfaff [at] email [dot] unc [dot] edu.

How do I package the data?

The Wiki documentation outlines the expected data output format, naming conventions, and directory structure. If a site is using the R or Python Exporter, the exporter package handles all of these steps. If a site is using the raw SQL, it should review the Wiki for specifications and examples for packaging the data (OMOP, PCORnet, ACT).
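
Sites packaging data by hand from the raw SQL may find a quick pre-transfer check useful. The sketch below compares the files in a payload directory against a list of expected file names. Everything specific in it is an assumption: the function name missing_files, the example directory, and the example file names are placeholders, and the real list and layout must be taken from the Wiki for the site's data model. It also checks only the top level of one directory, which may not match the required directory structure.

```python
from pathlib import Path


def missing_files(payload_dir, expected_names):
    """Return the expected file names absent from payload_dir, sorted.

    Only regular files directly inside payload_dir are considered;
    subdirectories are not searched.
    """
    present = {p.name for p in Path(payload_dir).iterdir() if p.is_file()}
    return sorted(set(expected_names) - present)


if __name__ == "__main__":
    # Placeholder names for illustration only; not the N3C specification.
    expected = ["EXAMPLE_TABLE_A.csv", "EXAMPLE_TABLE_B.csv"]
    payload = Path("payload")  # hypothetical local directory
    if payload.is_dir():
        print(missing_files(payload, expected))
```

An empty list means every expected name was found. The check says nothing about file contents or format, so it complements, rather than replaces, a review against the Wiki specifications.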

What is the expected frequency of data delivery from sites to N3C?

Ideally, sites will transmit data to N3C once or twice per week. If this refresh rate is not feasible for a site, the Phenotype workstream will provide assistance.

What is the minimum set of data I need to have?

If a site uses its common data model for other purposes (particularly network studies) and follows the guidelines for populating it, it will most likely have the data it needs for the initial phases of N3C. As time goes on, the N3C consortium may agree on additional data elements to add that are not generally found in CDMs, or are infrequently populated.

Is it mandatory to use the R or Python exporters?

All sites are strongly encouraged to use the R and Python Exporters that the Phenotype & Data Acquisition workstream has created. This will ensure uniform data exports, which is critical to the data harmonization process. Moreover, the R and Python scripts automate many tasks that would otherwise be manual.

Alternatively, sites may choose to use the raw SQL scripts. A site that chooses this option should closely review and follow the export specifications outlined in the documentation (OMOP, PCORnet, ACT).

If you are working with TriNetX, TriNetX will export your site’s data on your site’s behalf; you will not need to use any of the code (exporters or phenotype).

How can I ask questions and get support?

The Phenotype workstream can provide assistance. If support is needed during any part of the phenotyping, data extraction, or data submission process, the Phenotype & Data Acquisition GitHub repository is the preferred channel for communication.

The Phenotype workstream hosts twice-weekly Zoom office hours on Mondays and Thursdays from 12:00 to 1:00 pm ET. Details for registering for those calls are available on the N3C calendar.

Data Ingestion & Harmonization

What type of quality assurance/quality control processes are performed on the data?

Existing OHDSI OMOP data conformance tools and processes are all available in the N3C Enclave and are run on the data prior to ingestion into the full dataset. The data quality checks are a hybrid across all data models.

How are the various common data models used by sites brought into the N3C Enclave?

Common data models used by sites are consolidated and harmonized to the OMOP common data model and then ingested into the NCATS N3C Enclave as a new dataset.

Collaborative Analytics

What training is required before accessing the data? What expertise should users have?

Users are expected, but not required, to complete the Palantir Fundamentals training course to understand the core concepts of Palantir Foundry. (See the Collaborative Analytics training slide.) Users can gain access to the N3C Analytical Platform by first onboarding to the Collaborative Analytics workstream using their institutional email. Once an account has been created for them, they will be given access to the environment; however, this does not include access to the N3C datasets. Instead, the account initially provides access to the Medicare Claims Synthetic Public Use File (SynPUF) and some OMOP data so that users can become acquainted with the platform.

Users can either sign up for a hands-on training session (registration links to be provided on the N3C website) or undergo self-training using the provided video and written tutorial to learn more about navigating and using the platform. Streaming access to OHDSI videos provides an introduction to the OMOP common data model. (Registration required via the OHDSI website.)

Synthetic Data

Can sites contribute synthetic data to N3C instead of patient data as a Limited Data Set (LDS)?

No. Synthetic data are generated from the Limited Data Set that is sent to N3C (see the NCATS FAQs).