N3C Data Enclave
To register with N3C, go to the N3C Login/Registration Page and select the button that says “Click here to Register with N3C”. You must use an institutional email address linked to an individual (not a shared account) to register with N3C. If you are affiliated with an institution that is not part of the InCommon authentication framework, or are a citizen scientist and do not have an institutional email address, click on the “Login.gov” button and follow the instructions. [Note that citizen scientists must use the same email address to register for an account as the email they use to establish a Data Use Agreement (DUA).]
Registering with N3C is the first step in establishing an N3C Data Enclave account and is necessary to link users to a Data Use Agreement. In addition, it allows you to become a member of the N3C community and gain access to the team document drive, Google groups, Slack channels, calendar meeting invitations, and the workstreams. (It is not mandatory to join a workstream to be a member of the N3C community.) Other incentives to becoming an N3C community member include collaborations, community input, joint publications, and governance. Requests for N3C membership and for joining Google groups are reviewed and granted rapidly during the week, but can take longer on weekends (up to 48 hours).
Anyone can apply to be the Lead Investigator on an N3C project. However, the person listed on the account as the Lead Investigator must agree to uphold the following responsibilities:
- Serve as the N3C contact person for questions, concerns, and requests to join the project.
- Serve as the individual responsible for assuring completion of the project. Projects are renewed annually, per the Data Use Request (DUR).
- Identify and/or approve requests for collaborators to join the approved project-specific DUR. Collaborators from other institutions must still submit a DUR and follow their own institutions’ policies for human subjects research, as applicable.
Before creating your account, you will need to have the following available:
- The N3C Data Enclave is hosted by NIH, so all users are required to complete annual NIH Information Security and Information Management Training and should be prepared to provide a certificate of completion upon request. This training is freely available to the public, and a certificate of completion can be printed provided that the course is taken in one sitting. Researchers must complete training before they can submit a Data Use Request.
- Only the 2020 Information Security course at the top of the list is required. It consists of these segments: Information Security, Counterintelligence & Insider Threat, Privacy, Records Management, and Emergency Preparedness. (Allow 60-90 minutes to complete.)
- Upon completion, you can click on “Print Certificate”, add your name, and save to PDF or print to make it available upon request.
- You will be asked for the training completion date, and may be asked by NCATS to provide evidence of completion (i.e., a screenshot or copy of the Certificate of Completion).
- If you are requesting access to the de-identified dataset or the limited dataset, you must also have completed Human Subjects Research (HSR) protection training per your home institutional requirements. (CITI Program and NIH courses are most common.) You will be required to enter your HSR training completion date. (HSR training is not needed to access the synthetic dataset.)
The goal is to refresh data from contributing sites about twice per week on average. This includes new patient records as well as updates to existing ones. When data are refreshed, information about the data will be provided via Fact Sheets and Release Notes.
- Fact Sheets provide a description of the data, such as number of sites available, persons, and COVID cases, and will be available on both the N3C and NCATS websites.
- Release Notes describe any known issues about the data and missing data, interpretation of qualitative results, and implemented crosswalks. Release Notes will only be available to users who are logged into the N3C Data Enclave.
Yes; however, this will be done on a case-by-case basis to ensure that data are secure and all legal and regulatory requirements have been met. Governance processes for data uploads are in development. For more information, contact the N3C Data Enclave technical support team at NCATSN3CSupport@mail.nih.gov.
The N3C Enclave is capable of integrating external tools, such as the OHDSI ATLAS platform. However, the tools must conform to the security requirements specified by the Data Transfer Agreement and Data Use Agreement, and they may require special configuration and auditing. Extra controls are required to install packages and tools, since the Enclave is a FedRAMP-certified environment. For further questions on software and tools in the Enclave, contact the N3C Data Enclave technical support team at NCATSN3CSupport@mail.nih.gov.
Data analysis within the N3C Data Enclave is supported by R and Python, the most widely used open source platforms for statistical analysis and data science. It is also supported by Slate, the Palantir Foundry programming language.
- Commonly used R and Python analysis packages, such as SciPy and scikit-learn, have been pre-installed to use in your pipelines. If additional R or Python packages are needed, contact the N3C Data Enclave technical support team at NCATSN3CSupport@mail.nih.gov.
- Data and code provenance is fully tracked throughout the course of the project by use of the platform’s native functions, which facilitate tracing of all data and code development. Training materials for the platform’s native tooling and for use of R and Python code and SQL within the environment are readily available inside the N3C Data Enclave.
- Other data analysis tools are currently being evaluated for use in N3C, including the OHDSI tool stack, Leaf, and the NIH Biomedical Translator Knowledge Graph Engine.
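Before contacting support about a missing Python package, it can help to check which packages are importable in your current environment. The following is a minimal sketch using only the Python standard library. The helper name `missing_packages` and the example package list are illustrative assumptions, not part of the Enclave's tooling, and the Enclave's code environments may expose packages differently.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level import names that cannot be found in this environment."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed.
        if find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Use import names, not distribution names: scikit-learn is imported as "sklearn".
    wanted = ["scipy", "sklearn", "json"]
    print("Not importable here:", missing_packages(wanted))
```

Any names the helper reports as missing can then be included in a request to the support team.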
Learn more about supported tools and resources, and suggest new ones, at the Tools and Resources subgroup meetings, which are usually held on Fridays at 1:00 pm PT/4:00 pm ET.
There are recommended training materials and resources available within the N3C Data Enclave. There is an N3C Quickstart Tour and an Intro to Foundry training course (offered twice a month) to understand core concepts for using the N3C Data Enclave software. Additional resources include live scheduled trainings, written tutorials, interactive tours, official documentation, and a help & support system.
Publicly accessible information about the OMOP Common Data Model (CDM) and OHDSI tools for running analyses against OMOP includes the Book of OHDSI and OHDSI tutorial videos. Free courseware is also available through EHDEN Academy. The two courses of most interest to N3C users are those covering the CDM and vocabularies, and ATLAS.
Contributing site names are masked in the N3C data; however, data can be differentiated, such as by Site A, Site B, etc.
NCATS and the N3C community encourage citation of the N3C Data Enclave. Information on how to cite and acknowledge the N3C Data Enclave can be found in Acknowledging N3C in Publications and Presentations on the NCATS website, as well as in the Attribution & Publication Principles for N3C in the Zenodo database.
All N3C publications & presentations must include relevant grant attribution.
For publications involving use of data from the N3C Data Enclave, the following acknowledgment statement must be included:
“The analyses described in this [publication/report/presentation] were conducted with data or tools accessed through the NCATS N3C Data Enclave https://covid.cd2h.org and N3C Attribution & Publication Policy v 1.2-2020-08-25b supported by NCATS U24 TR002306 and [insert additional funding agencies or sources and reference numbers.] This research was possible because of the patients whose information is included within the data and the organizations and scientists who have contributed to the on-going development of this community resource https://doi.org/10.1093/jamia/ocaa196/5893482.”
For publications involving an academic institution that resides at a Clinical and Translational Research (CTR) site funded by the Institutional Development Award (IDeA) Program grant, the following acknowledgment statement must be included:
“The project described was supported by the National Institute of General Medical Sciences, 5U54GM104942-04. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.”
For N3C Data Enclave support of all types, you can access the N3C Support Desk on the N3C website to view options. You can also attend any of the 3 N3C office hour sessions to talk to live experts in these specific areas:
- N3C Data Enclave - every Tuesday 1-2 pm ET
- N3C Phenotype & Harmonization - Thursdays 12-1 pm ET
- Synthetic Dataset - 1st & 3rd Tuesday of the month 2:30-3:30 pm ET
On the Enclave Projects webpage, a live feed of approved projects can be viewed with basic information, such as project title, brief abstract, lead investigator name, and affiliated institution.
The Project Dashboard located inside the N3C Data Enclave can be viewed by anyone with an Enclave account. It provides a more detailed list of approved projects and the ability to request collaboration:
- Project title
- Research project abstract
- Lead Investigator name
- Lead Investigator accessing institution
- Request to join a project as a collaborator
Researchers can use the dashboard to see projects that are using N3C data and can request to join projects open to collaborators by clicking on the “Contact Lead Investigator” button.
The N3C Data Enclave is hosted by NIH; therefore, NIH IT security training is required for users who will be accessing the data. The training is not required for individuals who will not be accessing the data.
Many organizations already require 2-factor authentication (2FA) for secure login. This is a separate security step from the UNA 2FA, so you may be required to perform two verifications. You can mitigate this inconvenience by using the “remember this computer” checkbox on a trusted device. If you already use a 2FA app (e.g., Duo Mobile), you will likely be able to use it by creating a separate account for UNA following the process outlined in this document.
Users of N3C create research artifacts as “reports.” These reports reference the set of tools, methods, and data needed to reproduce the result. N3C users are required to share these reports when they are of sufficient quality or completeness. The goal is to encourage collaboration and the development of corroborating evidence on the same research topic. The Attribution and Publication policy provides additional details about sharing broadly outside the Enclave.
As described on the NCATS N3C website, all individuals must be covered by a Data Use Agreement (DUA) and have taken the requisite training (security and human subjects) prior to submitting a Data Use Request (DUR). Therefore, timing will depend on whether training and contractual steps have already been completed. Please visit the Applying for Data Access page on the NCATS website for information about obtaining access to the data. You can also consult the Registration Checklist and the Access the Data Enclave webpages on the N3C website for more information on what is required to gain access. You will need to confirm that a Data Use Agreement between your institution and NCATS has been signed and is on file. To check the status of this, you can view the list of DUA Signatories or contact your institutional official. Once an executed DUA is confirmed to be on file, you must then register with N3C and create a profile.
Note: Simply having an account does not automatically grant access to the N3C data. Data access is provided through submission of a Data Use Request (DUR). The Data Access Committee (DAC) reviews DURs on a weekly basis, and project workspaces are set up approximately 3 business days after DUR approval.
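To estimate when a project workspace might be ready, you can count business days forward from the DUR approval date. The sketch below is illustrative only: the function name `add_business_days` and the example date are assumptions, the calculation skips weekends but not federal holidays, and the 3-business-day figure above is approximate rather than guaranteed.

```python
from datetime import date, timedelta


def add_business_days(start, days):
    """Return the date `days` business days (Monday-Friday) after `start`."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday(): Monday is 0 ... Sunday is 6; skip Saturday and Sunday.
        if current.weekday() < 5:
            remaining -= 1
    return current


if __name__ == "__main__":
    approval = date(2021, 3, 4)  # hypothetical approval date (a Thursday)
    print(add_business_days(approval, 3))  # 2021-03-09 (the following Tuesday)
```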
A user can become a collaborator on a project in one of two ways:
(1) The Lead Investigator can submit a Data Use Request (DUR) and list the user as a collaborator. In this case, the identified collaborator will be sent an email invitation to submit a DUR to join a project.
(2) A user can identify an existing project of interest and request to join it. The user will need to log into their account and click on “DUR for Collaborator” to access the online form, enter their required information, and meet a number of data access requirements. When the “DUR for Collaborator” form has been submitted, the Lead Investigator will receive a system-generated email indicating that the user has asked to join.
Domain Team FAQs
N3C will provide all Domain Teams with:
- A dedicated webpage on the N3C website that will include the Domain Team’s mission, leadership headshots, and affiliated institutions, as well as information about how to join.
- A Google Drive workspace under the N3C Drive general Domain Team folder to store research documents.
- A Google Group Mailing List to allow members to communicate to the entire team.
- A dedicated Slack Channel within the CD2H workspace.
- A Data Liaison if the team does not have a member suited for this designation.
- A listing on the N3C website of the Domain Team's projects and publications. (Not available yet.)
View the Domain Team Creation web page for more information.
Administrative functions are expected (when applicable) as follows:
- All Domain Team meetings should be public and announced.
- The N3C email address firstname.lastname@example.org and the associated Domain Team Google Group email address should be added as invitees to all meetings.
- Domain Teams must organize and provide their own virtual meeting rooms (e.g., Zoom or Webex).
- Meetings should use measures to prevent unwanted guests.
- To prevent exposure of sensitive data, hackathons or Enclave sessions should not be recorded.
- Domain Teams should designate a meeting host.
- Email the meeting agenda to the Domain Team's Google Group address prior to the meeting (e.g., the day before).
- Have a dedicated notetaker record meeting minutes on a rolling agenda/notes document.
- Create a header at the top of each notes document to include:
- Leader(s) name(s) and email address(es)
- Recurring meeting day & time with call-in or meeting link
- Organize regular report-outs at the Clinical Scenarios and Collaborative Analytics meetings.
View the Domain Team Creation web page for more information.
In addition to the Domain Team leads, the N3C recommends several additional designations for Domain Team members to facilitate more efficient analysis within the Enclave. It is recommended that each team member have a clear role with responsibilities. Members are expected to contribute in some fashion and be willing to have tasks assigned as needed.
- Subject Matter Expert: The SME will be instrumental in leading the direction of research and ensuring that it contributes to the specific research field, providing background material/outlines for manuscripts and presentations, helping to answer any clinical questions that arise, and defining concept sets and cohorts using ATLAS or the functionalities in the Enclave.
- Informatics Lead: The Informatics Lead will coordinate with other data analysts/informaticians and work with the Data Analyst/Logic Liaison and Subject Matter Expert (SME) to define derived cohorts using functionalities in the Enclave. This lead will assist other informaticians on the Domain Team with implementing analyses and providing results to the SME in an interpretable format, write up methods and technical cohort descriptions for manuscripts and presentations, support informatics team members, and stay informed of N3C-level data-related policies and procedures.
- Data Analyst/Logic Liaison: The Data Analyst or N3C-supported Logic Liaison should familiarize themselves with the tools/features/workflows within the Enclave, enable quality control and validation of relevant data elements/cohorts, implement analytical workflows using appropriate methods, ensure reproducibility of all work, and format and submit data to NCATS for approval according to the N3C Data Download Policy. Specific skills and best practices to assist this role include:
- Use of Code Workbooks, Core Repos, or Contour to perform analytical tasks within the Enclave
- Import and export code templates to the Knowledge Store within the Enclave
- Construction of appropriate branches, build schedules, and triggers to enable reproducibility of constructed analyses
- Prepare for approval of exported tables and figures through use of reports in the Enclave
- Data Liaison: The Data Liaison may provide support such as eliciting the concept set and data variable definitions needed for Domain Team projects, facilitating the identification or building of concept sets and variables matching the Domain Team research, supporting analysts working with data/concept sets/variables by representing data-focused attributes and issues that may impact analyses, assisting team members in reporting data issues within the Enclave, assisting with Enclave training involving data sources and concept sets, and transferring such skills and knowledge to other Domain Team members.
- A Data Liaison should have an understanding of primary data sources and attributes impacting data (EMRs, registries, external data sources), prior use or knowledge of coding systems and controlled vocabularies used in primary data sources (such as LOINC, ICD, and CPT4), familiarity or experience with implementation(s) of one or more data partner source common data models (PCORnet, ACT/i2b2, TriNetX, OMOP), experience or familiarity with the OMOP common data model and vocabulary, knowledge of OHDSI ATLAS tooling, and experience working with clinicians and clinical researchers to elicit and implement clinical concepts with available data/coding systems.
- With prior approval, an existing liaison can be assigned to a team if needed.
View the Domain Team Creation web page for more information.
Domain Team leaders will be instrumental in facilitating project success (defined by answering COVID-19 research questions, producing publications, and/or implementation in healthcare). Leaders will help coordinate activities, provide status updates, lead meetings, and ensure all members are aware of Attribution, Data Download, and other N3C policies. More specific responsibilities are as follows:
- Primary contact for any questions related to team activity both from the public and N3C administration.
- Ensure that all new members have registered with the N3C.
- Submit their Domain Team's Data Use Request (DUR) and monitor Join Requests within the Enclave.
- Coordinate and facilitate Domain Team meetings to occur at least once a month.
- Utilize N3C resources for communication, document storage, and other research activities.
View the Domain Team Creation web page for more information.
Phenotype & Data Acquisition
It is preferable and highly encouraged for sites to use the N3C COVID-19 phenotype to identify their population and ensure a consistent cohort definition across the N3C dataset. Due to the importance of consistency, the Phenotype Workstream has developed “plug and play” code to define the phenotype. However, if a site chooses to use a locally defined phenotype for COVID-19, the site will need to provide its cohort definition in a machine-readable format and provide updates of the custom cohort definition if and when it changes. Likewise, any issues in format, either syntactic or semantic, incurred during the ingestion process will be the site’s responsibility to resolve. Sites that plan to use a local COVID-19 phenotype should notify the Phenotype Workstream via a GitHub issue or during Harmonization/Phenotype Office Hours for assistance with the process.
No; the N3C can accept data in their native common data model (CDM) format, whether it be OMOP, PCORnet, ACT, or TriNetX. Code is available in the Phenotype GitHub repository and documentation is available in the Phenotype GitHub Wiki. Staff from sites using OMOP, PCORnet, or ACT data models will run N3C’s code to define their cohort and extract data elements from their datamart. Sites using TriNetX will work with their TriNetX representative to support the data extraction.
Data ingestion and harmonization will be an iterative process between the Phenotype & Data Acquisition workstream, the Data Ingestion & Harmonization workstream, and contributing sites. First and foremost, the best way to support the data ingestion and harmonization process is to ensure the data payload follows the specifications outlined in the GitHub Wiki documentation. Use of the R or Python Exporter will support consistency while lowering site burden.
The phenotype for COVID-19 continues to evolve as more is learned about the disease and new codes (particularly LOINC) become available. The phenotype is updated approximately every 2 weeks on Fridays, with corresponding code updated by the following Tuesday. The latest version of the phenotype can be found in the Phenotype GitHub repository.
Data will be transferred to NCATS via SFTP. To obtain SFTP credentials after IRB approval has been granted and the DTA has been signed, please contact Emily Pfaff at epfaff [at] email [dot] unc [dot] edu.
The Wiki documentation outlines the expected data output format, naming conventions, and directory structure. If a site is using the R or Python exporter, all of these steps will be included in those exporter packages. If a site is using the raw SQL, the Wiki should be reviewed for specifications and examples for packaging the data (OMOP, PCORnet, ACT).
Ideally, sites will transmit data to N3C 1-2 times per week. If this refresh rate is not feasible for a site, the Phenotype workstream will provide assistance.
If a site uses its common data model for other purposes (particularly network studies) and follows the guidelines for populating it, it will most likely have the data it needs for the initial phases of N3C. As time goes on, the N3C consortium may agree on additional data elements to add that are not generally found in CDMs, or are infrequently populated.
All sites are strongly encouraged to use the R and Python Exporters that the Phenotype & Data Acquisition workstream has created. This will ensure uniform data exports, which is critical to the data harmonization process. Moreover, the R and Python scripts automate many tasks that would otherwise be manual.
Alternatively, sites may choose to use raw SQL scripts. A site that chooses this option should closely review and follow the export specifications outlined in the documentation (OMOP, PCORnet, ACT).
If you are working with TriNetX, TriNetX will export your site’s data on your site’s behalf; you will not need to use any of the code (exporters or phenotype).
The Phenotype workstream can provide assistance. If support is needed during any part of the phenotyping, data extraction, or data submission process, communication through the Phenotype & Data Acquisition GitHub repository is the preferred method.
The Phenotype workstream hosts twice-weekly Zoom office hours on Mondays and Thursdays from 12:00-1:00 pm ET. Registration details are available on the N3C calendar.
Data Ingestion & Harmonization
Existing OHDSI OMOP data conformance tools and processes are all available in the N3C Data Enclave and are run on the data prior to ingestion into the full dataset. The data quality checks are a hybrid across all data models.
Common data models used by sites are consolidated and harmonized to the OMOP common data model and then ingested into the N3C Data Enclave as a new dataset.
Data in the N3C Data Enclave are secured to the highest standards as mandated by applicable NIH policy. All data are encrypted both in transit and at rest, without exception. In transit, data traffic is encrypted via SSL/TLS during both client-to-server and server-to-server communication. The software enforces permissions to access or edit data at the dataset or row level, depending on the level of data access agreed upon by the Lead Investigator and the Data Access Committee (DAC). These permissions propagate from the underlying secured fields to any derived data, analytics, or reports, so that data remain secure at all stages of storage and processing. Please see the NCATS N3C webpage for more detailed information.
Yes; the N3C Data Enclave supports comprehensive auditing of all significant data processing and access events. It captures metadata about the source of the data. It also maintains records of data imports, reads, writes, exports, and deletions along with the user, time, date, and action. This metadata can be used to track revision history and manage compliance with data auditing and oversight requirements. As standard practice, audited items include:
- Import and export of data
- User access (‘reading’) of data
- User edits of data
- Change to dataset permissions
- Granting or revoking of user roles or privileges
- User logging into the Enclave
- Failed login attempts
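To make the shape of such audit metadata concrete, the sketch below models an audit event as a small Python data structure and filters events by user. It is purely illustrative: the class names, field names, and the `events_for_user` helper are assumptions made for this example and do not describe the Enclave's actual audit schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class AuditAction(Enum):
    # Categories mirror the audited items and record types described above.
    DATA_IMPORT = "data_import"
    DATA_EXPORT = "data_export"
    DATA_READ = "data_read"
    DATA_EDIT = "data_edit"
    DATA_DELETE = "data_delete"
    PERMISSION_CHANGE = "permission_change"
    ROLE_CHANGE = "role_change"
    LOGIN = "login"
    FAILED_LOGIN = "failed_login"


@dataclass(frozen=True)
class AuditEvent:
    user: str            # who performed the action
    action: AuditAction  # what was done
    resource: str        # hypothetical dataset identifier; empty for logins
    timestamp: datetime  # when it happened (date and time)


def events_for_user(events, user):
    """Return one user's events in chronological order."""
    return sorted((e for e in events if e.user == user), key=lambda e: e.timestamp)


if __name__ == "__main__":
    log = [
        AuditEvent("alice", AuditAction.DATA_READ, "dataset-1",
                   datetime(2021, 1, 2, 9, 30, tzinfo=timezone.utc)),
        AuditEvent("alice", AuditAction.LOGIN, "",
                   datetime(2021, 1, 2, 9, 0, tzinfo=timezone.utc)),
    ]
    for event in events_for_user(log, "alice"):
        print(event.timestamp.isoformat(), event.action.value, event.resource)
```

Keeping each event immutable (`frozen=True`) reflects the general expectation that audit records are appended rather than edited after the fact.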
The N3C platform produces synthetic data from the limited dataset (LDS) that a site submits. Comparisons between source limited data and ensuing synthetic data are an essential component of the data quality assurance, verification, and validation processes used by N3C. Therefore, sites are required to submit an LDS to N3C in order to create a synthetic dataset. (See the NCATS webpage for more details on the levels of data access.)