Synthetic Data Workstream


This workstream derives synthetic data from the harmonized electronic health record (EHR) data and uses a variety of metrics to compare and validate the quality of the generated synthetic data. Access to synthetic data requires an N3C Data Enclave account, a signed Data Use Agreement (DUA), and a submitted data use request (DUR). The dataset is open to the broad research community, including domestic and international investigators and citizen scientists.

View workstream details in the GitHub repository.


Office hours are held on the 1st Tuesday of the month at 2:30 pm PT/5:30 pm ET.

Meetings are held on the 2nd Tuesday of the month at 2:30 pm PT/5:30 pm ET.

Register for Meeting


Connect with Us:

  1. Onboard to N3C using the link below.
    • During onboarding you will provide your email address, which we will add to the CD2H workspace.
  2. Go directly to our workstream Slack channel using the link provided below.
    • Log in with your Slack credentials.


The N3C platform produces synthetic data from the limited dataset (LDS) that a site submits. Comparisons between the source limited data and the resulting synthetic data are an essential component of the data quality assurance, verification, and validation processes used by N3C. Sites are therefore required to submit an LDS to N3C in order to create a synthetic dataset. (See the NCATS webpage for more details on the levels of data access.)
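To make the idea of comparing source and synthetic data concrete, the following is a minimal sketch of two generic fidelity checks: total variation distance for a categorical column and standardized mean difference for a numeric column. It is an illustration only and is not N3C's actual validation method; the function names, column names, and values are all hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def total_variation_distance(source, synthetic):
    """Distance between the category frequencies of two columns.

    Returns 0.0 when the distributions match and 1.0 when they share
    no categories.
    """
    if not source or not synthetic:
        raise ValueError("both columns must be non-empty")
    src_counts = Counter(source)
    syn_counts = Counter(synthetic)
    categories = set(src_counts) | set(syn_counts)
    return 0.5 * sum(
        abs(src_counts[c] / len(source) - syn_counts[c] / len(synthetic))
        for c in categories
    )


def standardized_mean_difference(source, synthetic):
    """Difference in means, scaled by the pooled standard deviation."""
    if not source or not synthetic:
        raise ValueError("both columns must be non-empty")
    diff = mean(synthetic) - mean(source)
    pooled_sd = ((pstdev(source) ** 2 + pstdev(synthetic) ** 2) / 2) ** 0.5
    if pooled_sd == 0:
        # Both columns are constant: identical if the means agree.
        return 0.0 if diff == 0 else float("inf")
    return diff / pooled_sd


# Toy values invented for this example (not real patient data).
source_sex = ["F", "F", "M", "M"]
synthetic_sex = ["F", "M", "M", "M"]
source_age = [30, 40, 50, 60]
synthetic_age = [32, 41, 49, 62]

print("TVD (sex):", total_variation_distance(source_sex, synthetic_sex))
print("SMD (age):", standardized_mean_difference(source_age, synthetic_age))
```

Values near zero on both measures suggest that the synthetic column preserves the source column's marginal distribution. Checks like these say nothing about joint structure or privacy, which would need separate metrics.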

Leadership and Administration

Philip Payne

Washington University in St. Louis

Workstream Lead
Adam Wilcox

Washington University in St. Louis

Nicole Venteris

Washington University in St. Louis

Project Manager
Randi Foraker

Washington University

Synthetic Data Task Teams

Group Name | Mailing List | Mailing List Address | Drive/Notes Link
Synthetic Data Privacy | | |
Synthetic Data Validation | | |
Synthetic Platform Architecture | | |