N3C Data Enclave Tools

Enclave Tools

The N3C Data Enclave includes an extensive suite of tools to help you discover, explore, and analyze N3C clinical data. Researchers can use familiar software, such as R and Python, to gain insights within the Enclave, and can also take advantage of core Enclave tools built specifically for analysis within the environment. The tools available in the N3C Data Enclave were selected for their popularity and ease of use. If the packages or tools you need aren’t already available, you can request that they be added through the N3C Support Desk.

Popular Tools

Note: Documentation for the tools specific to the N3C Data Enclave is only available to registered Enclave users. Please create an N3C Data Enclave account to learn more.

R & Python

R and Python are both fully supported in the N3C Data Enclave. The most popular packages for data manipulation, visualization, hypothesis testing, and predictive model development are pre-installed. These include tidyverse, pandas, scikit-learn, and hundreds more. You can access them within Code Workbook, a specialized Enclave application that lets users analyze and transform data using a graphical interface. Code Workbook also makes workflows easier to discover and lets researchers track the inputs and outputs of each analysis. Documentation for using the packages is available within the Enclave (please note, you must be a registered Enclave user to access it).

Apache Spark

Spark SQL is a module for working with and querying structured data within the Apache Spark framework, which is used under the hood to ingest, transform, and process most Enclave data. It is an efficient tool for filtering, joining, and aggregating large datasets. In the Enclave, you can do this natively with Spark SQL, or from R or Python using the SparkR or PySpark packages. Documentation is available within the Enclave (please note, you must be a registered Enclave user to access it).
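To make the filter, join, and aggregate pattern concrete, here is a minimal sketch of such a query. It is an illustration only: the table and column names (visits, sites, length_of_stay) are hypothetical and are not the N3C schema. So that the example runs anywhere, it uses Python's built-in sqlite3 module as a stand-in engine rather than Spark; the query itself is plain SQL of the kind Spark SQL is also expected to accept.

```python
import sqlite3

# Hypothetical tables and columns, for illustration only (not the N3C schema).
QUERY = """
SELECT s.state,
       COUNT(*)              AS n_visits,
       AVG(v.length_of_stay) AS mean_stay
FROM visits AS v
JOIN sites AS s ON v.site_id = s.site_id   -- join visits to their site
WHERE v.length_of_stay >= 1                -- filter out zero-day visits
GROUP BY s.state                           -- aggregate per state
ORDER BY s.state
"""


def run_demo():
    """Build two tiny in-memory tables and run QUERY against them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sites (site_id INTEGER, state TEXT);
        CREATE TABLE visits (visit_id INTEGER, site_id INTEGER, length_of_stay INTEGER);
        INSERT INTO sites VALUES (1, 'MD'), (2, 'NC');
        INSERT INTO visits VALUES (10, 1, 3), (11, 1, 0), (12, 2, 5), (13, 2, 7), (14, 1, 1);
        """
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

With PySpark, the same query string would typically be passed to spark.sql(QUERY) after registering the input DataFrames as temporary views with createOrReplaceTempView; how datasets are exposed inside the Enclave is covered by the Enclave documentation and is not assumed here.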

Contour

The N3C environment features Contour, an easy-to-use, point-and-click data analysis tool for manipulating and graphing datasets. With it you can quickly access datasets, apply common analytical and logical operations in sequence to explore your data, debug data quality, cleanse and transform your data, and create visualizations and reports to share your findings with others. Because Contour is a core tool within the Palantir framework, its documentation is only available within the Enclave (please note, you must be a registered Enclave user to access it).