Note: Documentation for the tools specific to the N3C Data Enclave is only available to registered Enclave users. Please create an N3C Data Enclave account to learn more.
R & Python
The R and Python coding platforms are both fully supported in the N3C Data Enclave. The most popular packages for data manipulation, visualization, hypothesis testing, and predictive modeling are pre-installed, including the tidyverse, pandas, scikit-learn, and hundreds more. You can access them within Code Workbook, a specialized application in the Enclave that lets users analyze and transform data through a graphical interface. Code Workbook also improves the discoverability of workflows and allows researchers to track the inputs and outputs of each of their analyses. Documentation for using the packages is available within the Enclave here (please note, you must be a registered Enclave user to access this link).
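As a minimal sketch of the kind of analysis these pre-installed packages support, the snippet below uses pandas to filter and summarize a small table. The data and column names are invented for illustration; they do not correspond to any actual Enclave dataset.

```python
import pandas as pd

# Hypothetical example data; column names are invented for illustration.
df = pd.DataFrame({
    "site": ["A", "A", "B", "B", "B"],
    "age":  [34, 67, 52, 45, 71],
})

# A typical manipulation: filter rows, then aggregate by group.
summary = (
    df[df["age"] >= 50]   # keep only rows with age 50 or above
    .groupby("site")["age"]
    .mean()               # mean age per site among the filtered rows
)
```

Equivalent code could be written in R with the tidyverse (`filter()`, `group_by()`, `summarise()`); either version could live inside a Code Workbook transform.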
Spark SQL is a module for working with and querying structured data within the Spark framework (used under the hood to ingest, transform, and process most Enclave data). It is an efficient tool for filtering, joining, and aggregating large datasets; these operations can be performed natively in Spark SQL, or from R or Python using the SparkR or PySpark packages. Documentation can be found within the Enclave here (please note, you must be a registered Enclave user to access this link).
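The filter/join/aggregate pattern that Spark SQL is built for is plain SQL, so it can be illustrated without a Spark cluster. The sketch below runs the same style of query against an in-memory SQLite database using Python's standard library; the table and column names are invented, and SQLite merely stands in for Spark SQL so the example is runnable anywhere.

```python
import sqlite3

# Illustrative only: table and column names are made up, and SQLite
# stands in for Spark SQL so this sketch runs without a Spark cluster.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE patients (patient_id INTEGER, age INTEGER);
    CREATE TABLE visits (patient_id INTEGER, visit_type TEXT);
    INSERT INTO patients VALUES (1, 34), (2, 67), (3, 52);
    INSERT INTO visits VALUES (1, 'inpatient'), (2, 'inpatient'),
                              (2, 'outpatient'), (3, 'outpatient');
""")

# Filter (WHERE), join (JOIN), and aggregate (GROUP BY / COUNT) in one query.
rows = cur.execute("""
    SELECT v.visit_type, COUNT(*) AS n_visits
    FROM visits v
    JOIN patients p ON p.patient_id = v.patient_id
    WHERE p.age >= 50
    GROUP BY v.visit_type
    ORDER BY v.visit_type
""").fetchall()
```

In the Enclave, the same query text could be submitted through PySpark's `spark.sql(...)` or SparkR's `sql(...)`, which return distributed DataFrames rather than a list of rows.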
The N3C environment also features Contour, an easy-to-use point-and-click tool for manipulating and graphing datasets. With it you can quickly access datasets, chain common analytical and logical operations to explore your data, diagnose data quality issues, cleanse and transform your data, and create visualizations and reports to share your findings with others. Documentation for Contour (please note, you must be a registered Enclave user to access this link) is only available within the Enclave, as it is a core tool within the Palantir framework.