Introducing: Proteomic Data Commons 2.0

The National Cancer Institute's Proteomic Data Commons (PDC) has launched “PDC 2.0”, a major update to the PDC portal experience. This release includes a redesigned user interface, a comprehensive documentation portal, and a new common data analysis pipeline for data-independent acquisition (DIA) mass spectrometry. All updates are now publicly available at https://pdc.cancer.gov

Redesigned User Interface

The PDC homepage has been modernized and restructured to improve navigation for both new and existing users. Highlights include:

  • Modernized homepage and mega menu: A cleaner, more intuitive design with a prominent search bar, organized multi-column navigation, and easier access to key resources for faster, more efficient data exploration.
  • Streamlined filter system: Filters are consolidated into three main categories for ease of use, accompanied by an expanded set of clinical metadata including demographics, diagnosis, treatment, exposure, and follow-up data.
  • Enhanced Studies tab: The study table on the Explore page is now simplified, and each study features an expandable row containing detailed study information and file counts per data category.
  • Integrated multi-omics visibility: The PDC now provides clear indicators for both internal and external multi-omic data connections. Within the PDC, users can quickly see whether metabolomics or lipidomics datasets are available for the same study or cohort. Across the wider CRDC ecosystem, badges also highlight when related genomic or imaging data exist, with contextual navigation links to explore those resources.

Documentation Portal

PDC 2.0 introduces a dedicated documentation and guidance portal (accessible from the Documentation feature card on the PDC homepage) to support researchers at every level of expertise. The following summarizes what is currently available in the documentation portal.

  • User Guides: Core portal features, clinical data and cohort exploration, protein expression heatmap visualization, PDC Data Download Client, multi-omic data discovery, and cloud-based data analysis workflows.
  • Tutorials: Creating cohorts for download, exploring clinical attributes, using manifest files, protein-level analysis across multiple studies, gene-level aggregation, filter behavior and zero count interpretation, automated data access through the API, and more.
  • PDC Data Processes: PDC data harmonization procedures, data types, and file formats.
  • Data Submission: Submission policies, request procedures, submission templates, tutorial, and workspace features overview.
  • API & Developer Resources: Full API reference (endpoint descriptions, query parameters, response formats, and authentication procedures), sample code, schema documents, and best-practice guidance for automated data retrieval and integration into external workflows.
  • Release Notes: Version history for PDC data and software.

Common Data Analysis Pipeline for DIA Data

With PDC 2.0 also comes the launch of a new Common Data Analysis Pipeline (CDAP) for DIA mass spectrometry data, developed in collaboration with the MacCoss Lab at the University of Washington.

The pipeline supports both Thermo DIA and Bruker diaPASEF workflows and integrates established tools such as Carafe, EncyclopeDIA, DIA-NN, and Skyline to provide robust peptide-centric searches and unified quantification.

The DIA CDAP processes raw mass spectrometry data in four major stages:

  1. Standardization: Conversion of spectral files to a standard format.
  2. Peptide-centric searches: Individual file searches are performed with EncyclopeDIA (using Carafe-generated spectral libraries) for Thermo DIA data and with DIA-NN for Bruker diaPASEF data.
  3. Search result consolidation: Results from the individual file searches are combined and statistically validated to generate an FDR-controlled chromatogram library.
  4. Quantification and visualization: The chromatogram library is imported into Skyline to define peak boundaries and transitions for quantification and visualization of results.
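
As a reading aid, the four stages and the choice of search tool can be sketched as a small lookup in Python. This is only an illustration of the routing described above, not PDC code: the workflow keys (`thermo_dia`, `bruker_diapasef`), the `Stage` class, and the `plan_stages` function are hypothetical names introduced here, and no real tools are invoked.

```python
from __future__ import annotations

from dataclasses import dataclass

# Search tool per acquisition workflow, as stated in the announcement.
# The dictionary keys are hypothetical labels chosen for this sketch.
SEARCH_TOOLS = {
    "thermo_dia": "EncyclopeDIA (with Carafe-generated spectral libraries)",
    "bruker_diapasef": "DIA-NN",
}


@dataclass(frozen=True)
class Stage:
    """One stage of the pipeline: a short name and what it does."""
    name: str
    description: str


def plan_stages(workflow: str) -> list[Stage]:
    """Return the four stages, in order, for a given workflow key."""
    try:
        tool = SEARCH_TOOLS[workflow]
    except KeyError:
        raise ValueError(f"unsupported workflow: {workflow!r}") from None
    return [
        Stage("Standardization",
              "Convert spectral files to a standard format"),
        Stage("Peptide-centric searches",
              f"Search each file individually with {tool}"),
        Stage("Search result consolidation",
              "Combine and statistically validate per-file results into an "
              "FDR-controlled chromatogram library"),
        Stage("Quantification and visualization",
              "Import the chromatogram library into Skyline to define peak "
              "boundaries and transitions"),
    ]


if __name__ == "__main__":
    # Print the plan for a Bruker diaPASEF dataset.
    for number, stage in enumerate(plan_stages("bruker_diapasef"), start=1):
        print(f"{number}. {stage.name}: {stage.description}")
```

Only the second stage differs between the two workflows; the remaining stages are shared, which is what lets the pipeline produce unified quantification across instrument types.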

The Future of the PDC

PDC 2.0 represents a significant improvement to the platform’s usability, analytical capabilities, and multi-omic integration. Dr. Xu Zhang, lead for the PDC project, commented, “These updates are a step forward for researchers working with large-scale proteomic datasets. PDC 2.0 lowers barriers for users, making it easier to integrate proteomic data into broader multi-omic cancer research.”

If you would like to reproduce some or all of this content, see Reuse of NCI Information for guidance about copyright and permissions. In the case of permitted digital reproduction, please credit the National Cancer Institute as the source and link to the original NCI product using the original product's title; e.g., “Introducing: Proteomic Data Commons 2.0 was originally published by the National Cancer Institute.”
