Introducing: Proteomic Data Commons 2.0

Posted: November 20, 2025

The National Cancer Institute's Proteomic Data Commons (PDC) has launched “PDC 2.0,” a major update to the PDC portal experience. This release includes a redesigned user interface, a comprehensive documentation portal, and a new common data analysis pipeline for data-independent acquisition (DIA) mass spectrometry. All updates are now publicly available at: https://pdc.cancer.gov

Redesigned User Interface

The PDC homepage has been modernized and restructured to improve navigation for both new and existing users. Highlights include:

Modernized homepage and mega menu: A cleaner, more intuitive design with a prominent search bar, organized multi-column navigation, and easier access to key resources for faster, more efficient data exploration.
Streamlined filter system: Filters are consolidated into three main categories for ease of use, accompanied by an expanded set of clinical metadata, including demographics, diagnosis, treatment, exposure, and follow-up data.
Enhanced Studies tab: The study table on the Explore page is now simplified, and each study features an expandable row containing detailed study information and file counts per data category.
Integrated multi-omics visibility: The PDC now provides clear indicators for both internal and external multi-omic data connections. Within the PDC, users can quickly see whether metabolomics or lipidomics datasets are available for the same study or cohort. Across the wider CRDC ecosystem, badges also highlight when related genomic or imaging data exist, with contextual navigation links to explore those resources.

Documentation Portal

PDC 2.0 introduces a dedicated documentation and guidance portal (accessible from the Documentation feature card on the PDC home page) to support researchers at every level of expertise. The following summarizes what is currently available in the documentation portal.

User Guides	Core portal features, clinical data and cohort exploration, protein expression heatmap visualization, PDC Data Download Client, multi-omic data discovery, and cloud-based data analysis workflows.
Tutorials	Creating cohorts for download, exploring clinical attributes, using manifest files, protein-level analysis across multiple studies, gene-level aggregation, filter behavior and zero count interpretation, automated data access through the API, and more.
PDC Data Processes	PDC data harmonization procedures, data types, and file formats.
Data Submission	Submission policies, request procedures, submission templates, tutorial, and workspace features overview.
API & Developer Resources	Full API reference (endpoint descriptions, query parameters, response formats, and authentication procedures), sample code, schema documents, and best-practice guidance for automated data retrieval and integration into external workflows.
Release Notes	Version history for PDC data and software.

Common Data Analysis Pipeline for DIA Data

With PDC 2.0 also comes the launch of a new Common Data Analysis Pipeline (CDAP) for DIA mass spectrometry data, developed in collaboration with the MacCoss Lab at the University of Washington.

The pipeline supports both Thermo DIA and Bruker diaPASEF workflows and integrates established tools such as Carafe, EncyclopeDIA, DIA-NN, and Skyline to provide robust peptide-centric searches and unified quantification.

The DIA CDAP processes raw mass spectrometry data in four major stages:

Standardization: Conversion of spectral files to a standard format.
Peptide-centric searches: Individual files are searched with EncyclopeDIA (using Carafe-generated spectral libraries) for Thermo DIA data, and with DIA-NN for Bruker diaPASEF data.
Search result consolidation: Results from the individual file searches are combined and statistically validated to generate an FDR-controlled chromatogram library.
Quantification and visualization: The chromatogram library is imported into Skyline to define peak boundaries and transitions for quantification and visualization of results.
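
As a rough illustration of how the four stages and the two acquisition workflows relate, the routing described above can be sketched in a few lines of Python. This is a hypothetical sketch, not the PDC's implementation: the names `plan_dia_cdap`, `StagePlan`, `THERMO_DIA`, and `BRUKER_DIAPASEF` are assumptions made for this example, and stages for which the announcement names no tool are left with an empty tool list rather than guessed.

```python
from dataclasses import dataclass

# Hypothetical labels for the two acquisition workflows named in the announcement.
THERMO_DIA = "thermo_dia"
BRUKER_DIAPASEF = "bruker_diapasef"


@dataclass(frozen=True)
class StagePlan:
    stage: str
    tools: tuple  # tool names taken from the announcement; empty if none is named


def plan_dia_cdap(workflow: str) -> list:
    """Return the four DIA CDAP stages with the tools the announcement names for each."""
    if workflow == THERMO_DIA:
        # Thermo DIA: EncyclopeDIA searches using Carafe-generated spectral libraries.
        search_tools = ("Carafe", "EncyclopeDIA")
    elif workflow == BRUKER_DIAPASEF:
        # Bruker diaPASEF: DIA-NN searches.
        search_tools = ("DIA-NN",)
    else:
        raise ValueError(f"unsupported workflow: {workflow!r}")

    return [
        StagePlan("standardization", ()),
        StagePlan("peptide-centric search", search_tools),
        StagePlan("search result consolidation", ()),
        StagePlan("quantification and visualization", ("Skyline",)),
    ]


if __name__ == "__main__":
    for step in plan_dia_cdap(BRUKER_DIAPASEF):
        print(step.stage, "->", ", ".join(step.tools) or "(tool not named)")
```

Only the search stage differs between the two workflows in this sketch; the other three stages are shared, which is what allows quantification to be unified in Skyline.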

The Future of the PDC

PDC 2.0 represents a significant improvement to the platform’s usability, analytical capabilities, and multi-omic integration. Dr. Xu Zhang, lead for the PDC project, commented, “These updates are a step forward for researchers working with large-scale proteomic datasets. PDC 2.0 lowers barriers for users, making it easier to integrate proteomic data into broader multi-omic cancer research.”