Last Updated: 04/25/2017

Data Science Bowl Launched to Improve Lung Cancer Screening

The third annual Data Science Bowl opened on January 12, 2017, with the goal of improving the detection accuracy of low-dose computed tomography (LDCT) lung cancer screening using data curated by NCI's Cancer Imaging Program (CIP), Division of Cancer Treatment and Diagnosis (DCTD) and the Division of Cancer Prevention (DCP). Throughout the competition, emphasis is being placed on solutions that meet the needs of real-world applications. This year's competition encourages data scientists to develop machine learning algorithms that detect lung cancer more accurately and with lower false positive rates than are currently encountered. The Data Science Bowl is presented by Booz Allen Hamilton and Kaggle and is based on a project designed by CIP. NCI staff in DCTD and DCP collaborated with Booz Allen and Kaggle by building alliances, providing guidance on the scientific design of the competition, and facilitating data and image curation.

The Data Science Bowl follows naturally from the National Lung Screening Trial (NLST), which was sponsored by NCI and launched in 2002. The NLST results demonstrated that LDCT screening reduced lung cancer mortality rates by 15-20% compared to standard chest X-ray; however, LDCT has historically resulted in high false positive rates (around 25%) that increase patient anxiety, lead to unnecessary diagnostic follow-up testing and associated costs, and limit its wider use. An NCI workshop sponsored by CIP in 2012 explored ways to reduce the false positive rates of LDCT lung cancer screening and recommended an algorithm challenge. The goal of the Data Science Bowl is to develop diagnostic algorithms that can reduce false positive rates, which would ultimately make LDCT lung cancer screening much more effective for those eligible for screening.

Data Science Bowl competitors will have access to training and test data sets derived from various sources. The competition will run from January 12, 2017 to April 12, 2017, and the winners will be announced shortly thereafter. The competition will award $1 million in prizes, funded by the Laura and John Arnold Foundation.