
Aligning Tumor Mutational Burden (TMB) across diagnostic platforms

A Calibration Tool Developed in Conjunction with Phase 2 of the Friends of Cancer Research TMB Harmonization Project

BRP Software Development Team

2021-03-29

Introduction

This package generates and applies a calibration model as described in phase 2 of the Friends of Cancer Research (FOCR) tumor mutational burden (TMB) harmonization project [Vega DM 2021]. The package requires the user to supply a tab-delimited text file containing training data. These data comprise paired TMB values, one from a whole exome sequencing (WES) assay and one from a laboratory-specific panel diagnostic assay, for each of a series of samples. Using these training data, the software estimates a calibration curve with corresponding prediction limits to quantify the average relationship between WES and panel TMB assay values as well as variability around that curve. The user must also input additional TMB values from the same panel diagnostic assay; the package then provides estimates of TMB that are calibrated to WES TMB using the calibration curve and associated prediction limits calculated from the training data. Intervals of uncertainty are also provided to accompany the corresponding WES-calibrated TMB values. HTML output and plots similar to published results from phase 2 of the FOCR TMB harmonization project are provided to describe the calibration curve and report WES-calibrated laboratory-specific diagnostic panel assay values.

The calibration model is derived by fitting a weighted least squares linear regression model under assumptions of linear mean structure, Gaussian errors, and power variance structure. The software is designed for this situation based on results from phase 2 of the FOCR TMB Project that supported these assumptions. The source code is made available so that individual laboratories may modify it to explore different statistical approaches to developing calibration curves and prediction limits, which could include generalizations such as nonlinear regression and error structures other than Gaussian and power variance.

Installation

After downloading the package to a folder on your computer, you can install the tmbLab R package from the binary or source installer file. In the R console, click “Packages” on the R menu and select “Install package(s) from local files”. In the pop-up window, browse for the file “tmbLab_x.x.x.zip” or “tmbLab_x.x.x.tar.gz” and click “Open”. The installation process will show in the console. If there is no error message, the installation was successful.
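The same installation can also be done from the R console without the menu. The following is a minimal sketch, assuming the source installer file sits in the current working directory; the file name keeps the placeholder version "x.x.x" used above and must be replaced with the name of the file you actually downloaded.

```r
# Placeholder file name: substitute the version you downloaded.
pkg_file <- "tmbLab_x.x.x.tar.gz"

# repos = NULL tells install.packages() to install from a local file
# rather than from a repository such as CRAN.
if (file.exists(pkg_file)) {
  install.packages(pkg_file, repos = NULL, type = "source")
} else {
  message("Installer file not found: ", pkg_file)
}
```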

Quick Start

Get help about the tmbLab R package

library(tmbLab)
help(package=tmbLab)

Identify the required data

Two data files are required to run either tmbLab() or tmbLabWES(). Additional information about the data files file.Model.Data.TMB, file.Obs.Panel.TMB, and file.WES.Panel.TMB can be found in the function descriptions for tmbLab() and tmbLabWES() below.

  1. file.Model.Data.TMB is required input for both the tmbLab() and tmbLabWES() functions. This argument gives the name of the file containing the laboratory-specific panel TMB training data. The file must be a tab-delimited text file with 3 or more columns:

    Column 1: “Sample.ID”; the unique sample identifiers.

    Column 2: “Uniform.WES.TMB”; the training set WES values.

    Column 3 and more: “Panel.1”, “Panel.2,” etc. denoting one or more laboratory-specific panel TMB columns.
  2. file.Obs.Panel.TMB is required input for the tmbLab() function. This argument gives the name of the file containing the laboratory-specific panel TMB values at which the function will provide calibrated WES TMB estimates and intervals of uncertainty. The file must be a tab-delimited text file with 2 columns:

    Column 1: “Sample.ID”; the unique sample identifiers.

    Column 2: “Panel.TMB”; panel TMB values for which calibrated WES TMB estimates and intervals of uncertainty will be calculated. Note that “Panel.TMB” values may be real or hypothetical panel TMB values at which WES-calibrated estimates and intervals of uncertainty are desired.
  3. file.WES.Panel.TMB is required input for the tmbLabWES() function. This argument gives the name of the file containing the WES TMB values at which predicted panel TMB values and prediction limits will be calculated. The file must be a tab-delimited text file with 2 columns:

    Column 1: “Sample.ID”; the unique sample identifiers.

    Column 2: “WES.TMB”; WES TMB values for which predicted panel TMB and prediction limits will be calculated. Note that “WES.TMB” values may be real or hypothetical WES TMB values at which predicted panel TMB and prediction limits are desired.

Run the function tmbLab()

 
library(tmbLab)
res <- tmbLab(file.Model.Data.TMB =
              file.path(path.package("tmbLab"), "extdata/Model.WES.Panel.TMB.txt"),
              file.Obs.Panel.TMB =
              file.path(path.package("tmbLab"), "extdata/NewSample.Panel.TMB.txt"))

names(res)
[1] "Obs.PANEL.TMB.vec" "Obs.PANEL.IDs.vec" "PANEL.vec"         "Calib.Interval"    "trunc.neg.flag"
[6] "dir.resPath"       "dir.output"        "predMethod"        "Lab.All"

res$Lab.All
   Sample.ID   Panel Obs.Panel.TMB CALIB.Est.TMB CALIB.Lower.Lim.TMB CALIB.Upper.Lim.TMB Range.Indicator
1   NewS0001 Panel.1          2.41        1.4473              0.1519              5.5096              In
2   NewS0002 Panel.1         25.54       22.8244             17.3245             29.1249              In
3   NewS0021 Panel.1         52.80       48.0185              0.0000                  NA             Out
4   NewS0003 Panel.1          3.06        2.0481              0.2527              6.2459              In
5   NewS0004 Panel.1         26.67       23.8688             18.2891             30.2331              In
......
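Once the result is available, Lab.All can be handled like any other data frame. The sketch below rebuilds the five rows printed above (with the Panel column omitted) so that it runs without the package, then keeps only the rows marked "In". The meaning of Range.Indicator is not defined in this document, so treating "Out" rows as a separate case is an assumption made here for illustration.

```r
# Rows copied from the printed res$Lab.All output above
lab_all <- data.frame(
  Sample.ID           = c("NewS0001", "NewS0002", "NewS0021", "NewS0003", "NewS0004"),
  Obs.Panel.TMB       = c(2.41, 25.54, 52.80, 3.06, 26.67),
  CALIB.Est.TMB       = c(1.4473, 22.8244, 48.0185, 2.0481, 23.8688),
  CALIB.Lower.Lim.TMB = c(0.1519, 17.3245, 0.0000, 0.2527, 18.2891),
  CALIB.Upper.Lim.TMB = c(5.5096, 29.1249, NA, 6.2459, 30.2331),
  Range.Indicator     = c("In", "In", "Out", "In", "In")
)

# Keep only rows flagged "In"; rows flagged "Out" are set aside for review
in_range  <- subset(lab_all, Range.Indicator == "In")
out_range <- subset(lab_all, Range.Indicator == "Out")
```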
 

Run the function tmbLabWES()

 
library(tmbLab)
resWES <- tmbLabWES(file.Model.Data.TMB =
                    file.path(path.package("tmbLab"), "extdata/Model.WES.Panel.TMB.txt"),
                    file.WES.Panel.TMB =
                    file.path(path.package("tmbLab"), "extdata/NewSample.WES.TMB.txt"))

names(resWES)
[1] "Obs.PANEL.TMB.vec" "Obs.PANEL.IDs.vec" "PANEL.vec"         "Pred.Interval"     "trunc.neg.flag"
[6] "dir.resPath"       "dir.output"        "predMethod"        "Lab.All"

resWES$Lab.All
   Sample.ID   Panel WES.TMB WES.Est.TMB WES.Lower.Lim.TMB WES.Upper.Lim.TMB Range.Indicator
1   NewS0001 Panel.1       5      6.2561            1.9686           10.5435              In
2   NewS0002 Panel.1      10     11.6682            6.5262           16.8103              In
3   NewS0003 Panel.1      15     17.0804           11.3591           22.8017              In
4   NewS0004 Panel.1      17     19.2452           13.3314           25.1591              In
5   NewS0005 Panel.1      20     22.4925           16.3181           28.6670              In
......

Main functions

tmbLab()

Description

This function estimates, from user-supplied training data, a calibration curve with corresponding prediction limits to quantify the average relationship between WES and panel TMB assay results, as well as variability around that curve. The user must also input additional TMB values from the same panel diagnostic assay, and the package will provide estimates of TMB that are calibrated to WES TMB using the calibration curve and associated prediction limits calculated from the training data. Intervals of uncertainty are also provided to accompany the corresponding WES-calibrated TMB values.

Maximum likelihood methods implemented in the gls() function in the R package “nlme” are used to fit a weighted least squares linear regression model. This model assumes a linear mean structure, Gaussian errors, and power variance structure. Users are advised to check the reasonableness of these assumptions for their own data.
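To make the model concrete, the following sketch fits a linear model with a power variance function using gls() and varPower() from nlme. It is an illustration under stated assumptions, not the package's source code: the data are simulated, the variable names wes and panel are hypothetical, and the variance is assumed here to be a power of the WES value, because the text above does not say which covariate the package uses.

```r
library(nlme)

set.seed(1)
n     <- 200
wes   <- runif(n, min = 0.5, max = 40)          # simulated WES TMB values
mu    <- 1 + 1.05 * wes                         # linear mean structure
panel <- mu + rnorm(n, sd = 0.4 * wes^0.6)      # Gaussian errors, SD grows as a power of wes
train <- data.frame(wes = wes, panel = panel)

# Weighted least squares via maximum likelihood with a power variance function
fit <- gls(panel ~ wes,
           data    = train,
           weights = varPower(form = ~ wes),    # assumed variance covariate
           method  = "ML")

coef(fit)                                                # intercept and slope
coef(fit$modelStruct$varStruct, unconstrained = FALSE)   # estimated variance power
```

Calling plot(fit) draws standardized residuals against fitted values, which is one way to check the linearity and variance assumptions mentioned above.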

The function requires two input files. The first file, designated below as file.Model.Data.TMB, is a tab-delimited text file containing training data consisting of paired TMB values, one from a whole exome sequencing (WES) assay and one from a laboratory-specific panel diagnostic assay, for each of a series of samples. The second is a tab-delimited text file that contains laboratory-specific panel TMB values at which to provide WES-calibrated estimates. These laboratory-specific panel TMB values will be the values at which the function will invert the regression line and prediction limits to obtain WES-calibrated estimates and their corresponding intervals of uncertainty. Further details of the required formats for these files are given below under the “Arguments” section.

To better understand the calibration process, consider Figure 1. The first input file, which contains the training data, will correspond to the points depicted in the scatter plot with x-axis and y-axis corresponding to the WES TMB values Uniform.WES.TMB and laboratory-specific panel TMB values Panel.TMB, respectively. This first data input file includes all the information necessary to generate the regression line and prediction limits, denoted by the solid black line and the dotted black lines, respectively. The second input file provides a set of y0 values, which are real or hypothetical panel TMB values at which WES-calibrated estimates and intervals of uncertainty are desired. In Figure 1, a single y0 input value is depicted on the y-axis with a yellow horizontal line. The WES-calibrated estimate pertaining to y0 is derived by using the fitted calibration curve and is depicted on the x-axis as x0. In addition, an interval of uncertainty around the WES-calibrated value, (LL95(y0), UL95(y0)), is provided by projecting the prediction limits onto the WES-axis as shown.
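The inversion described above can be sketched in a few lines of base R. This is a simplified illustration, not the package's implementation: the parameter values are hypothetical, the standard deviation is assumed to be a power of the mean, and the uncertainty in the estimated regression parameters is ignored when forming the prediction limits.

```r
# Hypothetical fitted values (not estimates from any real data set)
b0    <- 1.0    # intercept
b1    <- 1.05   # slope
sigma <- 0.5    # residual scale
delta <- 0.6    # variance power
level <- 95     # interval level, in the style of Calib.Interval

z <- qnorm(1 - (1 - level / 100) / 2)

mean_panel <- function(x) b0 + b1 * x
sd_panel   <- function(x) sigma * mean_panel(x)^delta   # assumed power-of-the-mean SD
upper_lim  <- function(x) mean_panel(x) + z * sd_panel(x)
lower_lim  <- function(x) mean_panel(x) - z * sd_panel(x)

y0 <- 10   # panel TMB value to calibrate

# Point estimate: invert the regression line
x0 <- (y0 - b0) / b1

# Interval: find where each prediction curve crosses y0.
# The upper curve gives the lower WES limit and vice versa.
ll <- uniroot(function(x) upper_lim(x) - y0, interval = c(0, 100))$root
ul <- uniroot(function(x) lower_lim(x) - y0, interval = c(0, 100))$root

c(estimate = max(x0, 0), lower = max(ll, 0), upper = max(ul, 0))
```

The search interval c(0, 100) is a choice made for this example; uniroot() needs the curve to change sign within it.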

Prior to fitting the regression model, this function drops observations (samples) from the training data supplied via the input file.Model.Data.TMB with WES TMB values that are greater than 40. These observations are dropped from the analysis because data collected as part of phase 2 of the FOCR TMB harmonization project [Vega DM 2021] were insufficient to assess modeling assumptions, particularly linearity, at these high TMB values. The source code is made available so that users may modify this cutoff, if desired.
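The filtering step amounts to a simple row subset. The sketch below uses a small made-up data frame and a hypothetical variable name wes_cutoff; it shows the rule stated above (drop WES TMB values greater than 40), not the package's own code.

```r
# Made-up training rows for illustration only
train <- data.frame(
  Sample.ID       = c("S1", "S2", "S3", "S4"),
  Uniform.WES.TMB = c(1.3, 12.0, 40.0, 55.2),
  Panel.1         = c(1.0, 13.1, 41.5, 60.3)
)

wes_cutoff <- 40

# Keep samples at or below the cutoff; values greater than 40 are dropped
train_kept <- train[train$Uniform.WES.TMB <= wes_cutoff, ]
```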

Usage

tmbLab(
  file.Model.Data.TMB,
  file.Obs.Panel.TMB,
  Calib.Interval = 95,
  trunc.neg.flag = c("NEG", "TRUNCtoZERO"),
  dir.output,
  dir.result = "tmbLab_res",
  show.HTML = TRUE
)

Arguments

file.Model.Data.TMB Character string

This required argument gives the name of the file containing the laboratory-specific panel TMB training data. The file must be a tab-delimited text file with 3 or more columns:

Column 1: “Sample.ID”; the unique sample identifiers.

Column 2: “Uniform.WES.TMB”; the training set WES values.

Column 3 and more: “Panel.1”, “Panel.2,” etc. denoting one or more laboratory-specific panel TMB columns. It is anticipated that most users will provide only one set of laboratory-specific panel TMB data, corresponding to “Panel.1”; however, data from multiple laboratory-specific panels can be included by specifying additional columns.

Note also that the model implemented in this package assumes that samples are independent. Inputting multiple replicate TMB measurements from the same sample will not lead to appropriate statistical inference (e.g., prediction limits will be incorrect).

An example of this required file is as follows:

Sample.ID   Uniform.WES.TMB   Panel.1   Panel.2
S0000001    1.3410            1.0132    0.9345
S0000002    1.6414            3.4837    2.0914
S0000003    0.7608            0.3165    0.2429
S0000004    0.8482            1.5107    0.4991
S0000005    0.8301            3.4755    1.8225
......
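Before calling the package, it can help to confirm that a training file matches this layout. The helper below, check_training_data(), is a hypothetical function written for this illustration and is not part of tmbLab. It checks the column names, that sample identifiers are unique (in line with the independence note above), and that the TMB columns are numeric.

```r
# Hypothetical helper: validate the layout of a training data file
check_training_data <- function(path) {
  dat <- read.delim(path, stringsAsFactors = FALSE, check.names = FALSE)
  stopifnot(
    ncol(dat) >= 3,
    identical(names(dat)[1:2], c("Sample.ID", "Uniform.WES.TMB")),
    !anyDuplicated(dat$Sample.ID),                  # one row per sample
    all(vapply(dat[-1], is.numeric, logical(1)))    # TMB columns are numeric
  )
  invisible(dat)
}

# Usage with a temporary file built from the first example rows above
tmp <- tempfile(fileext = ".txt")
example <- data.frame(
  Sample.ID       = c("S0000001", "S0000002", "S0000003"),
  Uniform.WES.TMB = c(1.3410, 1.6414, 0.7608),
  Panel.1         = c(1.0132, 3.4837, 0.3165),
  Panel.2         = c(0.9345, 2.0914, 0.2429)
)
write.table(example, tmp, sep = "\t", quote = FALSE, row.names = FALSE)
dat <- check_training_data(tmp)
```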

file.Obs.Panel.TMB Character string

This required argument gives the name of the file containing the laboratory-specific panel TMB values at which the function will provide calibrated WES TMB estimates and intervals of uncertainty. The file must be a tab-delimited text file with 2 columns:

Column 1: “Sample.ID”; the unique sample identifiers.

Column 2: “Panel.TMB”; panel TMB values for which calibrated WES TMB estimates and intervals of uncertainty will be calculated. Note that “Panel.TMB” values may be real or hypothetical panel TMB values at which WES-calibrated estimates and intervals of uncertainty are desired. Typical hypothetical TMB values of interest would be 5, 10, 15, and 20. The file must contain at least one Panel.TMB value.

An example of this required file is as follows:

Sample.ID   Panel.TMB
NewS0001    2.41
NewS0002    25.54
NewS0003    3.06
NewS0004    26.67
NewS0005    37.38
......
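A file of hypothetical panel values in this layout can be written directly from R. The sketch below uses the typical values 5, 10, 15, and 20 mentioned above; the sample identifiers and the file name are made up for illustration.

```r
hyp <- data.frame(
  Sample.ID = sprintf("Hyp%04d", 1:4),   # made-up identifiers
  Panel.TMB = c(5, 10, 15, 20)
)

# Write a tab-delimited file with a header row and no row names or quotes
obs_file <- file.path(tempdir(), "Hypothetical.Panel.TMB.txt")
write.table(hyp, obs_file, sep = "\t", quote = FALSE, row.names = FALSE)

readLines(obs_file)
```

The resulting path could then be passed as file.Obs.Panel.TMB.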

Calib.Interval Numeric value

Specifies the level of uncertainty of the calibrated interval. For example, Calib.Interval = 95 means that 95% prediction limits will be projected onto the WES TMB axis to derive 95% calibration intervals of uncertainty. The inputted value must be a number between 1 and 100. Common values are 90, 95, and 99. Default is 95.

trunc.neg.flag Character string

If value is “NEG”, the code allows negative values of panel-based TMB to be used in the regression modeling. Although TMB values cannot actually be negative, some laboratories apply correction factors to panel TMB values that could cause small values to become negative. Negative values are not a problem for fitting the regression line, but they must be set to zero for purposes of variance and prediction limit calculation. Nonetheless, WES-calibrated estimates of TMB will always be truncated at zero so that they are not reported as negative values.

If “TRUNCtoZERO”, the code truncates negative panel-based TMB values to zero prior to their use in regression modeling.
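In R terms, truncating negative values to zero is an element-wise maximum with zero. The following one-step sketch uses made-up panel values and is only meant to show the effect of the “TRUNCtoZERO” option on the input, not how the package implements it.

```r
panel_tmb <- c(-0.35, 0.00, 1.20, 7.85)   # made-up corrected panel values

# Negative values become 0; all other values are unchanged
truncated <- pmax(panel_tmb, 0)
truncated
```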

dir.output Character string

This argument can be used to specify the path of the output directory. Default is <user’s home directory>/tmb.

dir.result Character string

This argument can be used to specify the folder name in which the results will be placed. This folder will be found within the folder “dir.output”. Default is “tmbLab_res”.

show.HTML Logical

If TRUE, the HTML summary document will automatically open in the system default browser. Default is TRUE. A copy of this output is also saved for later viewing (see tmbLab.html output file).
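Putting the arguments together, a call with non-default settings might look like the sketch below. It uses only the argument names from the Usage section and the example files shipped in the package's extdata folder; the chosen values (a 90% interval, a temporary output directory, and the folder name "tmbLab_res_90") are illustrative. The call is skipped if tmbLab is not installed.

```r
tmb_args <- list(
  file.Model.Data.TMB = system.file("extdata", "Model.WES.Panel.TMB.txt",
                                    package = "tmbLab"),
  file.Obs.Panel.TMB  = system.file("extdata", "NewSample.Panel.TMB.txt",
                                    package = "tmbLab"),
  Calib.Interval      = 90,              # 90% intervals instead of the default 95
  trunc.neg.flag      = "TRUNCtoZERO",
  dir.output          = tempdir(),       # illustrative output location
  dir.result          = "tmbLab_res_90", # illustrative folder name
  show.HTML           = FALSE            # do not open the browser
)

# Run only when the package is available
if (requireNamespace("tmbLab", quietly = TRUE)) {
  res90 <- do.call(tmbLab::tmbLab, tmb_args)
  head(res90$Lab.All)
}
```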

Value

The output list includes the following elements:

  • Obs.PANEL.TMB.vec: observed panel-based TMB values.
  • Obs.PANEL.IDs.vec: observed panel-based TMB sample IDs.
  • PANEL.vec: the panel names which were provided in the file file.Model.Data.TMB.
  • Calib.Interval: the calibrated interval level of uncertainty.
  • trunc.neg.flag: the option for allowing negative values of panel-based TMB.
  • dir.resPath: the file path for the final result files.
  • dir.output: the file path for the output files.
  • predMethod: the calculation method: 1 for “tmbLab”; 2 for “tmbLabWES”.
  • Lab.All: a data frame including the WES-calibrated estimated TMB with intervals of uncertainty.

The following output files are generated:

  • tmbLab.html
  • Panel.n.GLS.ML.scatterplot.pdf
  • All.GLS.ML.RegressionParameters.txt
  • tmbLab.All.GLS.ML.FIT.TMB.txt
  • tmbLab.All.GLS.ML.CALIB.txt
  • All.GLS.ML.RegressionCurveFit.txt
  • All.GLS.ML.RegressionCurveFit.Grid.txt
  • All.GLS.ML.LowerLimitPredictionCurve.txt
  • All.GLS.ML.LowerLimitPredictionCurve.Grid.txt
  • All.GLS.ML.UpperLimitPredictionCurve.txt
  • All.GLS.ML.UpperLimitPredictionCurve.Grid.txt

tmbLabWES()

Description

This function estimates, from user-supplied data, a calibration curve with corresponding prediction limits to quantify the average relationship between WES and panel TMB assay results as well as variability around that curve. For a set of input WES values, this function also provides estimates of laboratory-specific panel diagnostic TMB and prediction limits for the laboratory-specific panel diagnostic assay. This function differs from the all-encompassing function tmbLab() in that it estimates the regression line and prediction limits, but does not further provide estimates of laboratory-specific TMB that are calibrated to WES TMB.

Maximum likelihood methods implemented in the gls() function in the R package “nlme” are used to fit a weighted least squares linear regression model. This model assumes a linear mean structure, Gaussian errors, and power variance structure. Users are advised to check the reasonableness of these assumptions for their own data.

This function requires two input files. The first file, designated below as “file.Model.Data.TMB,” is a tab-delimited text file containing training data consisting of paired TMB values, one from a whole exome sequencing (WES) assay and one from a laboratory-specific panel diagnostic assay, for each of a series of samples. The second is a tab-delimited text file that contains WES values at which to output predicted laboratory-specific panel TMB values along with their corresponding prediction limits. Further details of the required formats for these files are given below under the “Arguments” section.

To better understand the model-fitting process, consider Figure 2. The first input file, which contains the training data, will correspond to the points depicted in the scatter plot with x-axis and y-axis corresponding to the WES TMB values (“Uniform.WES.TMB”) and laboratory-specific panel TMB values (“Panel.TMB”), respectively. This first data input file will include the information necessary to generate the regression line and prediction limits, denoted by the solid black line and the dotted black lines, respectively. The second input file provides a set of x0 values, which are real or hypothetical WES values at which estimated laboratory-specific panel TMB values and prediction intervals are desired. In Figure 2, a single x0 input value is depicted on the x-axis with a yellow vertical line. The laboratory-based panel TMB estimate pertaining to x0 is derived as the predicted mean value on the regression line and is depicted on the y-axis as y0. In addition, the prediction interval is provided for x0 as depicted by the horizontal dashed lines.
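The forward prediction described above can be sketched as follows. The parameter values are hypothetical, the standard deviation is assumed to be a power of the mean, and parameter estimation uncertainty is ignored, so the limits are a simplified stand-in for the prediction limits the package computes.

```r
# Hypothetical fitted values (not estimates from any real data set)
b0    <- 1.0    # intercept
b1    <- 1.05   # slope
sigma <- 0.5    # residual scale
delta <- 0.6    # variance power
level <- 95     # interval level, in the style of Pred.Interval

z  <- qnorm(1 - (1 - level / 100) / 2)
x0 <- c(5, 10, 15, 20)                  # WES TMB values of interest

est  <- b0 + b1 * x0                    # predicted mean panel TMB on the line
half <- z * sigma * est^delta           # half-width of the prediction interval

pred <- data.frame(WES.TMB = x0,
                   estimate = est,
                   lower = est - half,
                   upper = est + half)
pred
```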

Prior to fitting the regression model, this function drops observations (samples) from the training data supplied via the input file.Model.Data.TMB with WES TMB values that are greater than 40. These observations are dropped from the analysis because data collected as part of phase 2 of the FOCR TMB harmonization project [Vega DM 2021] were insufficient to assess modeling assumptions, particularly linearity, at these high TMB values. The source code is made available so that individual laboratories may modify this cutoff, if desired.

Usage

tmbLabWES(
  file.Model.Data.TMB,
  file.WES.Panel.TMB,
  Pred.Interval = 95,
  trunc.neg.flag = c("NEG", "TRUNCtoZERO"),
  dir.output,
  dir.result = "tmbLabWES_res",
  show.HTML = TRUE
)

Arguments

file.Model.Data.TMB Character string

This required argument gives the name of the file containing the laboratory-specific panel TMB training data. The file must be a tab-delimited text file with 3 or more columns:

Column 1: “Sample.ID”; the unique sample identifiers.

Column 2: “Uniform.WES.TMB”; the training set WES values.

Column 3 and more: “Panel.1”, “Panel.2,” etc. denoting one or more laboratory-specific panel TMB columns. It is anticipated that most users will provide only one set of laboratory-specific panel TMB data, corresponding to “Panel.1”; however, data from multiple laboratory-specific panels can be included by specifying additional columns.

Note also that the model implemented in this package assumes that samples are independent. Inputting multiple replicate TMB measurements from the same sample will not lead to appropriate statistical inference (e.g., prediction limits will be incorrect).

An example of this required file is as follows:

Sample.ID   Uniform.WES.TMB   Panel.1   Panel.2
S0000001    1.3410            1.0132    0.9345
S0000002    1.6414            3.4837    2.0914
S0000003    0.7608            0.3165    0.2429
S0000004    0.8482            1.5107    0.4991
S0000005    0.8301            3.4755    1.8225
......

file.WES.Panel.TMB Character string

This required argument gives the name of the file containing the WES TMB values at which predicted panel TMB values and prediction limits will be calculated. The file must be a tab-delimited text file with 2 columns:

Column 1: “Sample.ID”; the unique sample identifiers.

Column 2: “WES.TMB”; WES TMB values for which predicted panel TMB and prediction limits will be calculated. Note that “WES.TMB” values may be real or hypothetical WES TMB values at which predicted panel TMB and prediction limits are desired. Typical hypothetical WES values of interest would be 5, 10, 15, and 20. The file must contain at least one WES.TMB value.

An example of this required file is as follows:

Sample.ID   WES.TMB
NewS0001    5
NewS0002    10
NewS0003    15
NewS0004    17
NewS0005    20
......

Pred.Interval Numeric value

Specifies the prediction interval level of uncertainty. For example, Pred.Interval = 95 means that 95% of the observed values of panel TMB for a sample with the designated WES.TMB are predicted to fall within the interval. The inputted value must be a number between 1 and 100. Common values are 90, 95, and 99. Default is 95.

trunc.neg.flag Character string

If value is “NEG”, the code allows negative values of panel-based TMB to be used in the regression modeling. Although TMB values cannot actually be negative, some laboratories apply correction factors to panel TMB values that could cause small values to become negative. Negative values are not a problem for fitting the regression line, but they must be set to zero for purposes of variance and prediction limit calculation. Nonetheless, WES-calibrated estimates of TMB will always be truncated at zero so that they are not reported as negative values.

If “TRUNCtoZERO”, the code truncates negative panel-based TMB values to zero prior to their use in regression modeling.

dir.output Character string

This argument can be used to specify the path of the output directory. Default is <user’s home directory>/tmb.

dir.result Character string

This argument can be used to specify the folder name in which the results will be placed. This folder will be found within the folder “dir.output”. Default is “tmbLabWES_res”.

show.HTML Logical

If TRUE, the HTML summary document will automatically open in the system default browser. Default is TRUE. A copy of this output is also saved for later viewing (see tmbLabWES.html output file).

Value

The output list includes the following elements:

  • Obs.PANEL.TMB.vec: observed panel-based TMB values.
  • Obs.PANEL.IDs.vec: observed panel-based TMB sample IDs.
  • PANEL.vec: the panel names which were provided in the file file.Model.Data.TMB.
  • Pred.Interval: the prediction interval level of uncertainty.
  • trunc.neg.flag: the option for allowing negative values of panel-based TMB.
  • dir.resPath: the path for the final result files.
  • dir.output: the path for the output files.
  • predMethod: the calculation method: 1 for “tmbLab”; 2 for “tmbLabWES”.
  • Lab.All: a data frame including predicted assay TMB value with prediction intervals.

The following output files are generated:

  • tmbLabWES.html
  • Panel.n.GLS.ML.scatterplot.pdf
  • All.GLS.ML.RegressionParameters.txt
  • tmbLab.All.GLS.ML.PRED.txt
  • All.GLS.ML.RegressionCurveFit.txt
  • All.GLS.ML.RegressionCurveFit.Grid.txt
  • All.GLS.ML.LowerLimitPredictionCurve.txt
  • All.GLS.ML.LowerLimitPredictionCurve.Grid.txt
  • All.GLS.ML.UpperLimitPredictionCurve.txt
  • All.GLS.ML.UpperLimitPredictionCurve.Grid.txt

References

Vega DM, Yee LM, McShane LM, et al. Aligning Tumor Mutational Burden (TMB) quantification across diagnostic platforms: Phase 2 of the Friends of Cancer Research TMB Harmonization Project. Submitted to Annals of Oncology 2021.
