Aligning Tumor Mutational Burden (TMB) across diagnostic platforms
A Calibration Tool Developed in Conjunction with Phase 2 of the Friends of Cancer Research TMB Harmonization Project
BRP Software Development Team
2021-03-29
Introduction
This package generates and applies a calibration model as described in phase 2 of the Friends of Cancer Research (FOCR) tumor mutational burden (TMB) harmonization project [Vega DM 2021]. The package requires the user to supply a tab-delimited text file containing training data. These data comprise paired TMB values, one from a whole exome sequencing (WES) assay and one from a laboratory-specific panel diagnostic assay, for each of a series of samples. Using these training data, the software estimates a calibration curve with corresponding prediction limits to quantify the average relationship between WES and panel TMB assay values as well as variability around that curve. The user must also input additional TMB values from the same panel diagnostic assay; the package then provides estimates of TMB that are calibrated to WES TMB using the calibration curve and associated prediction limits calculated from the training data. Intervals of uncertainty are also provided to accompany the corresponding WES-calibrated TMB values. HTML output and plots similar to published results from phase 2 of the FOCR TMB harmonization project are provided to describe the calibration curve and report WES-calibrated laboratory-specific diagnostic panel assay values.
The calibration model is derived by fitting a weighted least squares linear regression model under assumptions of linear mean structure, Gaussian errors, and power variance structure. The software is designed for this situation based on results from phase 2 of the FOCR TMB Project that supported these assumptions. The source code is made available so that individual laboratories may modify it to explore different statistical approaches to developing calibration curves and prediction limits, which could include generalizations such as nonlinear regression and error structures other than Gaussian and power variance.
Installation
After downloading the package to a folder on your computer, you can install the tmbLab R package from the binary or source installer file. In the R console, click “Packages” on the R menu and select “Install package(s) from local files”. In the pop-up window, browse for the file “tmbLab_x.x.x.zip” or “tmbLab_x.x.x.tar.gz” and click “Open”. The installation process will show in the console. If there is no error message, the installation was successful.
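The same installation can probably be done from the R console without using the menu. The following is a minimal sketch using base R's install.packages() with repos = NULL. The folder and file name are placeholders: the x.x.x version string is kept exactly as written above and must be replaced with the name of the file you actually downloaded.

```r
# Placeholder path: replace the folder and the x.x.x version with your own.
pkg_file <- file.path("~", "Downloads", "tmbLab_x.x.x.tar.gz")

# Install from the local file only if it exists; repos = NULL tells R to
# install from a local file instead of an online repository.
if (file.exists(path.expand(pkg_file))) {
  install.packages(path.expand(pkg_file), repos = NULL, type = "source")
} else {
  message("File not found: ", pkg_file)
}
```

For the .zip binary on Windows, the same call without type = "source" should let R infer the package type from the file extension.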
Quick Start
Get help about the tmbLab R package
library(tmbLab)
help(package=tmbLab)
Identify the required data
Two data files are required to run the function tmbLab() or tmbLabWES(). Additional information about the data files file.Model.Data.TMB, file.Obs.Panel.TMB, and file.WES.Panel.TMB can be found in the function descriptions for tmbLab() and tmbLabWES() below.
file.Model.Data.TMB is required input for both the tmbLab() and tmbLabWES() functions. This argument gives the name of the laboratory-specific panel TMB training data file, which must be a tab-delimited text file with 3 or more columns:
Column 1: “Sample.ID”; the unique sample identifiers.
Column 2: “Uniform.WES.TMB”; the training set WES values.
Column 3 and more: “Panel.1”, “Panel.2”, etc., denoting one or more laboratory-specific panel TMB columns.
file.Obs.Panel.TMB is required input for the tmbLab() function. This argument gives the name of the data file that contains the laboratory-specific panel TMB values at which the function will provide calibrated WES TMB estimates and intervals of uncertainty. The file must be a tab-delimited text file with 2 columns:
Column 1: “Sample.ID”; the unique sample identifiers.
Column 2: “Panel.TMB”; panel TMB values for which calibrated WES TMB estimates and intervals of uncertainty will be calculated. Note that “Panel.TMB” values may be real or hypothetical panel TMB values at which WES-calibrated estimates and intervals of uncertainty are desired.
file.WES.Panel.TMB is required input for the tmbLabWES() function. This argument gives the name of the data file that contains the WES TMB values at which predicted panel TMB values and prediction limits will be calculated. The file must be a tab-delimited text file with 2 columns:
Column 1: “Sample.ID”; the unique sample identifiers.
Column 2: “WES.TMB”; WES TMB values for which predicted panel TMB and prediction limits will be calculated. Note that “WES.TMB” values may be real or hypothetical WES TMB values at which predicted panel TMB and prediction limits are desired.
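The three file layouts described above can be produced with base R. The sketch below writes small tab-delimited files to a temporary folder. All TMB values are made up for illustration, the helper write_tab() is hypothetical, and five training samples are far too few for a real calibration; the point is only the column names and the tab delimiter.

```r
# Write small input files in the layouts described above (values are made up).
out_dir <- tempdir()

train <- data.frame(
  Sample.ID       = sprintf("S%07d", 1:5),
  Uniform.WES.TMB = c(1.3, 4.8, 9.6, 15.2, 22.7),
  Panel.1         = c(1.0, 5.9, 10.4, 17.1, 25.3),
  stringsAsFactors = FALSE
)
panel_query <- data.frame(Sample.ID = sprintf("NewS%04d", 1:4),
                          Panel.TMB = c(5, 10, 15, 20))
wes_query   <- data.frame(Sample.ID = sprintf("NewS%04d", 1:4),
                          WES.TMB   = c(5, 10, 15, 20))

# Hypothetical helper: write a data frame as tab-delimited text, return its path.
write_tab <- function(df, name) {
  path <- file.path(out_dir, name)
  write.table(df, file = path, sep = "\t", quote = FALSE, row.names = FALSE)
  path
}
file_model <- write_tab(train, "Model.WES.Panel.TMB.txt")
file_panel <- write_tab(panel_query, "NewSample.Panel.TMB.txt")
file_wes   <- write_tab(wes_query, "NewSample.WES.TMB.txt")

# Read one file back to confirm the header and the number of rows.
check <- read.delim(file_model, stringsAsFactors = FALSE)
print(names(check))
```

The resulting paths (file_model, file_panel, file_wes) are character strings of the kind the file arguments expect.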
Run the function tmbLab()
library(tmbLab)
res <- tmbLab(file.Model.Data.TMB =
file.path(path.package("tmbLab"), "extdata/Model.WES.Panel.TMB.txt"),
file.Obs.Panel.TMB =
file.path(path.package("tmbLab"), "extdata/NewSample.Panel.TMB.txt"))
names(res)
[1] "Obs.PANEL.TMB.vec" "Obs.PANEL.IDs.vec" "PANEL.vec" "Calib.Interval" "trunc.neg.flag"
[6] "dir.resPath" "dir.output" "predMethod" "Lab.All"
res$Lab.All
Sample.ID Panel Obs.Panel.TMB CALIB.Est.TMB CALIB.Lower.Lim.TMB CALIB.Upper.Lim.TMB Range.Indicator
1 NewS0001 Panel.1 2.41 1.4473 0.1519 5.5096 In
2 NewS0002 Panel.1 25.54 22.8244 17.3245 29.1249 In
3 NewS0021 Panel.1 52.80 48.0185 0.0000 NA Out
4 NewS0003 Panel.1 3.06 2.0481 0.2527 6.2459 In
5 NewS0004 Panel.1 26.67 23.8688 18.2891 30.2331 In
......
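Because Lab.All is an ordinary data frame, it can be filtered and summarized with base R. The sketch below rebuilds the five rows printed above by hand so that it runs without the package. It assumes that Range.Indicator marks whether a result is considered in range (the source does not define the column), and Interval.Width is a hypothetical derived column.

```r
# Rows copied from the res$Lab.All printout above.
lab_all <- data.frame(
  Sample.ID           = c("NewS0001", "NewS0002", "NewS0021", "NewS0003", "NewS0004"),
  Panel               = "Panel.1",
  Obs.Panel.TMB       = c(2.41, 25.54, 52.80, 3.06, 26.67),
  CALIB.Est.TMB       = c(1.4473, 22.8244, 48.0185, 2.0481, 23.8688),
  CALIB.Lower.Lim.TMB = c(0.1519, 17.3245, 0.0000, 0.2527, 18.2891),
  CALIB.Upper.Lim.TMB = c(5.5096, 29.1249, NA, 6.2459, 30.2331),
  Range.Indicator     = c("In", "In", "Out", "In", "In"),
  stringsAsFactors = FALSE
)

# Keep rows flagged "In" and compute the width of each interval of uncertainty.
in_range <- lab_all[lab_all$Range.Indicator == "In", ]
in_range$Interval.Width <- in_range$CALIB.Upper.Lim.TMB - in_range$CALIB.Lower.Lim.TMB
print(in_range[, c("Sample.ID", "CALIB.Est.TMB", "Interval.Width")])
```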
Run the function tmbLabWES()
library(tmbLab)
resWES <- tmbLabWES(file.Model.Data.TMB =
file.path(path.package("tmbLab"), "extdata/Model.WES.Panel.TMB.txt"),
file.WES.Panel.TMB =
file.path(path.package("tmbLab"), "extdata/NewSample.WES.TMB.txt"))
names(resWES)
[1] "Obs.PANEL.TMB.vec" "Obs.PANEL.IDs.vec" "PANEL.vec" "Pred.Interval" "trunc.neg.flag"
[6] "dir.resPath" "dir.output" "predMethod" "Lab.All"
resWES$Lab.All
Sample.ID Panel WES.TMB WES.Est.TMB WES.Lower.Lim.TMB WES.Upper.Lim.TMB Range.Indicator
1 NewS0001 Panel.1 5 6.2561 1.9686 10.5435 In
2 NewS0002 Panel.1 10 11.6682 6.5262 16.8103 In
3 NewS0003 Panel.1 15 17.0804 11.3591 22.8017 In
4 NewS0004 Panel.1 17 19.2452 13.3314 25.1591 In
5 NewS0005 Panel.1 20 22.4925 16.3181 28.6670 In
......
Main functions
tmbLab()
Description
This function estimates, from user-supplied training data, a calibration curve with corresponding prediction limits to quantify the average relationship between WES and panel TMB assay results, as well as variability around that curve. The user must also input additional TMB values from the same panel diagnostic assay, and the package will provide estimates of TMB that are calibrated to WES TMB using the calibration curve and associated prediction limits calculated from the training data. Intervals of uncertainty are also provided to accompany the corresponding WES-calibrated TMB values.
Maximum likelihood methods implemented in the gls() function in the R package “nlme” are used to fit a weighted least squares linear regression model. This model assumes a linear mean structure, Gaussian errors, and power variance structure. Users are advised to check the reasonableness of these assumptions for their own data.
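To make the model concrete, the following sketch fits a linear mean with a power variance function using gls() and varPower() from nlme on simulated data. It is not the package's internal code. In particular, the source does not state which covariate drives the variance function; using the WES TMB value is an assumption made here, and the simulated intercept, slope, and noise parameters are arbitrary.

```r
library(nlme)

# Simulated training data: panel TMB is linear in WES TMB, with noise whose
# standard deviation grows as a power of WES TMB (all values are arbitrary).
set.seed(1)
n     <- 200
wes   <- runif(n, min = 0.5, max = 40)
panel <- 1 + 1.1 * wes + rnorm(n, sd = 0.5 * wes^0.6)
train <- data.frame(Uniform.WES.TMB = wes, Panel.1 = panel)

# Linear mean, Gaussian errors, power variance structure, fitted by maximum
# likelihood. ASSUMPTION: the variance covariate is Uniform.WES.TMB.
fit <- gls(Panel.1 ~ Uniform.WES.TMB,
           data    = train,
           weights = varPower(form = ~ Uniform.WES.TMB),
           method  = "ML")

print(coef(fit))                                               # intercept and slope
print(coef(fit$modelStruct$varStruct, unconstrained = FALSE))  # estimated power
```

A plot of standardized residuals against fitted values, for example plot(fit), is one simple way to check the linearity and variance assumptions on your own data.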
The function requires two input files. The first file, designated below as file.Model.Data.TMB, is a tab-delimited text file containing training data consisting of paired TMB values, one from a whole exome sequencing (WES) assay and one from a laboratory-specific panel diagnostic assay, for each of a series of samples. The second is a tab-delimited text file that contains laboratory-specific panel TMB values at which to provide WES-calibrated estimates. These laboratory-specific panel TMB values will be the values at which the function will invert the regression line and prediction limits to obtain WES-calibrated estimates and their corresponding intervals of uncertainty. Further details of the required formats for these files are given below under the “Arguments” section.
To better understand the calibration process, consider Figure 1. The first input file, which contains the training data, will correspond to the points depicted in the scatter plot with x-axis and y-axis corresponding to the WES TMB values (Uniform.WES.TMB) and laboratory-specific panel TMB values (Panel.TMB), respectively. This first data input file includes all the information necessary to generate the regression line and prediction limits, denoted by the solid black line and the dotted black lines, respectively. The second input file provides a set of y0 values, which are real or hypothetical panel TMB values at which WES-calibrated estimates and intervals of uncertainty are desired. In Figure 1, a single y0 input value is depicted on the y-axis with a yellow horizontal line. The WES-calibrated estimate pertaining to y0 is derived by using the fitted calibration curve and is depicted on the x-axis as x0. In addition, an interval of uncertainty around the WES-calibrated value, (LL95(y0), UL95(y0)), is provided by projecting the prediction limits onto the WES-axis as shown.
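The inversion described above can be sketched numerically in base R. The intercept, slope, residual scale, and variance power below are hypothetical values, not package output, and the prediction limits use a simplified form (normal quantile times a power-of-WES standard deviation) that ignores uncertainty in the estimated parameters. The helper invert_curve() is also hypothetical.

```r
# Hypothetical fitted values (not from the package): intercept, slope,
# residual scale, and variance power.
a <- 0.8; b <- 1.08; sigma <- 0.6; delta <- 0.5
level <- 95
z <- qnorm(1 - (1 - level / 100) / 2)

reg_line  <- function(x) a + b * x
lower_lim <- function(x) reg_line(x) - z * sigma * x^delta
upper_lim <- function(x) reg_line(x) + z * sigma * x^delta

# Find the WES value where a curve crosses the horizontal line y = y0.
# Returns NA when the curve does not cross y0 inside the search range.
invert_curve <- function(curve, y0, range = c(0, 40)) {
  g <- function(x) curve(x) - y0
  if (g(range[1]) * g(range[2]) > 0) return(NA_real_)
  uniroot(g, interval = range, tol = 1e-8)$root
}

y0   <- 10                           # observed panel TMB value
x0   <- (y0 - a) / b                 # invert the regression line
ll95 <- invert_curve(upper_lim, y0)  # upper prediction curve gives the lower WES limit
ul95 <- invert_curve(lower_lim, y0)  # lower prediction curve gives the upper WES limit
print(round(c(x0 = x0, LL = ll95, UL = ul95), 3))
```

With a positive slope, the horizontal line at y0 meets the upper prediction curve to the left of the regression line and the lower prediction curve to the right, which is why the two curves swap roles when projected onto the WES axis.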
Prior to fitting the regression model, this function drops observations (samples) from the training data supplied via the input file file.Model.Data.TMB with WES TMB values that are greater than 40. These observations are dropped from the analysis because data collected as part of phase 2 of the FOCR TMB harmonization project [Vega DM 2021] were insufficient to assess modeling assumptions, particularly linearity, at these high TMB values. The source code is made available so that users may modify this cutoff, if desired.
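The effect of this cutoff can be previewed on your own training data before running the function. The sketch below applies the same rule (drop samples with WES TMB greater than 40) to a small made-up data frame; the variable names wes_cutoff, kept, and dropped are hypothetical and the package's internal implementation may differ.

```r
wes_cutoff <- 40  # cutoff stated in the text above

# Made-up training data with one sample above the cutoff.
train <- data.frame(
  Sample.ID       = c("S1", "S2", "S3", "S4"),
  Uniform.WES.TMB = c(3.2, 18.5, 40.0, 57.9),
  Panel.1         = c(4.1, 20.3, 43.8, 61.0),
  stringsAsFactors = FALSE
)

# Samples strictly greater than the cutoff are dropped; a value of exactly 40 is kept.
kept    <- train[train$Uniform.WES.TMB <= wes_cutoff, ]
dropped <- setdiff(train$Sample.ID, kept$Sample.ID)
print(dropped)
```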
Usage
tmbLab(
file.Model.Data.TMB,
file.Obs.Panel.TMB,
Calib.Interval = 95,
trunc.neg.flag = c("NEG", "TRUNCtoZERO"),
dir.output,
dir.result = "tmbLab_res",
show.HTML = TRUE
)
Arguments
file.Model.Data.TMB Character string
This required argument gives the name of the laboratory-specific panel TMB training data file, which must be a tab-delimited text file with 3 or more columns:
Column 1: “Sample.ID”; the unique sample identifiers.
Column 2: “Uniform.WES.TMB”; the training set WES values.
Column 3 and more: “Panel.1”, “Panel.2”, etc., denoting one or more laboratory-specific panel TMB columns. It is anticipated that most users will provide only one set of laboratory-specific panel TMB data, corresponding to “Panel.1”; however, data from multiple laboratory-specific panels may be included by specifying additional columns.
Note also that the model implemented in this package assumes that samples are independent. Inputting multiple replicate TMB measurements from the same sample will not lead to appropriate statistical inference (e.g., prediction limits will be incorrect).
An example of this required file is as follows:
Sample.ID Uniform.WES.TMB Panel.1 Panel.2
S0000001 1.3410 1.0132 0.9345
S0000002 1.6414 3.4837 2.0914
S0000003 0.7608 0.3165 0.2429
S0000004 0.8482 1.5107 0.4991
S0000005 0.8301 3.4755 1.8225
......
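A quick pre-check of the training data can catch layout problems and repeated sample identifiers before the model is fitted. The function check_training_data() below is a hypothetical helper, not part of the package; it tests only the requirements stated above (at least 3 columns, the first two column names, numeric TMB columns, and unique Sample.ID values as a simple proxy for the independence assumption).

```r
# Hypothetical helper: returns a character vector of problems (empty if none).
check_training_data <- function(df) {
  problems <- character(0)
  if (ncol(df) < 3) {
    problems <- c(problems, "fewer than 3 columns")
  }
  if (!identical(names(df)[1:2], c("Sample.ID", "Uniform.WES.TMB"))) {
    problems <- c(problems, "first two columns must be Sample.ID and Uniform.WES.TMB")
  }
  numeric_ok <- vapply(df[-1], is.numeric, logical(1))
  if (!all(numeric_ok)) {
    problems <- c(problems, "non-numeric TMB column(s)")
  }
  if (anyDuplicated(df[[1]]) > 0) {
    problems <- c(problems, "duplicate Sample.ID values (possible replicates)")
  }
  problems
}

# First three rows of the example file above.
good <- data.frame(
  Sample.ID       = c("S0000001", "S0000002", "S0000003"),
  Uniform.WES.TMB = c(1.3410, 1.6414, 0.7608),
  Panel.1         = c(1.0132, 3.4837, 0.3165),
  stringsAsFactors = FALSE
)
bad <- rbind(good, good[1, ])  # repeat one sample to trigger the duplicate check

print(check_training_data(good))
print(check_training_data(bad))
```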
file.Obs.Panel.TMB Character string
This required argument gives the name of the data file that contains the laboratory-specific panel TMB values at which the function will provide calibrated WES TMB estimates and intervals of uncertainty. The file must be a tab-delimited text file with 2 columns:
Column 1: “Sample.ID”; the unique sample identifiers.
Column 2: “Panel.TMB”; panel TMB values for which calibrated WES TMB estimates and intervals of uncertainty will be calculated. Note that “Panel.TMB” values may be real or hypothetical panel TMB values at which WES-calibrated estimates and intervals of uncertainty are desired. Typical hypothetical TMB values of interest would be 5, 10, 15, and 20. The file must contain at least one Panel.TMB value.
An example of this required file is as follows:
Sample.ID Panel.TMB
NewS0001 2.41
NewS0002 25.54
NewS0003 3.06
NewS0004 26.67
NewS0005 37.38
......
Calib.Interval Numeric value
Specifies the level of uncertainty of the calibrated interval. For example, Calib.Interval = 95 means that 95% prediction limits will be projected onto the WES TMB axis to derive 95% calibration intervals of uncertainty. The inputted value must be a number between 1 and 100. Common values are 90, 95, and 99. Default is 95.
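As a rough illustration of how the interval level affects interval width, the sketch below converts a two-sided level into a standard normal critical value. This is an assumption for illustration only: the package may use a different reference distribution for its prediction limits, and level_to_z() is a hypothetical helper.

```r
# Hypothetical helper: two-sided normal critical value for a given level (in percent).
level_to_z <- function(level) {
  # A level of exactly 100 would give an infinite critical value, so it is excluded.
  stopifnot(is.numeric(level), length(level) == 1, level >= 1, level < 100)
  qnorm(1 - (1 - level / 100) / 2)
}

z_values <- sapply(c(90, 95, 99), level_to_z)
names(z_values) <- c("90", "95", "99")
print(round(z_values, 3))
```

Higher levels give larger critical values and therefore wider prediction limits and wider calibration intervals.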
trunc.neg.flag Character string
If value is “NEG”, the code allows negative values of panel-based TMB to be used in the regression modeling. Although TMB values cannot actually be negative, some laboratories apply correction factors to panel TMB values that could cause small values to become negative. Negative values are not a problem for fitting the regression line, but they must be set to zero for purposes of variance and prediction limit calculation. Nonetheless, WES-calibrated estimates of TMB will always be truncated at zero so that they are not reported as negative values.
If “TRUNCtoZERO”, the code truncates negative panel-based TMB values to zero prior to their use in regression modeling.
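The truncation applied under “TRUNCtoZERO” amounts to replacing negative values with zero. A minimal base R sketch with made-up panel values (the package's own implementation may differ):

```r
# Made-up panel TMB values, one of which is negative after a correction factor.
panel_tmb <- c(-0.4, 0.0, 1.7, 12.3)

# Truncate negative values to zero; non-negative values are unchanged.
truncated <- pmax(panel_tmb, 0)
print(truncated)
```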
dir.output Character string
This argument can be used to specify the directory output file path. Default is <user’s home directory>/tmb.
dir.result Character string
This argument can be used to specify the folder name in which the results will be placed. This folder will be found within the folder “dir.output”. Default is “tmbLab_res”.
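Based on the two descriptions above, the results folder sits inside the output directory. The sketch below shows how the default location could be composed in base R; it only builds the path string and assumes the defaults behave as described, without creating any folders.

```r
# Defaults as described above: <user's home directory>/tmb and "tmbLab_res".
dir.output <- file.path(path.expand("~"), "tmb")
dir.result <- "tmbLab_res"

# The results folder is located within dir.output.
result_path <- file.path(dir.output, dir.result)
print(result_path)
```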
show.HTML Logical
If TRUE, the HTML summary document will automatically open in the system default browser. Default is TRUE. A copy of this output is also saved for later viewing (see tmbLab.html output file).
Value
The output list includes the following elements:
- Obs.PANEL.TMB.vec: observed panel-based TMB values.
- Obs.PANEL.IDs.vec: observed panel-based TMB sample IDs.
- PANEL.vec: the panel names which were provided in the file file.Model.Data.TMB.
- Calib.Interval: the calibrated interval level of uncertainty.
- trunc.neg.flag: the option for allowing negative values of panel-based TMB.
- dir.resPath: the file path for the final result files.
- dir.output: the file path for the output files.
- predMethod: the calculation method: 1 for “tmbLab”; 2 for “tmbLabWES”.
- Lab.All: a data frame including the WES-calibrated estimated TMB with intervals of uncertainty.
The following output files are generated:
- tmbLab.html
- Panel.n.GLS.ML.scatterplot.pdf
- All.GLS.ML.RegressionParameters.txt
- tmbLab.All.GLS.ML.FIT.TMB.txt
- tmbLab.All.GLS.ML.CALIB.txt
- All.GLS.ML.RegressionCurveFit.txt
- All.GLS.ML.RegressionCurveFit.Grid.txt
- All.GLS.ML.LowerLimitPredictionCurve.txt
- All.GLS.ML.LowerLimitPredictionCurve.Grid.txt
- All.GLS.ML.UpperLimitPredictionCurve.txt
- All.GLS.ML.UpperLimitPredictionCurve.Grid.txt
tmbLabWES()
Description
This function estimates, from user-supplied data, a calibration curve with corresponding prediction limits to quantify the average relationship between WES and panel TMB assay results as well as variability around that curve. For a set of input WES values, this function also provides estimates of laboratory-specific panel diagnostic TMB and prediction limits for the laboratory-specific panel diagnostic assay. This function differs from the all-encompassing function tmbLab() in that it estimates the regression line and prediction limits, but does not further provide estimates of laboratory-specific TMB that are calibrated to WES TMB.
Maximum likelihood methods implemented in the gls() function in the R package “nlme” are used to fit a weighted least squares linear regression model. This model assumes a linear mean structure, Gaussian errors, and power variance structure. Users are advised to check the reasonableness of these assumptions for their own data.
This function requires two input files. The first file, designated below as “file.Model.Data.TMB,” is a tab-delimited text file containing training data consisting of paired TMB values, one from a whole exome sequencing (WES) assay and one from a laboratory-specific panel diagnostic assay, for each of a series of samples. The second is a tab-delimited text file that contains WES values at which to output predicted laboratory-specific panel TMB values along with their corresponding prediction limits. Further details of the required formats for these files are given below under the “Arguments” section.
To better understand the model-fitting process, consider Figure 2. The first input file, which contains the training data, will correspond to the points depicted in the scatter plot with x-axis and y-axis corresponding to the WES TMB values (“Uniform.WES.TMB”) and laboratory-specific panel TMB values (“Panel.TMB”), respectively. This first data input file will include the information necessary to generate the regression line and prediction limits, denoted by the solid black line and the dotted black lines, respectively. The second input file provides a set of x0 values, which are real or hypothetical WES values at which estimated laboratory-specific panel TMB values and prediction intervals are desired. In Figure 2, a single x0 input value is depicted on the x-axis with a yellow vertical line. The laboratory-based panel TMB estimate pertaining to x0 is derived as the predicted mean value on the regression line and is depicted on the y-axis as y0. In addition, the prediction interval is provided for x0 as depicted by the horizontal dashed lines.
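The forward prediction described above can be sketched in base R. The parameter values are hypothetical, not package output, and the prediction limits use a simplified form (normal quantile times a power-of-WES standard deviation) that ignores uncertainty in the estimated parameters. The function predict_panel() is a hypothetical helper.

```r
# Hypothetical fitted values (not from the package): intercept, slope,
# residual scale, and variance power.
a <- 0.8; b <- 1.08; sigma <- 0.6; delta <- 0.5

# Predicted panel TMB and simplified prediction limits at WES values x0.
predict_panel <- function(x0, level = 95) {
  z    <- qnorm(1 - (1 - level / 100) / 2)
  fit  <- a + b * x0               # predicted mean on the regression line
  half <- z * sigma * x0^delta     # half-width grows with WES TMB
  data.frame(WES.TMB = x0, Est = fit, Lower = fit - half, Upper = fit + half)
}

pred <- predict_panel(c(5, 10, 15, 20))
print(round(pred, 3))
```

Under the power variance structure, the prediction interval widens as the WES TMB value increases.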
Prior to regression fitting, this function drops observations (samples) from the training data supplied via the input file file.Model.Data.TMB with WES TMB values that are greater than 40. These observations are dropped from the analysis because data collected as part of phase 2 of the FOCR TMB harmonization project [Vega DM 2021] were insufficient to assess modeling assumptions, particularly linearity, at these high TMB values. The source code is made available so that individual laboratories may modify this cutoff, if desired.
Usage
tmbLabWES(
file.Model.Data.TMB,
file.WES.Panel.TMB,
Pred.Interval = 95,
trunc.neg.flag = c("NEG", "TRUNCtoZERO"),
dir.output,
dir.result = "tmbLabWES_res",
show.HTML = TRUE
)
Arguments
file.Model.Data.TMB Character string
This required argument gives the name of the laboratory-specific panel TMB training data file, which must be a tab-delimited text file with 3 or more columns:
Column 1: “Sample.ID”; the unique sample identifiers.
Column 2: “Uniform.WES.TMB”; the training set WES values.
Column 3 and more: “Panel.1”, “Panel.2”, etc., denoting one or more laboratory-specific panel TMB columns. It is anticipated that most users will provide only one set of laboratory-specific panel TMB data, corresponding to “Panel.1”; however, data from multiple laboratory-specific panels may be included by specifying additional columns.
Note also that the model implemented in this package assumes that samples are independent. Inputting multiple replicate TMB measurements from the same sample will not lead to appropriate statistical inference (e.g., prediction limits will be incorrect).
An example of this required file is as follows:
Sample.ID Uniform.WES.TMB Panel.1 Panel.2
S0000001 1.3410 1.0132 0.9345
S0000002 1.6414 3.4837 2.0914
S0000003 0.7608 0.3165 0.2429
S0000004 0.8482 1.5107 0.4991
S0000005 0.8301 3.4755 1.8225
......
file.WES.Panel.TMB Character string
This required argument gives the name of the data file that contains the WES TMB values at which predicted panel TMB values and prediction limits will be calculated. The file must be a tab-delimited text file with 2 columns:
Column 1: “Sample.ID”; the unique sample identifiers.
Column 2: “WES.TMB”; WES TMB values for which predicted panel TMB and prediction limits will be calculated. Note that “WES.TMB” values may be real or hypothetical WES TMB values at which predicted panel TMB and prediction limits are desired. Typical hypothetical WES values of interest would be 5, 10, 15, and 20. The file must contain at least one WES.TMB value.
An example of this required file is as follows:
Sample.ID WES.TMB
NewS0001 5
NewS0002 10
NewS0003 15
NewS0004 17
NewS0005 20
......
Pred.Interval Numeric value
Specifies the prediction interval level of uncertainty. For example, Pred.Interval = 95 means that 95% of the observed values of panel TMB for a sample with the designated WES.TMB are predicted to fall within the interval. The inputted value must be a number between 1 and 100. Common values are 90, 95, and 99. Default is 95.
trunc.neg.flag Character string
If value is “NEG”, the code allows negative values of panel-based TMB to be used in the regression modeling. Although TMB values cannot actually be negative, some laboratories apply correction factors to panel TMB values that could cause small values to become negative. Negative values are not a problem for fitting the regression line, but they must be set to zero for purposes of variance and prediction limit calculation. Nonetheless, WES-calibrated estimates of TMB will always be truncated at zero so that they are not reported as negative values.
If “TRUNCtoZERO”, the code truncates negative panel-based TMB values to zero prior to their use in regression modeling.
dir.output Character string
This argument can be used to specify the directory output file path. Default is <user’s home directory>/tmb.
dir.result Character string
This argument can be used to specify the folder name in which the results will be placed. This folder will be found within the folder “dir.output”. Default is “tmbLabWES_res”.
show.HTML Logical
If TRUE, the HTML summary document will automatically open in the system default browser. Default is TRUE. A copy of this output is also saved for later viewing (see tmbLabWES.html output file).
Value
The output list includes the following elements:
- Obs.PANEL.TMB.vec: observed panel-based TMB values.
- Obs.PANEL.IDs.vec: observed panel-based TMB sample IDs.
- PANEL.vec: the panel names which were provided in the file file.Model.Data.TMB.
- Pred.Interval: the prediction interval level of uncertainty.
- trunc.neg.flag: the option for allowing negative values of panel-based TMB.
- dir.resPath: the path for the final result files.
- dir.output: the path for the output files.
- predMethod: the calculation method: 1 for “tmbLab”; 2 for “tmbLabWES”.
- Lab.All: a data frame including predicted assay TMB value with prediction intervals.
The following output files are generated:
- tmbLabWES.html
- Panel.n.GLS.ML.scatterplot.pdf
- All.GLS.ML.RegressionParameters.txt
- tmbLab.All.GLS.ML.PRED.txt
- All.GLS.ML.RegressionCurveFit.txt
- All.GLS.ML.RegressionCurveFit.Grid.txt
- All.GLS.ML.LowerLimitPredictionCurve.txt
- All.GLS.ML.LowerLimitPredictionCurve.Grid.txt
- All.GLS.ML.UpperLimitPredictionCurve.txt
- All.GLS.ML.UpperLimitPredictionCurve.Grid.txt
References
Vega DM, Yee LM, McShane LM, et al. Aligning Tumor Mutational Burden (TMB) quantification across diagnostic platforms: Phase 2 of the Friends of Cancer Research TMB Harmonization Project. Submitted to Annals of Oncology 2021.