adjust_batch generates biomarker levels for the variable(s) markers in the dataset data that are corrected (adjusted) for batch effects, i.e. differential measurement error between levels of batch.
data: Data set.
markers: Variable name(s) to batch-adjust. Select multiple variables with tidy evaluation, e.g., markers = starts_with("biomarker").
batch: Categorical variable indicating batch.
method: Method for batch effect correction:
  simple: Simple means per batch will be subtracted. No adjustment for confounders.
  standardize: Means per batch after standardization for confounders in linear models will be subtracted. If no confounders are supplied, method = simple is equivalent and will be used.
  ipw: Means per batch after inverse-probability weighting for assignment to a specific batch in multinomial models, conditional on confounders, will be subtracted. Stabilized weights are used, truncated at the quantiles defined by the ipw_truncate parameter. If no confounders are supplied, method = simple is equivalent and will be used.
  quantreg: Lower quantiles (default: 25th percentile) and ranges between a lower and an upper quantile (default: 75th percentile) will be unified between batches, allowing for differences in both parameters due to confounders. Set the two quantiles using the quantreg_tau parameter.
  quantnorm: Quantile normalization between batches. No adjustment for confounders.
confounders: Optional: Confounders, i.e. determinants of biomarker levels that differ between batches. Only used if method = standardize, method = ipw, or method = quantreg, i.e. methods that attempt to retain some of these "true" between-batch differences. Select multiple confounders with tidy evaluation, e.g., confounders = c(age, age_squared, sex).
suffix: Optional: What string to append to variable names after batch adjustment. Defaults to "_adjX", with X depending on method:
  _adj2 from method = simple
  _adj3 from method = standardize
  _adj4 from method = ipw
  _adj5 from method = quantreg
  _adj6 from method = quantnorm
ipw_truncate: Optional and used for method = ipw only: Lower and upper quantiles for truncation of stabilized weights. Defaults to c(0.025, 0.975).
quantreg_tau: Optional and used for method = quantreg only: Quantiles to scale. Defaults to c(0.25, 0.75).
quantreg_method: Optional and used for method = quantreg only: Algorithmic method to fit quantile regression. Defaults to "fn". See parameter method of rq.
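To picture what "unifying" a lower quantile and a quantile range means, the following base-R sketch rescales each batch so that its 25th percentile and its 25th-to-75th percentile range match shared reference values. It is a simplification and not the package's implementation: plain sample quantiles stand in for quantile regression, no confounders are involved, and both the toy data and the choice of the across-batch average as reference are assumptions of this example.

```r
# Toy data: batch 2 is shifted upward and has twice the spread of batch 1
set.seed(6)
toy <- data.frame(tma = rep(1:2, each = 50))
z <- rnorm(100)
toy$biomarker <- ifelse(toy$tma == 2, 1 + 2 * z, z)

# The two quantiles to unify (same values as the quantreg_tau default)
tau <- c(0.25, 0.75)

# Per-batch sample quantiles: rows are quantiles, columns are batches
q_by_batch <- sapply(split(toy$biomarker, toy$tma), quantile, probs = tau)
lower <- q_by_batch[1, ]
range_b <- q_by_batch[2, ] - q_by_batch[1, ]

# Assumed reference: average lower quantile and average range across batches
lower_ref <- mean(lower)
range_ref <- mean(range_b)

# Shift and rescale each observation using its own batch's parameters
b <- as.character(toy$tma)
toy$biomarker_q <- unname(
  (toy$biomarker - lower[b]) / range_b[b] * range_ref + lower_ref
)

# Both batches now share the same 25th and 75th percentiles
sapply(split(toy$biomarker_q, toy$tma), quantile, probs = tau)
```

Because each batch is transformed linearly, the ordering of values within a batch is unchanged; only location and spread are aligned.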
The data dataset with batch effect-adjusted variable(s) added at the end. Model diagnostics, using the attribute .batchtma of this dataset, are available via the diagnose_models function.
If no true differences between batches are expected, because samples have been randomized to batches, then a method that returns adjusted values with equal means (method = simple) or with equal rank values (method = quantnorm) for all batches is appropriate.
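As a rough illustration of these two options, the base-R sketch below first centers each batch mean on the overall mean and then, separately, quantile-normalizes two equally sized batches. It is not the package's implementation: the toy data, the use of the grand mean as reference, the restriction to equal batch sizes, and the helper names center_batches and quantnorm_batches are all assumptions made for this example.

```r
# Toy data: two equally sized batches; batch 2 is shifted upward by 1
set.seed(1)
toy <- data.frame(
  tma = rep(1:2, each = 10),
  biomarker = c(rnorm(10, mean = 3), rnorm(10, mean = 4))
)

# Hypothetical helper: subtract each batch's deviation from the grand mean
center_batches <- function(x, batch) {
  x - (ave(x, batch) - mean(x))
}

# Hypothetical helper: quantile normalization for equally sized batches.
# Values at the same rank are replaced by their mean across batches.
quantnorm_batches <- function(x, batch) {
  sorted <- sapply(split(x, batch), sort)  # one sorted column per batch
  reference <- rowMeans(sorted)            # shared reference distribution
  out <- x
  for (b in unique(batch)) {
    idx <- which(batch == b)
    out[idx] <- reference[rank(x[idx], ties.method = "first")]
  }
  out
}

toy$centered <- center_batches(toy$biomarker, toy$tma)
toy$quantnormed <- quantnorm_batches(toy$biomarker, toy$tma)

# Batch means agree after centering
tapply(toy$centered, toy$tma, mean)
# Sorted values agree after quantile normalization
tapply(toy$quantnormed, toy$tma, sort)
```

Centering equalizes only the means, whereas quantile normalization forces the whole distribution to be identical across batches.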
If the distribution of determinants of biomarker values (confounders) differs between batches, then a method that retains these "true" differences between batches while adjusting for batch effects may be appropriate: method = standardize and method = ipw address means; method = quantreg addresses lower values and dynamic range separately.
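The idea of removing a batch effect while keeping confounder-driven differences can be sketched with an ordinary linear model and marginal standardization. This is a minimal base-R sketch under stated assumptions, not the package's code: the simulated data, a single continuous confounder, and centering on the average of the marginal means are choices made for illustration.

```r
# Toy data: batch 2 has a higher confounder mean (a "true" difference)
# and an additional measurement offset of 1 (the batch effect)
set.seed(2)
n <- 200
toy <- data.frame(tma = factor(rep(1:2, each = n / 2)))
toy$confounder <- rnorm(n, mean = ifelse(toy$tma == "2", 1, 0))
toy$biomarker <- 2 + 0.5 * toy$confounder +
  ifelse(toy$tma == "2", 1, 0) + rnorm(n, sd = 0.5)

# Outcome model: biomarker as a function of batch and confounder
fit <- lm(biomarker ~ tma + confounder, data = toy)

# Marginal mean per batch: predict for everyone as if measured in that batch
marginal <- sapply(levels(toy$tma), function(b) {
  newdata <- toy
  newdata$tma <- factor(b, levels = levels(toy$tma))
  mean(predict(fit, newdata = newdata))
})

# Batch effect: deviation of each marginal mean from their average
batch_effect <- marginal - mean(marginal)
toy$biomarker_std <- toy$biomarker -
  unname(batch_effect[as.character(toy$tma)])

# The confounder-adjusted batch coefficient is now essentially zero ...
coef(lm(biomarker_std ~ tma + confounder, data = toy))["tma2"]
# ... while the crude difference in means driven by the confounder remains
diff(tapply(toy$biomarker_std, toy$tma, mean))
```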
Which method to choose depends on the properties of batch effects (affecting means or also variance?) and the presence and strength of confounding. For the two mean-only confounder-adjusted methods, the choice may depend on whether the confounder-batch association (method = ipw) or the confounder-biomarker association (method = standardize) can be modeled better.
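For contrast with an outcome model, the sketch below models the confounder-batch association instead and derives stabilized, truncated inverse-probability weights. It is an illustration under stated assumptions rather than the package's implementation: with only two batches, logistic regression stands in for the multinomial model, and the simulated data and the centering on the average weighted mean are choices of this example.

```r
# Toy data: two batches (coded 0/1); batch 1 has a higher confounder mean
# and an additional measurement offset of 1 (the batch effect)
set.seed(3)
n <- 200
toy <- data.frame(tma = rep(0:1, each = n / 2))
toy$confounder <- rnorm(n, mean = toy$tma)
toy$biomarker <- 2 + 0.5 * toy$confounder + toy$tma + rnorm(n, sd = 0.5)

# Batch model: probability of being in batch 1 given the confounder
# (logistic regression replaces a multinomial model because there are
# only two batches in this toy example)
batch_fit <- glm(tma ~ confounder, family = binomial, data = toy)
p1 <- unname(fitted(batch_fit))
p_own <- ifelse(toy$tma == 1, p1, 1 - p1)  # P(own batch | confounder)
p_marg <- ifelse(toy$tma == 1, mean(toy$tma), 1 - mean(toy$tma))

# Stabilized weights, truncated at the 2.5th and 97.5th percentiles
w <- p_marg / p_own
bounds <- unname(quantile(w, probs = c(0.025, 0.975)))
w <- pmin(pmax(w, bounds[1]), bounds[2])

# Weighted mean per batch, centered on the average of the batch means
wmeans <- sapply(
  split(data.frame(y = toy$biomarker, w = w), toy$tma),
  function(d) weighted.mean(d$y, d$w)
)
batch_effect <- wmeans - mean(wmeans)
toy$biomarker_ipw <- toy$biomarker -
  unname(batch_effect[as.character(toy$tma)])

batch_effect
```

Truncation limits the influence of observations whose batch membership is very unlikely given their confounder values.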
Generally, if batch effects are present, any adjustment method tends to perform better than no adjustment in reducing bias and increasing between-study reproducibility. See references.
All adjustment approaches except method = quantnorm are based on linear models. It is recommended that variables for markers and confounders first be transformed as necessary (e.g., log transformations or splines). Scaling and mean centering are not necessary, and adjusted values are returned on the original scale.
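One reason to transform first: a batch effect that multiplies measurements becomes an additive shift after a log transformation, which a mean-based adjustment can then remove. The base-R sketch below illustrates this with simple mean centering on the log scale. The simulated data, the 1.5-fold batch effect, and the back-transformation step are assumptions of this example and are not performed by the function described here.

```r
# Toy data: batch 2 measures 1.5-fold higher than batch 1
set.seed(4)
toy <- data.frame(tma = rep(1:2, each = 50))
toy$biomarker <- rlnorm(100, meanlog = 1, sdlog = 0.4) *
  ifelse(toy$tma == 2, 1.5, 1)

# Transform before adjusting
toy$log_biomarker <- log(toy$biomarker)

# Mean-center batches on the log scale (grand mean as assumed reference)
toy$log_adj <- toy$log_biomarker -
  (ave(toy$log_biomarker, toy$tma) - mean(toy$log_biomarker))

# Optional back-transformation, done by the user in this sketch
toy$biomarker_adj <- exp(toy$log_adj)

# Geometric means per batch agree after adjustment
tapply(toy$biomarker_adj, toy$tma, function(v) exp(mean(log(v))))
```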
Parameters markers, batch, and confounders support tidy evaluation.
Observations with missing values for markers or confounders will be ignored in the estimation of adjustment parameters, as are empty batches. Batch effect-adjusted values for observations with existing marker values but missing confounders are based on adjustment parameters derived from the other observations in a batch with non-missing confounders.
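This handling of missing values can be mimicked in base R: estimate the batch effects from complete cases only, then subtract them from every observation that has a marker value. The sketch is an illustration under stated assumptions (simulated data, a linear outcome model with marginal standardization) and not the package's implementation.

```r
# Toy data with a batch effect of 1 in batch 2
set.seed(5)
toy <- data.frame(tma = factor(rep(1:2, each = 20)))
toy$confounder <- rnorm(40, mean = ifelse(toy$tma == "2", 1, 0))
toy$biomarker <- 2 + 0.5 * toy$confounder +
  (toy$tma == "2") + rnorm(40, sd = 0.5)

toy$confounder[c(3, 25)] <- NA  # marker present, confounder missing
toy$biomarker[10] <- NA         # marker missing

# lm() drops incomplete rows by default, so only complete cases
# contribute to the estimated adjustment parameters
fit <- lm(biomarker ~ tma + confounder, data = toy)
complete <- toy[complete.cases(toy), ]

# Marginal mean per batch, standardized to the complete cases
marginal <- sapply(levels(toy$tma), function(b) {
  newdata <- complete
  newdata$tma <- factor(b, levels = levels(toy$tma))
  mean(predict(fit, newdata = newdata))
})
batch_effect <- marginal - mean(marginal)

# Apply the per-batch correction to all rows: rows 3 and 25 receive an
# adjusted value despite their missing confounder; row 10 stays NA
toy$biomarker_adj <- toy$biomarker -
  unname(batch_effect[as.character(toy$tma)])

toy[c(3, 10, 25), ]
```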
Stopsack KH, Tyekucheva S, Wang M, Gerke TA, Vaselkiv JB, Penney KL, Kantoff PW, Finn SP, Fiorentino M, Loda M, Lotan TL, Parmigiani G+, Mucci LA+ (+ equal contribution). Extent, impact, and mitigation of batch effects in tumor biomarker studies using tissue microarrays. eLife 2021;10:e71265. doi: https://doi.org/10.7554/elife.71265 (This R package, all methods descriptions, and further recommendations.)
Rosner B, Cook N, Portman R, Daniels S, Falkner B. Determination of blood pressure percentiles in normal-weight children: some methodological issues. Am J Epidemiol 2008;167(6):653-66. (Basis for method = standardize)
Bolstad BM, Irizarry RA, Åstrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003;19:185–193. (method = quantnorm)
# Data frame with two batches
# Batch 2 has higher values of biomarker and confounder
df <- data.frame(
  tma = rep(1:2, times = 10),
  biomarker = rep(1:2, times = 10) +
    runif(max = 5, n = 20),
  confounder = rep(0:1, times = 10) +
    runif(max = 10, n = 20)
)
# Adjust for batch effects
# Using simple means, ignoring the confounder:
adjust_batch(
  data = df,
  markers = biomarker,
  batch = tma,
  method = simple
)
#> tma biomarker confounder biomarker_adj2
#> 1 1 1.403751 2.8989230 1.960613
#> 2 2 6.171665 7.7838043 5.614803
#> 3 1 4.003804 7.3531960 4.560667
#> 4 2 2.786042 2.9595673 2.229180
#> 5 1 1.036997 9.8053967 1.593860
#> 6 2 4.331967 8.4152153 3.775105
#> 7 1 3.488887 0.5144628 4.045749
#> 8 2 3.448836 6.3021246 2.891974
#> 9 1 4.664410 6.9582388 5.221272
#> 10 2 5.862608 7.8855600 5.305745
#> 11 1 5.373003 0.3123033 5.929866
#> 12 2 2.874703 3.2556253 2.317841
#> 13 1 1.171207 3.0083081 1.728069
#> 14 2 3.601929 7.3646561 3.045066
#> 15 1 3.011641 4.7902455 3.568504
#> 16 2 2.978349 5.3217126 2.421487
#> 17 1 3.017691 7.0643384 3.574553
#> 18 2 2.318307 10.4857658 1.761445
#> 19 1 2.943507 1.8033877 3.500369
#> 20 2 6.877739 3.1689988 6.320877
# Returns data set with new variable "biomarker_adj2"
# Use quantile regression, include the confounder,
# change suffix of returned variable:
adjust_batch(
  data = df,
  markers = biomarker,
  batch = tma,
  method = quantreg,
  confounders = confounder,
  suffix = "_batchadjusted"
)
#> Warning: Returning data frames from `filter()` expressions was deprecated in dplyr
#> 1.0.8.
#> ℹ Please use `if_any()` or `if_all()` instead.
#> ℹ The deprecated feature was likely used in the batchtma package.
#> Please report the issue to the authors.
#> tma biomarker confounder biomarker_batchadjusted
#> 1 1 1.403751 2.8989230 3.095246
#> 2 2 6.171665 7.7838043 4.313937
#> 3 1 4.003804 7.3531960 4.196229
#> 4 2 2.786042 2.9595673 3.043022
#> 5 1 1.036997 9.8053967 2.939945
#> 6 2 4.331967 8.4152153 3.623340
#> 7 1 3.488887 0.5144628 3.978189
#> 8 2 3.448836 6.3021246 3.291825
#> 9 1 4.664410 6.9582388 4.475960
#> 10 2 5.862608 7.8855600 4.197921
#> 11 1 5.373003 0.3123033 4.776011
#> 12 2 2.874703 3.2556253 3.076304
#> 13 1 1.171207 3.0083081 2.996776
#> 14 2 3.601929 7.3646561 3.349294
#> 15 1 3.011641 4.7902455 3.776101
#> 16 2 2.978349 5.3217126 3.115211
#> 17 1 3.017691 7.0643384 3.778663
#> 18 2 2.318307 10.4857658 2.867440
#> 19 1 2.943507 1.8033877 3.747250
#> 20 2 6.877739 3.1689988 4.578988
# Returns data set with new variable "biomarker_batchadjusted"