riskratio and riskdiff provide a flexible interface to fitting risk ratio and risk difference models.
In cohort studies with a binary outcome, risk ratios and risk differences are typically more appropriate to report than odds ratios from logistic regression, yet such models have historically been difficult to implement in standard software.
The risks package selects an efficient way to fit risk ratio or risk difference models successfully; the selected approach will converge whenever logistic models converge. Optionally, a specific approach to model fitting can also be requested. Implemented are Poisson models with robust covariance, binomial models, logistic models with case duplication, binomial models aided in convergence by starting values obtained through Poisson models or logistic models with case duplication, binomial models fitted via combinatorial expectation maximization (optionally also with Poisson starting values), and estimates obtained via marginal standardization after logistic regression, with confidence intervals from the bootstrap or the delta method.
Adjusting for covariates (e.g., confounders) in the model specification (formula =) is possible.
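To make the marginal standardization approach named above more concrete, the following is a minimal base R sketch of the idea, not the package's own implementation. The simulated cohort and all variable names (outcome, exposure, confounder) are hypothetical, and the sketch gives point estimates only; it does not compute the bootstrap or delta method confidence intervals.

```r
# Simulated cohort with a binary outcome; all names here are hypothetical.
set.seed(42)
n <- 2000
confounder <- rbinom(n, size = 1, prob = 0.4)
exposure <- rbinom(n, size = 1, prob = 0.3 + 0.2 * confounder)
outcome <- rbinom(
  n,
  size = 1,
  prob = plogis(-2 + 0.8 * exposure + 0.6 * confounder)
)
cohort <- data.frame(outcome, exposure, confounder)

# Step 1: fit an ordinary logistic model, which rarely has convergence problems.
fit <- glm(
  outcome ~ exposure + confounder,
  family = binomial(link = "logit"),
  data = cohort
)

# Step 2: predict every participant's risk with exposure set to 1 and then
# to 0, keeping the observed confounder values, and average the predictions.
risk_exposed <- mean(
  predict(fit, newdata = transform(cohort, exposure = 1), type = "response")
)
risk_unexposed <- mean(
  predict(fit, newdata = transform(cohort, exposure = 0), type = "response")
)

# Step 3: contrast the two standardized risks.
risk_ratio <- risk_exposed / risk_unexposed
risk_difference <- risk_exposed - risk_unexposed
c(risk_ratio = risk_ratio, risk_difference = risk_difference)
```

Because both standardized risks are averages of predicted probabilities from a logistic model, they always lie between 0 and 1, which is one reason this approach avoids out-of-range predictions.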
Usage
riskratio(
  formula,
  data,
  approach = c("auto", "all", "robpoisson", "duplicate", "glm", "glm_startp",
    "glm_startd", "glm_cem", "glm_cem_startp", "margstd_boot", "margstd_delta",
    "logistic", "legacy"),
  variable = NULL,
  at = NULL,
  ...
)
riskdiff(
  formula,
  data,
  approach = c("auto", "all", "robpoisson", "glm", "glm_startp", "glm_cem",
    "glm_cem_startp", "margstd_boot", "margstd_delta", "legacy"),
  variable = NULL,
  at = NULL,
  ...
)
Arguments
- formula
A formula object of the form response ~ predictors.
- data
A tibble or data.frame object.
- approach
Optional: Method for model fitting.
"auto" (default) is recommended; it will return results of "margstd_delta" unless interaction terms between exposure and confounders are included. In these cases, results from "margstd_boot" are returned.
"all" will attempt to fit the model via all implemented approaches to allow for comparisons.
"legacy" selects the most efficient approach that converges and ensures that predicted probabilities are within range (< 1).
The other options allow for directly selecting a fitting approach; some of these may fail to converge or may yield out-of-range predicted probabilities. See full documentation for details.
"glm": Binomial model.
"glm_startp": Binomial model with starting values from Poisson model.
"glm_startd": Binomial model with starting values from logistic model with case duplication.
"robpoisson": Poisson model with robust covariance.
"duplicate": Logistic model with duplication of cases. Only available in riskratio().
"glm_cem": Binomial model fitted with combinatorial expectation maximization.
"glm_cem_startp": As glm_cem, with Poisson starting values.
"margstd_boot": Marginal standardization after logistic model, bootstrap standard errors/confidence intervals.
"margstd_delta": Marginal standardization after logistic model, delta method standard errors/confidence intervals.
"logistic": For comparison only: the logistic model. Only available in riskratio().
- variable
Optional: exposure variable to use for marginal standardization. If variable is not provided and marginal standardization is attempted, then the first variable in the model is used as the exposure. Levels are determined automatically for variables of type logical, character, or factor, and can optionally be supplied via at =.
- at
Optional: Levels of the exposure variable (variable) for marginal standardization. at = determines the levels at which contrasts of the exposure are to be assessed. The level listed first is used as the reference. Levels must exist in the data for character, factor or ordered factor variables. For numeric variables, levels that do not exist in the data can be interpolations or extrapolations; if levels exceed the extremes of the data (extrapolation), a warning will be displayed.
- ...
Optional: Further arguments passed to fitting functions (glm, logbin, or addreg).
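As an illustration of the approach, variable, and at arguments described above, the following sketch requests specific fitting approaches on the breastcancer data used in the Examples. It assumes that the risks package is installed. Passing the exposure name to variable as a character string, and the use of at with approach = "margstd_delta", are assumptions based on the argument descriptions rather than confirmed behavior.

```r
library(risks)  # assumes the risks package is installed
data(breastcancer)

# Request one specific fitting approach instead of the default "auto":
# here, a Poisson model with robust covariance.
fit_robpoisson <- riskratio(
  formula = death ~ stage + receptor,
  data = breastcancer,
  approach = "robpoisson"
)
summary(fit_robpoisson)

# Marginal standardization with an explicitly named exposure and explicit
# levels. The level listed first in `at` serves as the reference.
# Assumption: `variable` takes the exposure name as a character string.
fit_margstd <- riskdiff(
  formula = death ~ stage + receptor,
  data = breastcancer,
  approach = "margstd_delta",
  variable = "stage",
  at = c("Stage I", "Stage III")
)
summary(fit_margstd)
```

Directly selected approaches may fail to converge, so leaving approach at its default is the safer choice unless a particular model is needed for comparison.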
Value
Fitted model. This object can be passed on to post-processing functions:
summary.risks: an overview of results (risks-specific S3 methods: summary.robpoisson, summary.margstd_boot, summary.margstd_delta).
tidy.risks: a tibble of coefficients and confidence intervals.
Standard post-processing functions can also be used:
coef: a vector of coefficients.
confint: a matrix of confidence intervals (risks-specific S3 methods: confint.robpoisson, confint.margstd_boot, confint.margstd_delta).
predict.glm(type = "response"): fitted values (predictions).
residuals: residuals.
If model fitting using all possible approaches was requested via approach = "all", then their results can be retrieved from the list all_models in the returned object (e.g., fit$all_models[[1]], fit$all_models[[2]], etc.).
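The standard post-processing functions listed above can be combined as in the following sketch, which assumes that the risks package is installed. The coefficients of a risk ratio model are treated here as being on the log scale, so they are exponentiated to obtain risk ratios; this reading is an assumption based on the output shown in the Examples.

```r
library(risks)  # assumes the risks package is installed
data(breastcancer)

fit_rr <- riskratio(formula = death ~ stage + receptor, data = breastcancer)

# Coefficients and confidence limits as returned by the model
# (assumed to be on the log risk ratio scale).
log_rr <- coef(fit_rr)
ci_log_rr <- confint(fit_rr)

# Exponentiate to report risk ratios with their confidence limits.
rr_table <- cbind(RR = exp(log_rr), exp(ci_log_rr))
rr_table
```

For risk difference models from riskdiff(), the coefficients are already differences in risk, so no transformation would be needed.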
References
Wacholder S. Binomial regression in GLIM: Estimating risk ratios and risk differences. Am J Epidemiol 1986;123:174-184. (Binomial regression models; approach = "glm")
Spiegelman D, Hertzmark E. Easy SAS Calculations for Risk or Prevalence Ratios and Differences. Am J Epidemiol 2005;162:199-200. (Binomial models fitted using starting values from Poisson models; approach = "glm_startp")
Zou G. A modified Poisson regression approach to prospective studies with binary data. Am J Epidemiol 2004;159:702-706. (Poisson model with robust/sandwich standard errors; approach = "robpoisson")
Schouten EG, Dekker JM, Kok FJ, Le Cessie S, Van Houwelingen HC, Pool J, Vandenbroucke JP. Risk ratio and rate ratio estimation in case-cohort designs: hypertension and cardiovascular mortality. Stat Med 1993;12:1733–45. (Logistic model with case duplication and cluster-robust standard errors; approach = "duplicate")
Donoghoe MW, Marschner IC. logbin: An R Package for Relative Risk Regression Using the Log-Binomial Model. J Stat Softw 2018;86(9). (Log-binomial models fitted via combinatorial expectation maximization; riskratio(approach = "glm_cem"))
Donoghoe MW, Marschner IC. Stable computational methods for additive binomial models with application to adjusted risk differences. Comput Stat Data Anal 2014;80:184-96. (Additive binomial models fitted via combinatorial expectation maximization; riskdiff(approach = "glm_cem"))
Localio AR, Margolis DJ, Berlin JA. Relative risks and confidence intervals were easily computed indirectly from multivariable logistic regression. J Clin Epidemiol 2007;60(9):874-82. (Marginal standardization after fitting a logistic model; approach = "margstd_boot")
Examples
data(breastcancer) # Cohort study with binary outcome
# See for details: help(breastcancer)
# Risk ratio model
fit_rr <- riskratio(formula = death ~ stage + receptor, data = breastcancer)
fit_rr
#>
#> Risk ratio model
#> Call: stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"),
#> data = breastcancer, start = "(no starting values)")
#>
#> Coefficients:
#> stageStage I stageStage II stageStage III
#> 0.0000 0.8989 1.8087
#>
#> Degrees of Freedom: 191 Total (i.e. Null); 188 Residual
#> Null Deviance: 228.1
#> Residual Deviance: 185.9 AIC: 193.9
summary(fit_rr)
#>
#> Risk ratio model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"),
#> data = breastcancer, start = "(no starting values)")
#>
#> Coefficients: (3 not defined because of singularities)
#> Estimate Std. Error z value Pr(>|z|)
#> stageStage I 0.0000 0.0000 NaN NaN
#> stageStage II 0.8989 0.3875 2.320 0.0203 *
#> stageStage III 1.8087 0.3783 4.781 1.75e-06 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 228.15 on 191 degrees of freedom
#> Residual deviance: 185.88 on 188 degrees of freedom
#> AIC: 193.88
#>
#> Number of Fisher Scoring iterations: 4
#>
#> Confidence intervals for coefficients: (delta method)
#> 2.5 % 97.5 %
#> stageStage I 0.0000000 0.000000
#> stageStage II 0.1395299 1.658324
#> stageStage III 1.0671711 2.550242
# Risk difference model
fit_rd <- riskdiff(formula = death ~ stage + receptor, data = breastcancer)
fit_rd
#>
#> Risk difference model
#> Call: stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"),
#> data = breastcancer, start = "(no starting values)")
#>
#> Coefficients:
#> stageStage I stageStage II stageStage III
#> 0.000 0.163 0.571
#>
#> Degrees of Freedom: 191 Total (i.e. Null); 188 Residual
#> Null Deviance: 228.1
#> Residual Deviance: 185.9 AIC: 193.9
summary(fit_rd)
#>
#> Risk difference model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"),
#> data = breastcancer, start = "(no starting values)")
#>
#> Coefficients: (3 not defined because of singularities)
#> Estimate Std. Error z value Pr(>|z|)
#> stageStage I 0.00000 0.00000 NaN NaN
#> stageStage II 0.16303 0.05964 2.734 0.00626 **
#> stageStage III 0.57097 0.09962 5.732 9.95e-09 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 228.15 on 191 degrees of freedom
#> Residual deviance: 185.88 on 188 degrees of freedom
#> AIC: 193.88
#>
#> Number of Fisher Scoring iterations: 4
#>
#> Confidence intervals for coefficients: (delta method)
#> 2.5 % 97.5 %
#> stageStage I 0.00000000 0.0000000
#> stageStage II 0.04614515 0.2799187
#> stageStage III 0.37571719 0.7662158