Fit risk ratio and risk difference models

riskratio and riskdiff provide a flexible interface to fitting risk ratio and risk difference models.

In cohort studies with a binary outcome, risk ratios and risk differences are typically more appropriate to report than odds ratios from logistic regression, yet such models have historically been difficult to implement in standard software.

By default, the risks package selects an efficient approach that fits risk ratio or risk difference models successfully and converges whenever the corresponding logistic model converges. Alternatively, a specific fitting approach can be requested. The implemented approaches are Poisson models with robust covariance; binomial models; logistic models with case duplication; binomial models aided in convergence by starting values from Poisson models or from logistic models with case duplication; binomial models fitted via combinatorial expectation maximization (optionally with Poisson starting values); and marginal standardization after logistic regression, with confidence intervals from the bootstrap or the delta method.
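
Marginal standardization can be sketched in base R as follows. This is an illustration under stated assumptions, not the risks package's internal code: it simulates a hypothetical cohort (the names outcome, exposure, and confounder are made up), fits a logistic model, predicts every person's risk with the exposure set to 1 and then to 0, averages those predictions, and derives delta-method standard errors for the risk difference and the log risk ratio from the logistic model's covariance matrix.

```r
# Hypothetical simulated cohort: binary exposure, one binary confounder, binary outcome
set.seed(1)
n <- 2000
confounder <- rbinom(n, 1, 0.4)
exposure <- rbinom(n, 1, plogis(-0.5 + confounder))
outcome <- rbinom(n, 1, plogis(-2 + 0.7 * exposure + 0.8 * confounder))
dat <- data.frame(outcome, exposure, confounder)

# Step 1: logistic model that adjusts for the confounder
fit <- glm(outcome ~ exposure + confounder,
           family = binomial(link = "logit"), data = dat)
b <- coef(fit)

# Step 2: design matrices with everyone exposed (X1) and everyone unexposed (X0)
X1 <- model.matrix(fit)
X1[, "exposure"] <- 1
X0 <- model.matrix(fit)
X0[, "exposure"] <- 0
p1 <- plogis(drop(X1 %*% b))  # predicted risk of each person if exposed
p0 <- plogis(drop(X0 %*% b))  # predicted risk of each person if unexposed

# Step 3: standardize by averaging over the whole cohort
risk1 <- mean(p1)
risk0 <- mean(p0)
rd <- risk1 - risk0
rr <- risk1 / risk0

# Step 4: delta method; the gradient of mean(plogis(X %*% b)) is colMeans(X * p * (1 - p))
g1 <- colMeans(X1 * (p1 * (1 - p1)))
g0 <- colMeans(X0 * (p0 * (1 - p0)))
V <- vcov(fit)
g_rd <- g1 - g0                     # gradient of the risk difference
g_logrr <- g1 / risk1 - g0 / risk0  # gradient of the log risk ratio
se_rd <- sqrt(drop(t(g_rd) %*% V %*% g_rd))
se_logrr <- sqrt(drop(t(g_logrr) %*% V %*% g_logrr))

z <- qnorm(0.975)
c(rd = rd, lower = rd - z * se_rd, upper = rd + z * se_rd)
c(rr = rr, lower = exp(log(rr) - z * se_logrr), upper = exp(log(rr) + z * se_logrr))
```

Because the standardized risks come from a logistic model, they always lie between 0 and 1, which is one reason this kind of approach avoids the convergence problems of log-binomial models.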

Covariates (e.g., confounders) can be adjusted for by including them in the model specification (formula =).

Usage

riskratio(
  formula,
  data,
  approach = c("auto", "all", "robpoisson", "duplicate", "glm", "glm_startp",
    "glm_startd", "glm_cem", "glm_cem_startp", "margstd_boot", "margstd_delta",
    "logistic", "legacy"),
  variable = NULL,
  at = NULL,
  ...
)

riskdiff(
  formula,
  data,
  approach = c("auto", "all", "robpoisson", "glm", "glm_startp", "glm_cem",
    "glm_cem_startp", "margstd_boot", "margstd_delta", "legacy"),
  variable = NULL,
  at = NULL,
  ...
)

Arguments

formula

A formula object of the form response ~ predictors.

data

A tibble or data.frame object.

approach

Optional: Method for model fitting.

"auto" (default) is recommended; it returns results from "margstd_delta" unless the formula includes interaction terms between the exposure and confounders, in which case results from "margstd_boot" are returned.
"all" will attempt to fit the model via all implemented approaches to allow for comparisons.
"legacy" selects the most efficient approach that converges and ensures that predicted probabilities are within range (< 1).

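The two criteria that "legacy" applies (the model converges, and predicted probabilities stay below 1) can be illustrated for a single candidate: a log-binomial model with starting values from a Poisson model, which is the idea behind "glm_startp". This is a hypothetical base-R sketch on simulated data, not the way the risks package implements "legacy"; the helper name fit_ok is made up.

```r
# Hypothetical simulated cohort: binary exposure x, continuous covariate z
set.seed(2)
n <- 1000
x <- rbinom(n, 1, 0.5)
z <- runif(n)
y <- rbinom(n, 1, exp(-2 + 0.5 * x + 0.6 * z))  # true risks stay below exp(-0.9), about 0.41
dat <- data.frame(y, x, z)

# Starting values from a Poisson model with log link (same coefficient scale)
start_values <- coef(glm(y ~ x + z, family = poisson(link = "log"), data = dat))

# Log-binomial model; tryCatch returns NULL if glm() fails outright
fit <- tryCatch(
  glm(y ~ x + z, family = binomial(link = "log"), data = dat, start = start_values),
  error = function(e) NULL
)

# Hypothetical helper: accept a fit only if it converged and all predictions are < 1
fit_ok <- function(fit) {
  !is.null(fit) && isTRUE(fit$converged) && all(fitted(fit) < 1)
}

if (fit_ok(fit)) {
  exp(coef(fit)["x"])  # adjusted risk ratio for x
} else {
  message("Candidate rejected; a 'legacy'-style search would move on to another approach")
}
```
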
The other options allow for directly selecting a fitting approach, some of which may not converge or yield out-of-range predicted probabilities. See full documentation for details.

"glm" Binomial model.
"glm_startp" Binomial model with starting values from Poisson model.
"glm_startd" Binomial model with starting values from logistic model with case duplication.
"robpoisson" Poisson model with robust covariance.
"duplicate" Logistic model with duplication of cases. Only available in riskratio().
"glm_cem" Binomial model fitted with combinatorial expectation maximization.
"glm_cem_startp" As glm_cem, with Poisson starting values.
"margstd_boot" Marginal standardization after logistic model, bootstrap standard errors/confidence intervals.
"margstd_delta" Marginal standardization after logistic model, delta method standard errors/confidence intervals.
"logistic" For comparison only: the logistic model. Only available in riskratio().

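The "robpoisson" approach (Zou 2004) fits a Poisson model with log link to the binary outcome and replaces the model-based variance with a robust (sandwich) variance. As an illustration only, the sketch below computes the basic HC0 sandwich by hand in base R on simulated, hypothetical data; the risks package's own implementation and its exact variance estimator may differ.

```r
# Hypothetical simulated cohort: risk 0.15 if unexposed, 0.30 if exposed
set.seed(3)
n <- 1000
x <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, ifelse(x == 1, 0.30, 0.15))
dat <- data.frame(y, x)

# Poisson model with log link: exp(coefficient) is a risk ratio
fit <- glm(y ~ x, family = poisson(link = "log"), data = dat)

# Sandwich variance: bread %*% meat %*% bread
X <- model.matrix(fit)
mu <- fitted(fit)
bread <- solve(crossprod(X * sqrt(mu)))  # (X' diag(mu) X)^-1, inverse Fisher information
meat <- crossprod(X * (dat$y - mu))      # X' diag((y - mu)^2) X
vcov_robust <- bread %*% meat %*% bread
se_robust <- sqrt(diag(vcov_robust))

z <- qnorm(0.975)
log_rr <- unname(coef(fit)["x"])
se_log_rr <- unname(se_robust["x"])
exp(c(rr = log_rr, lower = log_rr - z * se_log_rr, upper = log_rr + z * se_log_rr))
```

With a single binary exposure, exp(coef(fit)["x"]) equals the ratio of the observed proportions p1/p0, and the robust variance of the log risk ratio reduces to (1 - p1)/(n1 p1) + (1 - p0)/(n0 p0), whereas the model-based Poisson variance would overstate the uncertainty for a binary outcome.
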
variable

Optional: Exposure variable to use for marginal standardization. If variable is not provided and marginal standardization is attempted, the first variable in the model is used as the exposure. Levels are determined automatically for variables of type logical, character, and factor; they can optionally be supplied via at =.

at

Optional: Levels of the exposure variable (variable =) for marginal standardization. at = determines the levels at which contrasts of the exposure are assessed; the level listed first is used as the reference. For character, factor, or ordered factor variables, levels must exist in the data. For numeric variables, levels need not exist in the data and may be interpolations or extrapolations; if levels exceed the extremes of the data (extrapolation), a warning is displayed.
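
How variable = and at = work together for a numeric exposure can be sketched with a hypothetical helper, margstd_at(), written in base R for illustration; it is not part of the risks package and handles numeric exposures only. It defaults to the first term of the formula, sets every person's exposure to each level in at in turn, averages the predicted risks, uses the first level as the reference, and warns when a level lies outside the observed range. The simulated variables event, dose, and age are assumptions.

```r
# Hypothetical helper (numeric exposure only), not the risks package's implementation
margstd_at <- function(fit, data, variable = NULL, at) {
  if (is.null(variable)) {
    variable <- attr(terms(fit), "term.labels")[1]  # first predictor in the formula
  }
  observed <- range(data[[variable]])
  if (any(at < observed[1] | at > observed[2])) {
    warning("Levels in 'at' extrapolate beyond the observed range of ", variable)
  }
  # Average predicted risk in the whole cohort with the exposure set to each level
  risks <- vapply(at, function(level) {
    counterfactual <- data
    counterfactual[[variable]] <- level
    mean(predict(fit, newdata = counterfactual, type = "response"))
  }, numeric(1))
  data.frame(level = at, risk = risks,
             rr = risks / risks[1],  # first level in 'at' is the reference
             rd = risks - risks[1])
}

# Hypothetical simulated cohort with a continuous exposure (dose) and a confounder (age)
set.seed(4)
n <- 1500
age <- runif(n, 40, 70)
dose <- runif(n, 0, 10)
event <- rbinom(n, 1, plogis(-4 + 0.15 * dose + 0.03 * age))
cohort <- data.frame(event, dose, age)
fit <- glm(event ~ dose + age, family = binomial(link = "logit"), data = cohort)

margstd_at(fit, cohort, at = c(2, 5, 8))  # dose is the first term, so it is the exposure
```

Calling margstd_at(fit, cohort, at = c(2, 12)) would trigger the extrapolation warning, because dose was simulated only between 0 and 10.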

...

Optional: Further arguments passed to fitting functions (glm, logbin, or addreg).

Value

Fitted model. This object can be passed on to post-processing functions:

summary.risks: an overview of results (risks-specific S3 methods: summary.robpoisson, summary.margstd_boot, summary.margstd_delta).
tidy.risks: a tibble of coefficients and confidence intervals.

Standard post-processing functions can also be used:

coef: a vector of coefficients.
confint.risks: a matrix of confidence intervals.
predict.glm(type = "response"): fitted values (predictions).
residuals: residuals.

If model fitting using all possible approaches was requested via approach = "all", then their results can be retrieved from the list all_models in the returned object (e.g., fit$all_models[[1]], fit$all_models[[2]], etc.).

Functions

riskratio(): Fit risk ratio models
riskdiff(): Fit risk difference models

References

Wacholder S. Binomial regression in GLIM: Estimating risk ratios and risk differences. Am J Epidemiol 1986;123:174-184. (Binomial regression models; approach = "glm")

Spiegelman D, Hertzmark E. Easy SAS Calculations for Risk or Prevalence Ratios and Differences. Am J Epidemiol 2005;162:199-200. (Binomial models fitted using starting values from Poisson models; approach = "glm_startp")

Zou G. A modified Poisson regression approach to prospective studies with binary data. Am J Epidemiol 2004;159:702-706. (Poisson model with robust/sandwich standard errors; approach = "robpoisson")

Schouten EG, Dekker JM, Kok FJ, Le Cessie S, Van Houwelingen HC, Pool J, Vandenbroucke JP. Risk ratio and rate ratio estimation in case-cohort designs: hypertension and cardiovascular mortality. Stat Med 1993;12:1733-45. (Logistic model with case duplication and cluster-robust standard errors; approach = "duplicate")

Donoghoe MW, Marschner IC. logbin: An R Package for Relative Risk Regression Using the Log-Binomial Model. J Stat Softw 2018;86(9). (Log-binomial models fitted via combinatorial expectation maximization; riskratio(approach = "glm_cem"))

Donoghoe MW, Marschner IC. Stable computational methods for additive binomial models with application to adjusted risk differences. Comput Stat Data Anal 2014;80:184-96. (Additive binomial models fitted via combinatorial expectation maximization; riskdiff(approach = "glm_cem"))

Localio AR, Margolis DJ, Berlin JA. Relative risks and confidence intervals were easily computed indirectly from multivariable logistic regression. J Clin Epidemiol 2007;60(9):874-82. (Marginal standardization after fitting a logistic model; approach = "margstd_boot")

Examples

data(breastcancer)  # Cohort study with binary outcome
                    # See for details: help(breastcancer)

# Risk ratio model
fit_rr <- riskratio(formula = death ~ stage + receptor, data = breastcancer)
fit_rr
#> 
#> Risk ratio model
#> Call:  stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"), 
#>     data = breastcancer, start = "(no starting values)")
#> 
#> Coefficients:
#>   stageStage I   stageStage II  stageStage III  
#>         0.0000          0.8989          1.8087  
#> 
#> Degrees of Freedom: 191 Total (i.e. Null);  188 Residual
#> Null Deviance:	    228.1 
#> Residual Deviance: 185.9 	AIC: 193.9
summary(fit_rr)
#> 
#> Risk ratio model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"), 
#>     data = breastcancer, start = "(no starting values)")
#> 
#> Coefficients: (3 not defined because of singularities)
#>                Estimate Std. Error z value Pr(>|z|)    
#> stageStage I     0.0000     0.0000     NaN      NaN    
#> stageStage II    0.8989     0.3875   2.320   0.0203 *  
#> stageStage III   1.8087     0.3783   4.781 1.75e-06 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 228.15  on 191  degrees of freedom
#> Residual deviance: 185.88  on 188  degrees of freedom
#> AIC: 193.88
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> Confidence intervals for coefficients: (delta method)
#>                    2.5 %   97.5 %
#> stageStage I   0.0000000 0.000000
#> stageStage II  0.1395299 1.658324
#> stageStage III 1.0671711 2.550242

# Risk difference model
fit_rd <- riskdiff(formula = death ~ stage + receptor, data = breastcancer)
fit_rd
#> 
#> Risk difference model
#> Call:  stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"), 
#>     data = breastcancer, start = "(no starting values)")
#> 
#> Coefficients:
#>   stageStage I   stageStage II  stageStage III  
#>          0.000           0.163           0.571  
#> 
#> Degrees of Freedom: 191 Total (i.e. Null);  188 Residual
#> Null Deviance:	    228.1 
#> Residual Deviance: 185.9 	AIC: 193.9
summary(fit_rd)
#> 
#> Risk difference model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"), 
#>     data = breastcancer, start = "(no starting values)")
#> 
#> Coefficients: (3 not defined because of singularities)
#>                Estimate Std. Error z value Pr(>|z|)    
#> stageStage I    0.00000    0.00000     NaN      NaN    
#> stageStage II   0.16303    0.05964   2.734  0.00626 ** 
#> stageStage III  0.57097    0.09962   5.732 9.95e-09 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 228.15  on 191  degrees of freedom
#> Residual deviance: 185.88  on 188  degrees of freedom
#> AIC: 193.88
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> Confidence intervals for coefficients: (delta method)
#>                     2.5 %    97.5 %
#> stageStage I   0.00000000 0.0000000
#> stageStage II  0.04614515 0.2799187
#> stageStage III 0.37571719 0.7662158