Implemented model types
The risks package implements all major regression models that have
been proposed for relative risks and risk differences. By default
(approach = "auto"
), riskratio()
and
riskdiff()
estimate the most efficient valid model that
converges; in more numerically challenging cases, they default to
marginal standardization, which does not return parameters for
covariates.
The following models are implemented in the risks package:
#1 | approach = |
RR | RD | Model | Reference |
---|---|---|---|---|---|
1 | glm |
riskratio |
riskdiff |
Binomial model with a log or identity link | Wacholder S. Binomial regression in GLIM: Estimating risk ratios and risk differences. Am J Epidemiol 1986;123:174-184. |
2 | glm_startp |
riskratio |
riskdiff |
Binomial model with a log or identity link, convergence-assisted by starting values from Poisson model | Spiegelman D, Hertzmark E. Easy SAS calculations for risk or prevalence ratios and differences. Am J Epidemiol 2005;162:199-200. |
3 | margstd_delta |
riskratio |
riskdiff |
Marginally standardized estimates using binomial models with a logit link (logistic model) with standard errors calculated via the delta method. | This package. |
4 | margstd_boot |
riskratio |
riskdiff |
Marginally standardized estimates using binomial models with a logit link (logistic model) with bias-corrected accelerated (BCa) confidence intervals from parametric bootstrapping (see Marginal standardization). | This package. For marginal standardization with nonparametric bootstrapping, see: Localio AR, Margolis DJ, Berlin JA. Relative risks and confidence intervals were easily computed indirectly from multivariable logistic regression. J Clin Epidemiol 2007;60(9):874-82. |
– | glm_cem |
riskratio |
— | Binomial model with log-link fitted via combinatorial expectation maximization instead of Fisher scoring | Donoghoe MW, Marschner IC. logbin: An R Package for Relative Risk Regression Using the Log-Binomial Model. J Stat Softw 2018;86(9). |
– | glm_cem |
— | riskdiff |
Additive binomial model (identity link) fitted via combinatorial expectation maximization instead of Fisher scoring | Donoghoe MW, Marschner IC. Stable computational methods for additive binomial models with application to adjusted risk differences. Comput Stat Data Anal 2014;80:184-96. |
– | robpoisson |
riskratio |
riskdiff |
Log-linear (Poisson) model with robust/sandwich/empirical standard errors | Zou G. A modified Poisson regression approach to prospective studies with binary data. Am J Epidemiol 2004;159(7):702-6 |
– | duplicate |
riskratio |
– | Case-duplication approach, fitting a logistic model with cluster-robust standard errors | Schouten EG, Dekker JM, Kok FJ, Le Cessie S, Van Houwelingen HC, Pool J, Vandenbroucke JP. Risk ratio and rate ratio estimation in case-cohort designs: hypertension and cardiovascular mortality. Stat Med 1993;12:1733–45. |
– | glm_startd |
riskratio |
— | Binomial model with a log link, convergence-assisted by starting values from case-duplication logistic model | This package. |
– | logistic |
riskratio , for comparison only |
— | Binomial model with logit link (i.e., the logistic model), returning odds ratios | Included for comparison purposes only. |
1 Indicates the priority with which the legacy modelling
strategy (approach = "legacy"
) attempts model fitting
(glm_startp
: only for RR).
Which model was fitted is always indicated in the first line of the
output of summary()
and in the model
column of
tidy()
. In methods sections of manuscripts, the approach
can be described in detail as follows: “Risk ratios (or risk
differences) were obtained via (method listed in the first line of model
summary.risks(...)
) using the risks
R package
(reference to this package and/or the article listed in the column
‘reference’).”
For example: “Risk ratios were obtained from binomial models with a
log link, convergence-assisted by Poisson models (ref. Spiegelman and
Hertzmark, AJE 2005), using the risks
R package (https://stopsack.github.io/risks/).”
Model choice
By default, automatic model fitting (approach = "auto"
)
reports results from marginal standardization using a logistic model
with delta method standard errors (equivalent to
approach = "margstd_delta"
). An exception is made if
interaction terms between exposure and confounders are included. This
case, confidence intervals are calculated using bootstrapping
(equivalent to requesting approach = "margstd_boot"
).
Alternatively, any of the options listed under approach =
in the table can be requested directly. However, unlike with
approach = "auto"
(the default) or
approach = "legacy"
, the selected model may not
converge.
We load the same example data as in the Get Started vignette.
library(risks) # provides riskratio(), riskdiff(), postestimation functions
library(dplyr) # For data handling
library(broom) # For tidy() model summaries
data(breastcancer)
We then select a binomial model with starting values from the Poisson model:
riskratio(formula = death ~ stage + receptor,
data = breastcancer,
approach = "glm_startp")
#>
#> Risk ratio model
#> Call: stats::glm(formula = death ~ stage + receptor, family = stats::binomial(link = "log"),
#> data = breastcancer, start = "(from Poisson model)")
#>
#> Coefficients:
#> (Intercept) stageStage II stageStage III receptorLow
#> -2.3521 0.9314 1.7695 0.4436
#>
#> Degrees of Freedom: 191 Total (i.e. Null); 188 Residual
#> Null Deviance: 228.1
#> Residual Deviance: 185.9 AIC: 193.9
However, the binomial model without starting values
(approach = "glm"
) does not converge, as expected.
Model comparisons
With approach = "all"
, all model types listed in the
tables are fitted. The fitted object, e.g., fit
,
is one of the converged models. A summary of the convergence status of
all models is displayed at the beginning of
summary(fit)
:
fit_all <- riskratio(formula = death ~ stage + receptor,
data = breastcancer,
approach = "all")
#> Loading required package: doParallel
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: parallel
#> Loading required package: numDeriv
#> Loading required package: quantreg
#> Loading required package: SparseM
#>
#> Attaching package: 'SparseM'
#> The following object is masked from 'package:base':
#>
#> backsolve
#>
#> Attaching package: 'turboEM'
#> The following objects are masked from 'package:numDeriv':
#>
#> grad, hessian
summary(fit_all)
#>
#> All fitted models:
#> Model Converged Max.prob.
#> 1 robpoisson TRUE 0.9052473
#> 2 glm FALSE NA
#> 3 glm_startp TRUE 0.8702313
#> 4 logbin TRUE 0.8702292
#> 5 logbin_start TRUE 0.8702294
#> 6 margstd_boot TRUE 0.8158560
#> 7 margstd_delta TRUE 0.8158560
#> 8 logistic TRUE 0.8158560
#> 9 duplicate TRUE 0.4785245
#> 10 glm_startd TRUE 0.8702318
#>
#> Risk ratio model, fitted as Poisson model with robust covariance (robpoisson).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = poisson(link = "log"),
#> data = breastcancer, start = "(no starting values)")
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -2.3658 0.3602 -6.569 5.08e-11 ***
#> stageStage II 0.9246 0.3932 2.351 0.0187 *
#> stageStage III 1.7772 0.3855 4.610 4.03e-06 ***
#> receptorLow 0.4891 0.2129 2.297 0.0216 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for poisson family taken to be 1)
#>
#> Null deviance: 137.00 on 191 degrees of freedom
#> Residual deviance: 110.29 on 188 degrees of freedom
#> AIC: 226.29
#>
#> Number of Fisher Scoring iterations: 6
#>
#> Confidence intervals for coefficients: (robust)
#> 2.5 % 97.5 %
#> (Intercept) -3.07172916 -1.6599084
#> stageStage II 0.15386831 1.6952382
#> stageStage III 1.02161187 2.5328206
#> receptorLow 0.07183369 0.9062773
Individual models can be accessed as fit$all_models[[1]]
through fit$all_models[[6]]
(or [[7]]
if
fitting a risk ratio model). tidy()
shows coefficients and
confidence intervals from all models that converged:
tidy(fit_all) %>%
select(-statistic, -p.value) %>%
print(n = 50)
#> # A tibble: 34 × 6
#> term estimate std.error conf.low conf.high model
#> <chr> <dbl[1d]> <dbl> <dbl> <dbl> <chr>
#> 1 (Intercept) -2.37 0.360 -3.07 -1.66 robpoisson
#> 2 stageStage II 0.925 0.393 0.154 1.70 robpoisson
#> 3 stageStage III 1.78 0.386 1.02 2.53 robpoisson
#> 4 receptorLow 0.489 0.213 0.0718 0.906 robpoisson
#> 5 (Intercept) -2.35 0.360 -3.06 -1.65 glm_startp
#> 6 stageStage II 0.931 0.393 0.161 1.70 glm_startp
#> 7 stageStage III 1.77 0.385 1.01 2.52 glm_startp
#> 8 receptorLow 0.444 0.197 0.0578 0.829 glm_startp
#> 9 (Intercept) -2.35 0.360 -3.06 -1.65 logbin
#> 10 stageStage II 0.931 0.393 0.161 1.70 logbin
#> 11 stageStage III 1.77 0.385 1.01 2.52 logbin
#> 12 receptorLow 0.444 0.197 0.0578 0.829 logbin
#> 13 (Intercept) -2.35 0.360 -3.06 -1.65 logbin_start
#> 14 stageStage II 0.931 0.393 0.161 1.70 logbin_start
#> 15 stageStage III 1.77 0.385 1.01 2.52 logbin_start
#> 16 receptorLow 0.444 0.197 0.0578 0.829 logbin_start
#> 17 stageStage I 0 0 NA NA margstd_boot
#> 18 stageStage II 0.899 0.444 0.198 1.77 margstd_boot
#> 19 stageStage III 1.81 0.425 1.01 2.60 margstd_boot
#> 20 stageStage I 0 0 0 0 margstd_delta
#> 21 stageStage II 0.899 0.387 0.140 1.66 margstd_delta
#> 22 stageStage III 1.81 0.378 1.07 2.55 margstd_delta
#> 23 (Intercept) -2.37 0.420 -3.19 -1.54 logistic
#> 24 stageStage II 1.13 0.466 0.222 2.05 logistic
#> 25 stageStage III 2.94 0.586 1.79 4.08 logistic
#> 26 receptorLow 0.920 0.395 0.145 1.69 logistic
#> 27 (Intercept) -2.37 0.362 -3.08 -1.66 duplicate
#> 28 stageStage II 0.923 0.395 0.149 1.70 duplicate
#> 29 stageStage III 1.78 0.385 1.03 2.54 duplicate
#> 30 receptorLow 0.504 0.224 0.0647 0.944 duplicate
#> 31 (Intercept) -2.35 0.360 -3.06 -1.65 glm_startd
#> 32 stageStage II 0.931 0.393 0.161 1.70 glm_startd
#> 33 stageStage III 1.77 0.385 1.01 2.52 glm_startd
#> 34 receptorLow 0.444 0.197 0.0578 0.829 glm_startd