Installation
Once risks is released on CRAN,
install.packages("risks")
will be available. Currently, the
development version of risks
can be installed from GitHub using:
# install.packages("remotes") # The "remotes" package needs to be installed
remotes::install_github("stopsack/risks")
An example cohort study
We use a cohort of women with breast cancer, as used by Spiegelman
and Hertzmark (Am J
Epidemiol 2005) and Greenland (Am J Epidemiol
2004). The the categorical exposure is stage
, the
binary outcome is death
, and the binary confounder is
receptor
.
library(risks) # provides riskratio(), riskdiff(), postestimation functions
library(dplyr) # For data handling
library(broom) # For tidy() model summaries
data(breastcancer) # Load sample data
breastcancer %>% # Display the sample data
group_by(receptor, stage) %>%
summarize(deaths = sum(death), total = n(), risk = deaths/total)
#> # A tibble: 6 × 5
#> # Groups: receptor [2]
#> receptor stage deaths total risk
#> <chr> <fct> <dbl> <int> <dbl>
#> 1 High Stage I 5 55 0.0909
#> 2 High Stage II 17 74 0.230
#> 3 High Stage III 9 15 0.6
#> 4 Low Stage I 2 12 0.167
#> 5 Low Stage II 9 22 0.409
#> 6 Low Stage III 12 14 0.857
Adjusted risk ratios
The risk of death is higher among women with higher-stage and hormone
receptor-low cancers, which also tend to be of higher stage. Using
risks
models to obtain (possibly multivariable-adjusted)
risk ratios or risk differences is similar to the standard code for
logistic models in R. As customary in R, log(RR) is returned; see below
for how to obtain exponentiated values. No options for model
family
or link
need to be supplied:
fit_rr <- riskratio(formula = death ~ stage + receptor, data = breastcancer)
summary(fit_rr)
#>
#> Risk ratio model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"),
#> data = breastcancer, start = "(no starting values)")
#>
#> Coefficients: (3 not defined because of singularities)
#> Estimate Std. Error z value Pr(>|z|)
#> stageStage I 0.0000 0.0000 NaN NaN
#> stageStage II 0.8989 0.3875 2.320 0.0203 *
#> stageStage III 1.8087 0.3783 4.781 1.75e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 228.15 on 191 degrees of freedom
#> Residual deviance: 185.88 on 188 degrees of freedom
#> AIC: 193.88
#>
#> Number of Fisher Scoring iterations: 4
#>
#> Confidence intervals for coefficients: (delta method)
#> 2.5 % 97.5 %
#> stageStage I 0.0000000 0.000000
#> stageStage II 0.1395299 1.658324
#> stageStage III 1.0671711 2.550242
Adjusted risk differences
To obtain risk differences, use riskdiff
, which has the
same syntax:
fit_rd <- riskdiff(formula = death ~ stage + receptor, data = breastcancer)
summary(fit_rd)
#>
#> Risk difference model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"),
#> data = breastcancer, start = "(no starting values)")
#>
#> Coefficients: (3 not defined because of singularities)
#> Estimate Std. Error z value Pr(>|z|)
#> stageStage I 0.00000 0.00000 NaN NaN
#> stageStage II 0.16303 0.05964 2.734 0.00626 **
#> stageStage III 0.57097 0.09962 5.732 9.95e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 228.15 on 191 degrees of freedom
#> Residual deviance: 185.88 on 188 degrees of freedom
#> AIC: 193.88
#>
#> Number of Fisher Scoring iterations: 4
#>
#> Confidence intervals for coefficients: (delta method)
#> 2.5 % 97.5 %
#> stageStage I 0.00000000 0.0000000
#> stageStage II 0.04614515 0.2799187
#> stageStage III 0.37571719 0.7662158
For example, the risk of death was 57 percentage points higher in women with stage III breast cancer compared to stage I (95% confidence interval, 38 to 77 percentage points), adjusting for hormone receptor status.
The model summary in riskratio()
and
riskdiff()
includes to two additions compared to a regular
glm()
model:
- In the first line of
summary()
, the type of risk regression model is displayed (in the example, “Risk ratio model, fitted as binomial model...
”). - At the end of the output, confidence intervals for the model coefficients are printed.
Accessing model coefficients
The risks package provides an interface to
broom::tidy()
, which returns a data frame of all
coefficients (risk differences in this example), their standard errors,
z statistics, and confidence intervals.
tidy(fit_rd)
#> # A tibble: 3 × 8
#> term estimate std.error statistic p.value conf.low conf.high model
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 stageStage I 0 0 NaN NaN 0 0 marg…
#> 2 stageStage II 0.163 0.0596 2.73 6.26e-3 0.0461 0.280 marg…
#> 3 stageStage III 0.571 0.0996 5.73 9.95e-9 0.376 0.766 marg…
In accordance with glm()
standards, coefficients for
relative risks are shown on the logarithmic scale. Exponentiated
coefficients for risk ratios are easily obtained via
tidy(..., exponentiate = TRUE)
:
tidy(fit_rr, exponentiate = TRUE)
#> # A tibble: 3 × 8
#> term estimate std.error statistic p.value conf.low conf.high model
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 stageStage I 1 0 NaN NaN 1 1 marg…
#> 2 stageStage II 2.46 0.387 2.32 2.03e-2 1.15 5.25 marg…
#> 3 stageStage III 6.10 0.378 4.78 1.75e-6 2.91 12.8 marg…
For example, the risk of death was 6 times higher in women with stage III breast cancer compared to stage I (95% confidence interval, 3 to 13 times), adjusting for hormone receptor status.
More post-estimation commands
Typical R functions that build on regression models can further
process fitted risks
models. In addition to
tidy()
, examples are:
-
coef(fit)
orcoefficients(fit)
return model coefficients (i.e., RDs or log(RR)) as a numeric vector -
confint(fit, level = 0.9)
returns 90% confidence intervals. -
predict(fit, type = "response")
orfitted.values(fit)
return predicted probabilities of the binary outcome. -
residuals(fit)
returns model residuals.