Skip to contents

Installation

Once risks is released on CRAN, install.packages("risks") will be available. Currently, the development version of risks can be installed from GitHub using:

# install.packages("remotes")  # The "remotes" package needs to be installed
remotes::install_github("stopsack/risks")

An example cohort study

We use a cohort of women with breast cancer, as used by Spiegelman and Hertzmark (Am J Epidemiol 2005) and Greenland (Am J Epidemiol 2004). The the categorical exposure is stage, the binary outcome is death, and the binary confounder is receptor.

library(risks)  # provides riskratio(), riskdiff(), postestimation functions
library(dplyr)  # For data handling
library(broom)  # For tidy() model summaries

data(breastcancer)  # Load sample data

breastcancer %>%  # Display the sample data
  group_by(receptor, stage) %>% 
  summarize(deaths = sum(death), total = n(), risk = deaths/total)
#> # A tibble: 6 × 5
#> # Groups:   receptor [2]
#>   receptor stage     deaths total   risk
#>   <chr>    <fct>      <dbl> <int>  <dbl>
#> 1 High     Stage I        5    55 0.0909
#> 2 High     Stage II      17    74 0.230 
#> 3 High     Stage III      9    15 0.6   
#> 4 Low      Stage I        2    12 0.167 
#> 5 Low      Stage II       9    22 0.409 
#> 6 Low      Stage III     12    14 0.857

Adjusted risk ratios

The risk of death is higher among women with higher-stage and hormone receptor-low cancers, which also tend to be of higher stage. Using risks models to obtain (possibly multivariable-adjusted) risk ratios or risk differences is similar to the standard code for logistic models in R. As customary in R, log(RR) is returned; see below for how to obtain exponentiated values. No options for model family or link need to be supplied:

fit_rr <- riskratio(formula = death ~ stage + receptor, data = breastcancer)
summary(fit_rr)
#> 
#> Risk ratio model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"), 
#>     data = breastcancer, start = "(no starting values)")
#> 
#> Coefficients: (3 not defined because of singularities)
#>                Estimate Std. Error z value Pr(>|z|)    
#> stageStage I     0.0000     0.0000     NaN      NaN    
#> stageStage II    0.8989     0.3875   2.320   0.0203 *  
#> stageStage III   1.8087     0.3783   4.781 1.75e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 228.15  on 191  degrees of freedom
#> Residual deviance: 185.88  on 188  degrees of freedom
#> AIC: 193.88
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> Confidence intervals for coefficients: (delta method)
#>                    2.5 %   97.5 %
#> stageStage I   0.0000000 0.000000
#> stageStage II  0.1395299 1.658324
#> stageStage III 1.0671711 2.550242

Adjusted risk differences

To obtain risk differences, use riskdiff, which has the same syntax:

fit_rd <- riskdiff(formula = death ~ stage + receptor, data = breastcancer)
summary(fit_rd)
#> 
#> Risk difference model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"), 
#>     data = breastcancer, start = "(no starting values)")
#> 
#> Coefficients: (3 not defined because of singularities)
#>                Estimate Std. Error z value Pr(>|z|)    
#> stageStage I    0.00000    0.00000     NaN      NaN    
#> stageStage II   0.16303    0.05964   2.734  0.00626 ** 
#> stageStage III  0.57097    0.09962   5.732 9.95e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 228.15  on 191  degrees of freedom
#> Residual deviance: 185.88  on 188  degrees of freedom
#> AIC: 193.88
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> Confidence intervals for coefficients: (delta method)
#>                     2.5 %    97.5 %
#> stageStage I   0.00000000 0.0000000
#> stageStage II  0.04614515 0.2799187
#> stageStage III 0.37571719 0.7662158

For example, the risk of death was 57 percentage points higher in women with stage III breast cancer compared to stage I (95% confidence interval, 38 to 77 percentage points), adjusting for hormone receptor status.

The model summary in riskratio() and riskdiff() includes to two additions compared to a regular glm() model:

  • In the first line of summary(), the type of risk regression model is displayed (in the example, “Risk ratio model, fitted as binomial model...”).
  • At the end of the output, confidence intervals for the model coefficients are printed.

Accessing model coefficients

The risks package provides an interface to broom::tidy(), which returns a data frame of all coefficients (risk differences in this example), their standard errors, z statistics, and confidence intervals.

tidy(fit_rd)
#> # A tibble: 3 × 8
#>   term           estimate std.error statistic   p.value conf.low conf.high model
#>   <chr>             <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr>
#> 1 stageStage I      0        0         NaN    NaN         0          0     marg…
#> 2 stageStage II     0.163    0.0596      2.73   6.26e-3   0.0461     0.280 marg…
#> 3 stageStage III    0.571    0.0996      5.73   9.95e-9   0.376      0.766 marg…


In accordance with glm() standards, coefficients for relative risks are shown on the logarithmic scale. Exponentiated coefficients for risk ratios are easily obtained via tidy(..., exponentiate = TRUE):

tidy(fit_rr, exponentiate = TRUE)
#> # A tibble: 3 × 8
#>   term           estimate std.error statistic   p.value conf.low conf.high model
#>   <chr>             <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr>
#> 1 stageStage I       1        0        NaN    NaN           1         1    marg…
#> 2 stageStage II      2.46     0.387      2.32   2.03e-2     1.15      5.25 marg…
#> 3 stageStage III     6.10     0.378      4.78   1.75e-6     2.91     12.8  marg…

For example, the risk of death was 6 times higher in women with stage III breast cancer compared to stage I (95% confidence interval, 3 to 13 times), adjusting for hormone receptor status.

More post-estimation commands

Typical R functions that build on regression models can further process fitted risks models. In addition to tidy(), examples are:

  • coef(fit) or coefficients(fit) return model coefficients (i.e., RDs or log(RR)) as a numeric vector
  • confint(fit, level = 0.9) returns 90% confidence intervals.
  • predict(fit, type = "response") or fitted.values(fit) return predicted probabilities of the binary outcome.
  • residuals(fit) returns model residuals.