This function performs sequential, user-defined filter
steps on the input data set. It generates the filtered data and a tibble that
can be directly passed on to exclusion_flowchart to
plot a flowchart of exclusions.
make_exclusions(criteria, data)Tibble with filtering criteria. Must contain three variables:
left String with description of data before applying the filter.
right String with description of data after applying the filter.
filter Filtering expression quoted using expr.
The filter in the last row will not be executed, because the last row
serves as a description of the final data set. See examples.
Tibble with data set on which the filtering criteria should be applied.
A tibble. Each row is a filtering step. Variables:
left: Labels for included subset that is "left" after the filter.
right: Labels for excluded subset (which
exclusion_flowchart plots to the right).
included The data before applying the row's filter.
excluded The data after applying the row's filter.
n_left Number of observations before applying the row's filter.
n_right Number of observations after applying the row's filter.
The last row, included, contains the data after applying all filters.
Access this tibble using %>% pull(included) %>% last().

# Example data set
data(cancer, package = "survival")
cancer <- cancer %>% tibble::as_tibble()
cancer
#> # A tibble: 228 × 10
#> inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 3 306 2 74 1 1 90 100 1175 NA
#> 2 3 455 2 68 1 0 90 90 1225 15
#> 3 3 1010 1 56 1 0 90 90 NA 15
#> 4 5 210 2 57 1 1 90 60 1150 11
#> 5 1 883 2 60 1 0 100 90 NA 0
#> 6 12 1022 1 74 1 1 50 80 513 0
#> 7 7 310 2 68 2 2 70 60 384 10
#> 8 11 361 2 71 2 2 60 80 538 1
#> 9 1 218 2 53 1 1 70 80 825 16
#> 10 7 166 2 61 1 2 70 70 271 34
#> # ℹ 218 more rows
# Define exclusion criteria
criteria <- tibble::tribble(
~left, ~right, ~filter,
"All patients", "Missing ECOG status", expr(!is.na(ph.ecog)),
"Known ECOG", "Exclude men", expr(sex == 2),
"Analytical population", "", expr(TRUE))
# Alternative, equivalent approach to defining the criteria
# Note the use of list() around expr(...)
criteria <- dplyr::bind_rows(
tibble::tibble(
left = "All patients",
right = "Missing ECOG status",
filter = list(expr(!is.na(ph.ecog)))),
tibble::tibble(
left = "Known ECOG",
right = "Exclude men",
filter = list(expr(sex == 2))),
tibble::tibble(
left = "Analytical population",
right = "",
filter = list(expr(TRUE))))
# Perform sequential exclusions
result <- make_exclusions(
criteria = criteria,
data = cancer)
# Show results
result
#> # A tibble: 3 × 7
#> left right filter included excluded n_left n_right
#> <chr> <chr> <list> <list> <list> <int> <int>
#> 1 All patients "Missing EC… <language> <tibble> <tibble> 228 1
#> 2 Known ECOG "Exclude me… <language> <tibble> <tibble> 227 137
#> 3 Analytical population "" <lgl [1]> <tibble> <tibble> 90 NA
# Access study population after all exclusions
result %>%
dplyr::pull(included) %>%
dplyr::last()
#> # A tibble: 90 × 10
#> inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 7 310 2 68 2 2 70 60 384 10
#> 2 11 361 2 71 2 2 60 80 538 1
#> 3 16 654 2 68 2 2 70 70 NA 23
#> 4 11 728 2 68 2 1 90 90 NA 5
#> 5 1 61 2 56 2 2 60 60 238 10
#> 6 6 81 2 49 2 0 100 70 1175 -8
#> 7 12 520 2 70 2 1 90 80 825 6
#> 8 12 473 2 69 2 1 90 90 1025 -1
#> 9 16 107 2 60 2 2 50 60 925 -15
#> 10 1 122 2 62 2 2 50 50 1025 NA
#> # ℹ 80 more rows
# Plot flow chart of exclusions (might not display in the online reference)
result %>%
exclusion_flowchart()