Calculate descriptive summary statistics for all numeric variables in a dataset. Optionally, the output can be stratified by one or more categorical variables.

tsummary(data, ..., by = NULL, na.rm = TRUE)

Arguments

data

Data frame (tibble).

...

Optional. Variables to summarize. If not provided, all numeric variables will be summarized. Supports tidy evaluation; see examples.

by

Optional. Categorical variable(s) to stratify results by.

na.rm

Optional. Drop missing values before computing summary statistics? If set to FALSE, summary statistics may be missing (NA) for variables that contain missing values. Defaults to TRUE.
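The effect of na.rm can be illustrated with plain base R. This is only a sketch of the general behaviour, not tsummary()'s internals, and the vector x is made up for illustration:

```r
# Hypothetical vector with one missing value
x <- c(1, NA, 3)

# Keeping missing values propagates NA into the statistic
mean_keep <- mean(x, na.rm = FALSE)

# Dropping missing values first gives a usable result
mean_drop <- mean(x, na.rm = TRUE)

print(c(keep = mean_keep, drop = mean_drop))
```

With missing values kept, the mean is NA; with them dropped, it is computed from the two observed values only.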

Value

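Tibble, possibly grouped, with one row per summarized variable (and per stratum, if by is given). Besides the variable name (variable) and any by columns, it has the following columns: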

  • rows Row count

  • obs Count of non-missing observations

  • distin Count of distinct values

  • min Minimum value

  • q25 25th percentile

  • median Median, 50th percentile

  • q75 75th percentile

  • max Maximum value

  • mean Mean

  • sd Standard deviation

  • sum Sum of all values
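As a rough guide to what each column means, the statistics can be reproduced for a single vector in base R. This is a sketch under assumptions, not the package's implementation: the vector x is hypothetical, missing values are dropped as with na.rm = TRUE, base R's default quantile method is used, and NA is assumed not to count as a distinct value.

```r
# Hypothetical input vector with one missing value
x <- c(2, 4, 4, NA, 10)
x_obs <- x[!is.na(x)]  # the values that remain once NA is dropped

summary_row <- data.frame(
  rows   = length(x),               # all rows, including missing
  obs    = length(x_obs),           # non-missing observations
  distin = length(unique(x_obs)),   # assumption: NA is not counted
  min    = min(x_obs),
  q25    = unname(quantile(x_obs, 0.25)),
  median = median(x_obs),
  q75    = unname(quantile(x_obs, 0.75)),
  max    = max(x_obs),
  mean   = mean(x_obs),
  sd     = sd(x_obs),
  sum    = sum(x_obs)
)
print(summary_row)
```

Note that rows and obs differ here (5 versus 4) because rows counts every row while obs counts only non-missing values.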

Examples

data(mtcars)
mtcars %>%
  tsummary()
#> # A tibble: 11 × 12
#>    variable  rows   obs distin   min    q25 median    q75    max    mean      sd
#>    <chr>    <int> <int>  <int> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>   <dbl>
#>  1 am          32    32      2  0      0      0      1      1      0.406   0.499
#>  2 carb        32    32      6  1      2      2      4      8      2.81    1.62 
#>  3 cyl         32    32      3  4      4      6      8      8      6.19    1.79 
#>  4 disp        32    32     27 71.1  121.   196.   326    472    231.    124.   
#>  5 drat        32    32     22  2.76   3.08   3.70   3.92   4.93   3.60    0.535
#>  6 gear        32    32      3  3      3      4      4      5      3.69    0.738
#>  7 hp          32    32     22 52     96.5  123    180    335    147.     68.6  
#>  8 mpg         32    32     25 10.4   15.4   19.2   22.8   33.9   20.1     6.03 
#>  9 qsec        32    32     30 14.5   16.9   17.7   18.9   22.9   17.8     1.79 
#> 10 vs          32    32      2  0      0      0      1      1      0.438   0.504
#> 11 wt          32    32     29  1.51   2.58   3.32   3.61   5.42   3.22    0.978
#> # ℹ 1 more variable: sum <dbl>

# Select specific variables and
# remove some summary statistics:
mtcars %>%
  tsummary(mpg, cyl, hp, am, gear, carb) %>%
  dplyr::select(-mean, -sd, -sum)
#> # A tibble: 6 × 9
#>   variable  rows   obs distin   min   q25 median   q75   max
#>   <chr>    <int> <int>  <int> <dbl> <dbl>  <dbl> <dbl> <dbl>
#> 1 am          32    32      2   0     0      0     1     1  
#> 2 carb        32    32      6   1     2      2     4     8  
#> 3 cyl         32    32      3   4     4      6     8     8  
#> 4 gear        32    32      3   3     3      4     4     5  
#> 5 hp          32    32     22  52    96.5  123   180   335  
#> 6 mpg         32    32     25  10.4  15.4   19.2  22.8  33.9

# Stratify by 'gear':
mtcars %>%
  tsummary(mpg, hp, carb, by = gear)
#> # A tibble: 9 × 13
#> # Groups:   variable [3]
#>   variable  gear  rows   obs distin   min   q25 median   q75   max   mean     sd
#>   <chr>    <dbl> <int> <int>  <int> <dbl> <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl>
#> 1 carb         3    15    15      4   1     2      3     4     4     2.67   1.18
#> 2 carb         4    12    12      3   1     1      2     4     4     2.33   1.30
#> 3 carb         5     5     5      4   2     2      4     6     8     4.4    2.61
#> 4 hp           3    15    15     10  97   150    180   210   245   176.    47.7 
#> 5 hp           4    12    12      9  52    65.8   94   110   123    89.5   25.9 
#> 6 hp           5     5     5      5  91   113    175   264   335   196.   103.  
#> 7 mpg          3    15    15     13  10.4  14.5   15.5  18.4  21.5  16.1    3.37
#> 8 mpg          4    12    12     10  17.8  21     22.8  28.1  33.9  24.5    5.28
#> 9 mpg          5     5     5      5  15    15.8   19.7  26    30.4  21.4    6.66
#> # ℹ 1 more variable: sum <dbl>

# Stratify by 'am' and 'gear':
mtcars %>%
  tsummary(mpg, hp, carb, by = c(am, gear))
#> # A tibble: 12 × 14
#> # Groups:   variable, am [6]
#>    variable    am  gear  rows   obs distin   min   q25 median   q75   max   mean
#>    <chr>    <dbl> <dbl> <int> <int>  <int> <dbl> <dbl>  <dbl> <dbl> <dbl>  <dbl>
#>  1 carb         0     3    15    15      4   1     2      3     4     4     2.67
#>  2 carb         0     4     4     4      2   2     2      3     4     4     3   
#>  3 carb         1     4     8     8      3   1     1      1.5   2.5   4     2   
#>  4 carb         1     5     5     5      4   2     2      4     6     8     4.4 
#>  5 hp           0     3    15    15     10  97   150    180   210   245   176.  
#>  6 hp           0     4     4     4      3  62    86.8  109   123   123   101.  
#>  7 hp           1     4     8     8      6  52    65.8   79.5 109.  110    83.9 
#>  8 hp           1     5     5     5      5  91   113    175   264   335   196.  
#>  9 mpg          0     3    15    15     13  10.4  14.5   15.5  18.4  21.5  16.1 
#> 10 mpg          0     4     4     4      4  17.8  18.8   21    23.2  24.4  21.0 
#> 11 mpg          1     4     8     8      7  21    21.3   25.0  30.9  33.9  26.3 
#> 12 mpg          1     5     5     5      5  15    15.8   19.7  26    30.4  21.4 
#> # ℹ 2 more variables: sd <dbl>, sum <dbl>
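
The stratified results can be cross-checked against base R. The sketch below recomputes the per-gear row counts and mean of mpg with aggregate() and table(); it is an independent check, not how tsummary() works internally, and the name check is arbitrary.

```r
data(mtcars)

# Mean of mpg within each level of gear
check <- aggregate(mpg ~ gear, data = mtcars, FUN = mean)

# Number of rows within each level of gear (same ordering as above)
check$rows <- as.vector(table(mtcars$gear))

print(check)
```

The counts and rounded means should agree with the rows and mean columns for mpg in the 'gear' example above.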