Loads, merges, reformats, corrects, and labels the REDCap file used for the MSK-IMPACT Prostate clinical database. It is recommended to process the returned list with check_prostate_redcap next.

load_prostate_redcap(
  labeled_csv,
  deidentify = TRUE,
  keep_also = list(baseline = NULL, sample = NULL, freeze = NULL)
)

Arguments

labeled_csv

CSV file with labels, exported from REDCap. Must be the labeled version and must contain dates in order to derive time intervals.

deidentify

De-identify the returned data set using deidentify_prostate_redcap? Defaults to TRUE. Disable only if additional data need to be merged by identifiers; in that case, call deidentify_prostate_redcap separately after merging.

keep_also

Optional. Additional patient-level variables to keep without editing. If applicable, they need to be de-identified manually. Provide as a list with vectors of variable names for baseline and freeze forms: list(baseline = c("var1", "var2"), freeze = "varX").

Value

List of three labeled tibbles (data frames):

  • pts: Patient-level data

  • smp: Sample-level data

  • trt: Treatment data

Access variable labels in RStudio via View or using attr(., "label").

The warning message, Duplicated column names deduplicated, is expected due to the design of the REDCap dataset. Another warning message that a factor does not contain all levels is also possible.
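
As a minimal sketch of label access outside RStudio, the following uses a toy data frame as a stand-in for one of the returned tibbles. The column name and label text are illustrative assumptions, not output of load_prostate_redcap.

```r
# Toy stand-in for a returned tibble. The label text is hypothetical.
pts <- data.frame(age_dx = c(43.4, 42.3), ptid = 1:2)
attr(pts$age_dx, "label") <- "Age at diagnosis (years)"

# Label of a single variable:
attr(pts$age_dx, "label")

# Labels of all variables at once (NULL for columns without a label):
labels <- lapply(pts, attr, which = "label")
labels
```

The same pattern should apply to each element of the returned list, e.g., the pts, smp, and trt tibbles.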

Details

The following edits and assumptions are made:

  1. Potentially incomplete date variables are converted to date format, using guessdate.

  2. Various missingness indicators in strings and factors, c("Unknown / Not Reported", "N/A", "NA", "Unknown", "X", "x"), are converted to NA.

  3. "Undetectable" PSA is set to 0, PSA ">x" is set to x + 1, PSA "a-b" (e.g., 4.5-4.7) is set to the mean of the two values.

  4. Clinical T and N stage variables are set to missing if M1.

  5. Event dates and follow-up time for metastases (met_date), castration resistance (crpc_date), and death are set:

    • Event date is the last clinic visit (lastvisit) if a CRPC/metastases event has not occurred.

    • Event date is the last follow up/contact (lastfu) if last known survival status is alive.

    • If stage is M1 and the recorded metastasis date differs from the diagnosis date by no more than 1 month, met_date is set to the diagnosis date (dxdate).

    • If the sample is a variant histology (e.g., neuroendocrine), the castration resistance date (crpc_date) is the date of diagnosis and the event indicator for survival analyses (event_crpc) is NA.

    • Time intervals for these three survival outcomes are calculated from the time of sequencing. For late-entry survival models, time intervals from diagnosis to sequencing and from sample/biopsy to sequencing are also provided.

  6. Disease extent, distinguishing CRPC from castration-sensitive disease, at sampling is based on the sample date and the date of castration resistance. If the sample was obtained before the CRPC date, or CRPC did not occur, the sample is from castration-sensitive disease by definition.
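
The recoding rules in items 2 and 3 can be illustrated with a short base R sketch. The helper clean_psa and the vector missing_codes are hypothetical names introduced here for illustration; this is not the package's internal implementation, only one way the stated rules could be applied to free-text PSA values.

```r
# Hypothetical helper, not part of prostateredcap.
# Missingness indicators as listed in item 2:
missing_codes <- c("Unknown / Not Reported", "N/A", "NA", "Unknown", "X", "x")

clean_psa <- function(x) {
  x <- trimws(as.character(x))
  x[x %in% missing_codes] <- NA_character_
  out <- rep(NA_real_, length(x))

  # "Undetectable" is set to 0
  undetectable <- !is.na(x) & tolower(x) == "undetectable"
  out[undetectable] <- 0

  # ">x" is set to x + 1
  greater <- !is.na(x) & grepl("^>", x)
  out[greater] <- as.numeric(sub("^> *", "", x[greater])) + 1

  # "a-b" is set to the mean of a and b
  is_range <- !is.na(x) & grepl("^[0-9.]+ *- *[0-9.]+$", x)
  bounds <- strsplit(x[is_range], " *- *")
  out[is_range] <- vapply(bounds, function(b) mean(as.numeric(b)), numeric(1))

  # Remaining values are treated as plain numbers (NA if not parseable)
  plain <- !is.na(x) & !(undetectable | greater | is_range)
  out[plain] <- suppressWarnings(as.numeric(x[plain]))
  out
}

psa <- clean_psa(c("Undetectable", ">100", "4.5-4.7", "6.2", "Unknown"))
psa
# Expected values: 0, 101, 4.6, 6.2, NA
```

Handling the missingness codes first keeps the later pattern checks simple, because every remaining non-missing string is either one of the three special forms or a plain number.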

Examples

# Get path to toy data provided by the package:
example_csv_file <- system.file("extdata",
  "SampleGUPIMPACTDatab_DATA_LABELS_2021-05-26.csv",
  package = "prostateredcap",
  mustWork = TRUE)

# Load data:
pts_smp <- load_prostate_redcap(labeled_csv = example_csv_file)
#> Warning: There was 1 warning in `mutate()`.
#>  In argument: `stage_detailed = fct_relevel(...)`.
#> Caused by warning:
#> ! 5 unknown levels in `f`: T1/T2 NX M0, T3 N0 M0, T3 NX M0, T4 M0, and TX NX M0
#> Warning: Expected 4 pieces. Missing pieces filled with `NA` in 2 rows [3, 10].
#> Warning: There was 1 warning in `transmute()`.
#>  In argument: `smp_tissue = fct_collapse(smp_tissue, Visceral = c("Liver",
#>   "Lung"))`.
#> Caused by warning:
#> ! Unknown levels in `f`: Liver
#> Warning: There were 2 warnings in `mutate()`.
#> The first warning was:
#>  In argument: `dzextent_seq = fct_relevel(...)`.
#> Caused by warning:
#> ! 2 unknown levels in `f`: Regional nodes and Metastatic, variant histology
#>  Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
#> Warning: There was 1 warning in `mutate()`.
#>  In argument: `rx_dzextent = fct_relevel(...)`.
#> Caused by warning:
#> ! 6 unknown levels in `f`: Localized, Regional nodes, Metastatic
#> hormone-sensitive, Non-metastatic castration-resistant, Metastatic
#> castration-resistant, and Metastatic, variant histology

# Access patient-level data:
pts_smp$pts
#> # A tibble: 8 × 59
#>    ptid complete_pts age_dx race           race4 race3 ethnicity smoking smoke01
#>   <int> <chr>         <dbl> <fct>          <fct> <fct> <fct>     <fct>     <dbl>
#> 1     1 Complete       43.4 White          White White NOT Hisp… Never         0
#> 2     2 Complete       42.3 White          White White NOT Hisp… Never         0
#> 3     3 Complete       60.7 Black or Afri… Blac… Black NOT Hisp… Never         0
#> 4     4 Complete       NA   NA             NA    NA    NOT Hisp… Never         0
#> 5     5 Complete       63.4 White          White White NOT Hisp… Current       1
#> 6     6 Complete       49.1 White          White White NOT Hisp… Never         0
#> 7     7 Complete       59.6 Asian          Asian Asian NOT Hisp… Never         0
#> 8     8 Complete       65.2 White          White White NOT Hisp… Never         0
#> # ℹ 50 more variables: bx_gl_sum <dbl>, bx_gl <fct>, bx_gl34 <fct>,
#> #   bx_gl_maj <dbl>, bx_gl_min <dbl>, psa_dx <dbl>, psa_dxcat <fct>,
#> #   lnpsa_dx <dbl>, clin_t <fct>, clin_n <fct>, clin_m <fct>,
#> #   stage_detailed <fct>, stage <fct>, clin_tstage <fct>, clin_nstage <fct>,
#> #   mstage <fct>, rxprim <fct>, rxprim_oth <chr>, rxprim_rp <lgl>,
#> #   rxprim_adt <lgl>, rxprim_chemo <lgl>, rxprim_xrt <lgl>, rxprim_other <lgl>,
#> #   rp_gl_sum <dbl>, rp_gl34 <fct>, rp_gl_maj <dbl>, rp_gl_min <dbl>, …

# Access sample-level data:
pts_smp$smp
#> # A tibble: 12 × 41
#>     ptid complete_smp dmpid    hist_smp hist_cmt dzextent_smp dzextent2 ext_pros
#>    <int> <chr>        <chr>    <fct>    <chr>    <fct>        <fct>     <fct>   
#>  1     1 Complete     P-12345… Adenoca… NA       Metastatic … Metastat… FALSE   
#>  2     2 Complete     P-23456… Adenoca… NA       Metastatic … Metastat… TRUE    
#>  3     3 Complete     P-34567… Adenoca… NA       Localized    Localized FALSE   
#>  4     3 Complete     P-34567… Adenoca… NA       Metastatic … Metastat… FALSE   
#>  5     4 Complete     P-43219… Adenoca… NA       Metastatic … Metastat… FALSE   
#>  6     4 Complete     P-43219… Adenoca… NA       Metastatic … Metastat… FALSE   
#>  7     5 Complete     P-54321… Adenoca… NA       Metastatic … Metastat… TRUE    
#>  8     5 Complete     P-54321… Adenoca… NA       Metastatic … Metastat… TRUE    
#>  9     6 Complete     P-67895… Adenoca… NA       Metastatic … Metastat… FALSE   
#> 10     7 Complete     P-77320… Adenoca… NA       Regional no… Regional… FALSE   
#> 11     8 Complete     P-88321… Adenoca… NA       Metastatic … Metastat… TRUE    
#> 12     8 Complete     P-88321… Adenoca… NA       Metastatic … Metastat… FALSE   
#> # ℹ 33 more variables: ext_lndis <fct>, ext_bone <fct>, ext_vis <fct>,
#> #   ext_liver <fct>, ext_lung <fct>, ext_other <fct>, bonevol <fct>,
#> #   cntadt <fct>, tissue <fct>, smp_pros <fct>, smp_tissue <fct>,
#> #   pur_rev <fct>, pur_remov <fct>, is_met_for_qc <fct>, dzextent_seq <fct>,
#> #   primmet_smp <chr>, age_smp <dbl>, age_seq <dbl>, dx_smp_mos <dbl>,
#> #   adt_smp_mos <dbl>, dx_seq_mos <dbl>, adt_seq_mos <dbl>, smp_met_mos <dbl>,
#> #   smp_os_mos <dbl>, seq_met_mos <dbl>, seq_crpc_mos <dbl>, …

# Access treatment data:
pts_smp$trt
#> # A tibble: 0 × 19
#> # ℹ 19 variables: ptid <int>, rx_line <chr>, rx_name <fct>,
#> #   rx_name_parpi <chr>, rx_censor <chr>, rx_stop_reason <fct>,
#> #   rx_stop_reason_other <chr>, rx_dzextent <fct>, rx_ext_pros <fct>,
#> #   rx_ext_lndis <fct>, rx_ext_bone <fct>, rx_ext_vis <fct>,
#> #   rx_ext_liver <fct>, rx_ext_lung <fct>, rx_ext_other <fct>,
#> #   rx_bonevol <fct>, dx_rx_start_mos <dbl>, dx_rx_end_mos <dbl>, rx_wks <dbl>

# Pass 'pts_smp' to check_prostate_redcap() next