Performs sanity checks on the result of a left join

The first check verifies that the merge did not duplicate any rows; the second verifies that it did not duplicate any columns.
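
The two conditions can be illustrated with a minimal base-R sketch. This is not the package's implementation: the tables are plain data frames to keep the sketch dependency-free, and detecting duplicated columns through the `.x`/`.y` suffixes that `merge()` appends is an assumed heuristic.

```r
# Base-R sketch of the two checks; not the package's actual implementation.
left  <- data.frame(a = 1:4, b = letters[1:4])
right <- data.frame(a = c(1:4, 2), b = letters[1:5], c = 5:1)

# all.x = TRUE makes merge() a left join
joined <- merge(x = left, y = right, by = "a", all.x = TRUE)

# Check 1: a left join on a unique key keeps the row count of the left table
rows_duplicated <- nrow(joined) != nrow(left)

# Check 2: merge() renames clashing non-key columns with ".x"/".y" suffixes,
# so such suffixes hint at duplicated columns (assumed heuristic)
duplicated_columns <- grep("\\.(x|y)$", names(joined), value = TRUE)
cols_duplicated <- length(duplicated_columns) > 0

rows_duplicated      # TRUE: 5 rows in joined vs. 4 in left
duplicated_columns   # "b.x" "b.y"
```

The suffix heuristic would also flag columns whose original names happen to end in `.x` or `.y`, so it is only a rough stand-in for the real check.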

sc_left_join(joined, left, right, by, ..., find_nonunique_key = TRUE)

Arguments

joined	the result of the left-join
left	the left table used in the left-join
right	the right table used in the left-join
by	the variables used for the left-join
...	further parameters that are passed to add_sanity_check.
find_nonunique_key	if TRUE, an additional sanity check looks for keys (as defined by `by`) that are not unique. This step can be time-consuming.
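
To make the idea behind `find_nonunique_key` concrete, here is a minimal base-R sketch of how non-unique keys might be located. The helper `find_nonunique_keys()` is hypothetical and not part of the package; it assumes a plain data frame and may differ from what the package does internally.

```r
# Hypothetical helper, not part of the package: return the rows of `dt`
# whose key combination (columns named in `by`) occurs more than once.
# Assumes `dt` is a plain data.frame.
find_nonunique_keys <- function(dt, by) {
  # Build one string per row from the key columns
  key <- do.call(paste, c(unname(as.list(dt[by])), sep = "\r"))
  # Count how often each key occurs, aligned with the rows of dt
  n_per_key <- ave(seq_along(key), key, FUN = length)
  dt[n_per_key != 1, , drop = FALSE]
}

right <- data.frame(a = c(1:4, 2), b = letters[1:5])
dups <- find_nonunique_keys(right, by = "a")
dups  # the two rows with a == 2
```

Counting per key touches every row and builds a string per row, which gives a feel for why such a check can become slow on large tables.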

Value

A list with two elements, one for each of the two sanity checks performed by this function. Each element has the same structure as the return object of add_sanity_check.

Examples

ab <- data.table::data.table(a = 1:4, b = letters[1:4])
abc <- data.table::data.table(a = c(1:4, 2), b = letters[1:5], c = rnorm(5))
# all.x = TRUE turns merge() into a left join (the default is an inner join)
j <- merge(x = ab, y = abc, by = "a", all.x = TRUE)
dummy_call <- function() {
  sc_left_join(joined = j, left = ab, right = abc, by = "a",
    description = "Left join outcome to main population")
}
dummy_call()
#> [[1]]
#> [[1]]$entry_sanity_table
#>                             description
#> 1: Left join outcome to main population
#>                                       additional_desc  data_name n n_fail n_na
#> 1: nrow(joined table) = 5 equals nrow(left table) = 4 j, ab, abc 1      1    0
#>    counter_meas       fail_vec_str      param_name         call
#> 1:            - n_joined != n_left Merge-vars: 'a' dummy_call()
#> 
#> [[1]]$fail_vec
#> [1] TRUE
#> 
#> [[1]]$fail
#> [1] TRUE
#> 
#> 
#> [[2]]
#> [[2]]$entry_sanity_table
#>                             description
#> 1: Left join outcome to main population
#>                                additional_desc  data_name n n_fail n_na
#> 1: No columns were duplicated by the left join j, ab, abc 1      1    0
#>    counter_meas                   fail_vec_str      param_name         call
#> 1:            - length(duplicated_columns) > 0 Merge-vars: 'a' dummy_call()
#>              example
#> 1: <data.table[1x1]>
#> 
#> [[2]]$fail_vec
#> [1] TRUE
#> 
#> [[2]]$fail
#> [1] TRUE
#> 
#> 
get_sanity_checks()
#>                                            description
#>  1:                                       bmi above 15
#>  2:                                       bmi below 30
#>  3:                                       bmi above 15
#>  4:                                                  -
#>  5:                                                  -
#>  6:                                                  -
#>  7:                                                  -
#>  8: Measurements are expected to be bounded from below
#>  9: Measurements are expected to be bounded from below
#> 10:                            No NAs expected in iris
#> 11:                            No NAs expected in iris
#> 12:                            No NAs expected in iris
#> 13:                            No NAs expected in iris
#> 14:                            No NAs expected in iris
#> 15:           Measurements are expected to be positive
#> 16:           Measurements are expected to be positive
#> 17:                                                  -
#> 18:               Left join outcome to main population
#> 19:               Left join outcome to main population
#> 20:               Left join outcome to main population
#>                                                                           additional_desc
#>  1:                                                                                     -
#>  2:                                                                                     -
#>  3:                                                                                     -
#>  4:                                 Elements in 'type' should contain only 'b', 'c', 'd'.
#>  5:                                 Elements in 'type' should contain only 'b', 'c', 'd'.
#>  6:                                     Elements in 'Sepal.Length' should be in [1, 7.9).
#>  7:                                     Elements in 'Petal.Length' should be in [1, 7.9).
#>  8:                                              Elements in 'a' should be in (0.2, Inf).
#>  9:                                              Elements in 'b' should be in (0.2, Inf).
#> 10:                                  Check that column 'Sepal.Length' does not contain NA
#> 11:                                   Check that column 'Sepal.Width' does not contain NA
#> 12:                                  Check that column 'Petal.Length' does not contain NA
#> 13:                                   Check that column 'Petal.Width' does not contain NA
#> 14:                                       Check that column 'Species' does not contain NA
#> 15:                                                Elements in 'a' should be in (0, Inf).
#> 16:                                                Elements in 'b' should be in (0, Inf).
#> 17: The combination of 'Species', 'Sepal.Length', 'Sepal.Width', 'Petal.Length' is unique
#> 18:                                                      The combination of 'a' is unique
#> 19:                                    nrow(joined table) = 5 equals nrow(left table) = 4
#> 20:                                           No columns were duplicated by the left join
#>      data_name   n n_fail n_na counter_meas
#>  1:          x   4      1    0         none
#>  2:              4      1    0         none
#>  3:              4      1    0            -
#>  4:          d   4      1    0            -
#>  5:          d   4      1    0            -
#>  6:       iris 150      1    0            -
#>  7:       iris 150      0    0            -
#>  8:          d   4      3    0            -
#>  9:          d   4      0    0            -
#> 10:       iris 150      5    0            -
#> 11:       iris 150      0    0            -
#> 12:       iris 150      0    0            -
#> 13:       iris 150      0    0            -
#> 14:       iris 150      0    0            -
#> 15:          d   4      2    0            -
#> 16:          d   4      0    0            -
#> 17:          x 150     12    0            -
#> 18: j, ab, abc   5      2    0            -
#> 19: j, ab, abc   1      1    0            -
#> 20: j, ab, abc   1      1    0            -
#>                                                                  fail_vec_str
#>  1:                                                                x$bmi < 15
#>  2:                                                                x$bmi > 30
#>  3:                                                                d$bmi < 15
#>  4:                                   !(object[[col]] %in% feasible_elements)
#>  5:                                   !(object[[col]] %in% feasible_elements)
#>  6: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#>  7: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#>  8: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#>  9: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#> 10:                                                      is.na(object[[col]])
#> 11:                                                      is.na(object[[col]])
#> 12:                                                      is.na(object[[col]])
#> 13:                                                      is.na(object[[col]])
#> 14:                                                      is.na(object[[col]])
#> 15: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#> 16: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#> 17:                                                        dt$.n_col_cmb != 1
#> 18:                                                        dt$.n_col_cmb != 1
#> 19:                                                        n_joined != n_left
#> 20:                                            length(duplicated_columns) > 0
#>                                                   param_name
#>  1:                                                      bmi
#>  2:                                                        -
#>  3:                                                        -
#>  4:                                                     type
#>  5:                                                     type
#>  6:                                             Sepal.Length
#>  7:                                             Petal.Length
#>  8:                                                        a
#>  9:                                                        b
#> 10:                                             Sepal.Length
#> 11:                                              Sepal.Width
#> 12:                                             Petal.Length
#> 13:                                              Petal.Width
#> 14:                                                  Species
#> 15:                                                        a
#> 16:                                                        b
#> 17: 'Species', 'Sepal.Length', 'Sepal.Width', 'Petal.Length'
#> 18:                                          Merge-vars: 'a'
#> 19:                                          Merge-vars: 'a'
#> 20:                                          Merge-vars: 'a'
#>                          call           example
#>  1:         dummy_call(x = d) <data.frame[1x3]>
#>  2:         dummy_call(x = d)                  
#>  3: eval(expr, envir, enclos)                  
#>  4: eval(expr, envir, enclos) <data.frame[1x2]>
#>  5:         dummy_call(x = d) <data.frame[1x2]>
#>  6:         dummy_call(x = d) <data.frame[1x5]>
#>  7:         dummy_call(x = d)                  
#>  8:         dummy_call(x = d) <data.frame[3x2]>
#>  9:         dummy_call(x = d)                  
#> 10:      dummy_call(x = iris) <data.frame[3x5]>
#> 11:      dummy_call(x = iris)                  
#> 12:      dummy_call(x = iris)                  
#> 13:      dummy_call(x = iris)                  
#> 14:      dummy_call(x = iris)                  
#> 15:         dummy_call(x = d) <data.frame[2x2]>
#> 16:         dummy_call(x = d)                  
#> 17:      dummy_call(x = iris) <data.table[3x6]>
#> 18:              dummy_call() <data.table[2x5]>
#> 19:              dummy_call()                  
#> 20:              dummy_call() <data.table[1x1]>
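
Both failures in the example trace back to the right table: the key `a = 2` occurs twice, and column `b` exists in both tables. The following base-R sketch shows one way to prepare the right table so that both conditions hold. It assumes that keeping the first row per key is acceptable, which depends on the data at hand, and it uses plain data frames rather than data.table.

```r
left  <- data.frame(a = 1:4, b = letters[1:4])
right <- data.frame(a = c(1:4, 2), b = letters[1:5], c = 5:1)

# Keep the first row per key (an assumption; pick the rule that fits the data)
right_unique <- right[!duplicated(right["a"]), , drop = FALSE]

# Drop non-key columns that already exist in the left table
shared <- setdiff(intersect(names(left), names(right_unique)), "a")
right_unique <- right_unique[, setdiff(names(right_unique), shared), drop = FALSE]

joined <- merge(x = left, y = right_unique, by = "a", all.x = TRUE)

nrow(joined) == nrow(left)               # TRUE: no rows were duplicated
any(grepl("\\.(x|y)$", names(joined)))   # FALSE: no suffixed columns
```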