Performs two sanity checks on the result of a left join: the first verifies that the merge duplicated no rows, the second that it duplicated no columns.

sc_left_join(joined, left, right, by, ..., find_nonunique_key = TRUE)

Arguments

joined

the result of the left-join

left

the left table used in the left-join

right

the right table used in the left-join

by

the variables used for the left-join

...

further parameters that are passed to add_sanity_check.

find_nonunique_key

if TRUE, an additional sanity check is performed that finds keys (as defined by by) that are non-unique. Note that this step can be time-consuming.
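
To illustrate what looking for non-unique keys means, the following is a minimal base-R sketch. It is not the package's implementation: the tables, the variable names (`key`, `is_dup`, `nonunique_keys`) and the choice to inspect the right table are assumptions made for illustration only.

```r
# Hypothetical tables, modelled on the ones used in the Examples section
left  <- data.frame(a = 1:4, b = letters[1:4])
right <- data.frame(a = c(1:4, 2), b = letters[1:5],
                    c = c(0.1, 0.2, 0.3, 0.4, 0.5))
by <- "a"

# Keep only the key columns of the right table
key <- right[, by, drop = FALSE]

# Flag every row whose key value occurs more than once
# (scanning from both ends marks the first occurrence as well)
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Distinct key values that are not unique
nonunique_keys <- unique(key[is_dup, , drop = FALSE])
print(nonunique_keys)
```

On large tables a scan like this touches every key, which gives an idea of why such a check can be slow.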

Value

list with two elements for the two sanity checks performed by this function. The structure of each element is as the return object of add_sanity_check.
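
The two conditions behind these list elements can be approximated in base R. This is only a sketch: the row-count comparison follows the `n_joined != n_left` expression shown in the example output, whereas detecting duplicated columns via the default `.x`/`.y` suffixes of `merge()` is an assumption, and the names `rows_fail`, `cols_fail` and `checks` are hypothetical.

```r
# Hypothetical tables, modelled on the ones used in the Examples section
left  <- data.frame(a = 1:4, b = letters[1:4])
right <- data.frame(a = c(1:4, 2), b = letters[1:5],
                    c = c(0.1, 0.2, 0.3, 0.4, 0.5))
joined <- merge(x = left, y = right, by = "a", all.x = TRUE)

# Check 1: a left join should leave the number of rows unchanged
rows_fail <- nrow(joined) != nrow(left)

# Check 2 (assumption): columns present in both tables but not used as
# keys are kept twice by merge(), with the suffixes ".x" and ".y"
duplicated_columns <- grep("\\.(x|y)$", names(joined), value = TRUE)
cols_fail <- length(duplicated_columns) > 0

# One entry per check, loosely mirroring the two-element return value
checks <- list(rows = rows_fail, columns = cols_fail)
str(checks)
```

Here both conditions fail: key 2 appears twice in `right`, so the join gains a row, and column `b` exists in both tables without being part of `by`.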

Examples

ab <- data.table::data.table(a = 1:4, b = letters[1:4])
abc <- data.table::data.table(a = c(1:4, 2), b = letters[1:5], c = rnorm(5))
j <- merge(x = ab, y = abc, by = "a")
dummy_call <- function() {
  sc_left_join(joined = j, left = ab, right = abc, by = "a",
               description = "Left join outcome to main population")
}
dummy_call()
#> [[1]]
#> [[1]]$entry_sanity_table
#>                             description
#> 1: Left join outcome to main population
#>                                       additional_desc  data_name n n_fail n_na
#> 1: nrow(joined table) = 5 equals nrow(left table) = 4 j, ab, abc 1      1    0
#>    counter_meas       fail_vec_str      param_name         call
#> 1:            - n_joined != n_left Merge-vars: 'a' dummy_call()
#>
#> [[1]]$fail_vec
#> [1] TRUE
#>
#> [[1]]$fail
#> [1] TRUE
#>
#>
#> [[2]]
#> [[2]]$entry_sanity_table
#>                             description
#> 1: Left join outcome to main population
#>                                additional_desc  data_name n n_fail n_na
#> 1: No columns were duplicated by the left join j, ab, abc 1      1    0
#>    counter_meas                   fail_vec_str      param_name         call
#> 1:            - length(duplicated_columns) > 0 Merge-vars: 'a' dummy_call()
#>              example
#> 1: <data.table[1x1]>
#>
#> [[2]]$fail_vec
#> [1] TRUE
#>
#> [[2]]$fail
#> [1] TRUE
#>
#>
#>                                            description
#>  1:                                       bmi above 15
#>  2:                                       bmi below 30
#>  3:                                       bmi above 15
#>  4:                                                  -
#>  5:                                                  -
#>  6:                                                  -
#>  7:                                                  -
#>  8: Measurements are expected to be bounded from below
#>  9: Measurements are expected to be bounded from below
#> 10:                            No NAs expected in iris
#> 11:                            No NAs expected in iris
#> 12:                            No NAs expected in iris
#> 13:                            No NAs expected in iris
#> 14:                            No NAs expected in iris
#> 15:           Measurements are expected to be positive
#> 16:           Measurements are expected to be positive
#> 17:                                                  -
#> 18:               Left join outcome to main population
#> 19:               Left join outcome to main population
#> 20:               Left join outcome to main population
#>                                                                           additional_desc
#>  1:                                                                                     -
#>  2:                                                                                     -
#>  3:                                                                                     -
#>  4:                                 Elements in 'type' should contain only 'b', 'c', 'd'.
#>  5:                                 Elements in 'type' should contain only 'b', 'c', 'd'.
#>  6:                                     Elements in 'Sepal.Length' should be in [1, 7.9).
#>  7:                                     Elements in 'Petal.Length' should be in [1, 7.9).
#>  8:                                              Elements in 'a' should be in (0.2, Inf).
#>  9:                                              Elements in 'b' should be in (0.2, Inf).
#> 10:                                  Check that column 'Sepal.Length' does not contain NA
#> 11:                                   Check that column 'Sepal.Width' does not contain NA
#> 12:                                  Check that column 'Petal.Length' does not contain NA
#> 13:                                   Check that column 'Petal.Width' does not contain NA
#> 14:                                       Check that column 'Species' does not contain NA
#> 15:                                                Elements in 'a' should be in (0, Inf).
#> 16:                                                Elements in 'b' should be in (0, Inf).
#> 17: The combination of 'Species', 'Sepal.Length', 'Sepal.Width', 'Petal.Length' is unique
#> 18:                                                      The combination of 'a' is unique
#> 19:                                    nrow(joined table) = 5 equals nrow(left table) = 4
#> 20:                                           No columns were duplicated by the left join
#>      data_name   n n_fail n_na counter_meas
#>  1:          x   4      1    0         none
#>  2:              4      1    0         none
#>  3:              4      1    0            -
#>  4:          d   4      1    0            -
#>  5:          d   4      1    0            -
#>  6:       iris 150      1    0            -
#>  7:       iris 150      0    0            -
#>  8:          d   4      3    0            -
#>  9:          d   4      0    0            -
#> 10:       iris 150      5    0            -
#> 11:       iris 150      0    0            -
#> 12:       iris 150      0    0            -
#> 13:       iris 150      0    0            -
#> 14:       iris 150      0    0            -
#> 15:          d   4      2    0            -
#> 16:          d   4      0    0            -
#> 17:          x 150     12    0            -
#> 18: j, ab, abc   5      2    0            -
#> 19: j, ab, abc   1      1    0            -
#> 20: j, ab, abc   1      1    0            -
#>                                                                  fail_vec_str
#>  1:                                                                x$bmi < 15
#>  2:                                                                x$bmi > 30
#>  3:                                                                d$bmi < 15
#>  4:                                   !(object[[col]] %in% feasible_elements)
#>  5:                                   !(object[[col]] %in% feasible_elements)
#>  6: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#>  7: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#>  8: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#>  9: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#> 10:                                                      is.na(object[[col]])
#> 11:                                                      is.na(object[[col]])
#> 12:                                                      is.na(object[[col]])
#> 13:                                                      is.na(object[[col]])
#> 14:                                                      is.na(object[[col]])
#> 15: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#> 16: sapply(object[[col]], function(x) !checkmate::qtest(x = x, rules = rule))
#> 17:                                                        dt$.n_col_cmb != 1
#> 18:                                                        dt$.n_col_cmb != 1
#> 19:                                                        n_joined != n_left
#> 20:                                            length(duplicated_columns) > 0
#>                                                   param_name
#>  1:                                                      bmi
#>  2:                                                        -
#>  3:                                                        -
#>  4:                                                     type
#>  5:                                                     type
#>  6:                                             Sepal.Length
#>  7:                                             Petal.Length
#>  8:                                                        a
#>  9:                                                        b
#> 10:                                             Sepal.Length
#> 11:                                              Sepal.Width
#> 12:                                             Petal.Length
#> 13:                                              Petal.Width
#> 14:                                                  Species
#> 15:                                                        a
#> 16:                                                        b
#> 17: 'Species', 'Sepal.Length', 'Sepal.Width', 'Petal.Length'
#> 18:                                          Merge-vars: 'a'
#> 19:                                          Merge-vars: 'a'
#> 20:                                          Merge-vars: 'a'
#>                          call           example
#>  1:         dummy_call(x = d) <data.frame[1x3]>
#>  2:         dummy_call(x = d)
#>  3: eval(expr, envir, enclos)
#>  4: eval(expr, envir, enclos) <data.frame[1x2]>
#>  5:         dummy_call(x = d) <data.frame[1x2]>
#>  6:         dummy_call(x = d) <data.frame[1x5]>
#>  7:         dummy_call(x = d)
#>  8:         dummy_call(x = d) <data.frame[3x2]>
#>  9:         dummy_call(x = d)
#> 10:      dummy_call(x = iris) <data.frame[3x5]>
#> 11:      dummy_call(x = iris)
#> 12:      dummy_call(x = iris)
#> 13:      dummy_call(x = iris)
#> 14:      dummy_call(x = iris)
#> 15:         dummy_call(x = d) <data.frame[2x2]>
#> 16:         dummy_call(x = d)
#> 17:      dummy_call(x = iris) <data.table[3x6]>
#> 18:              dummy_call() <data.table[2x5]>
#> 19:              dummy_call()
#> 20:              dummy_call() <data.table[1x1]>