Generates data according to all provided constellations in data_grid and applies all provided constellations in proc_grid to them.
    eval_tibbles(
      data_grid,
      proc_grid = expand_tibble(proc = "length"),
      replications = 1,
      discard_generated_data = FALSE,
      post_analyze = identity,
      summary_fun = NULL,
      group_for_summary = NULL,
      ncpus = 1L,
      cluster = NULL,
      cluster_seed = rep(12345, 6),
      cluster_libraries = NULL,
      cluster_global_objects = NULL,
      envir = globalenv(),
      simplify = TRUE
    )
Argument | Description |
---|---|
data_grid | a |
proc_grid | similar as |
replications | number of replications for the simulation |
discard_generated_data | if |
post_analyze | a convenience function that is applied directly after the data analyzing function. If this function has an argument |
summary_fun | named list of univariate functions used to summarize the results (numeric or logical) over the replications, e.g. list(mean = mean, sd = sd). |
group_for_summary | if the result returned by the data analyzing function or |
ncpus | a cluster of |
cluster | a cluster generated by the |
cluster_seed | if the simulation is run in parallel, the combined multiple-recursive generator from L'Ecuyer (1999) is used to generate random numbers. Thus |
cluster_libraries | a character vector specifying the packages that should be loaded by the workers. |
cluster_global_objects | a character vector specifying the names of R objects in the global environment that should be exported to the global environment of every worker. |
envir | must be provided if the functions specified in |
simplify | the result column is usually nested; by default, an attempt is made to unnest it. |
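The shape expected for summary_fun can be illustrated with base R alone. The sketch below is not how eval_tibbles() works internally; it only shows what "a named list of univariate functions applied over the replications" amounts to. The vector `results` is a made-up stand-in for the values one analyzing procedure might return across four replications.

```r
# Hypothetical results of a single analyzing procedure over 4 replications
# (illustrative values, not produced by simTool).
results <- c(0.5, 1.5, 2.5, 3.5)

# A named list of univariate functions, as described for summary_fun.
summary_fun <- list(mean = mean, sd = sd)

# Apply every function to the replicated results; the list names
# become the names of the summaries.
summaries <- vapply(summary_fun, function(f) f(results), numeric(1))
print(summaries)
```

The names of the list (here mean and sd) are what identify each summary, which is why the list has to be named.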
The returned object is a list of class eval_tibbles, where the element simulations contains the results of the simulation.
If cluster is provided by the user, the function eval_tibbles will NOT stop the cluster. This has to be done by the user. Conducting parallel simulations by specifying ncpus will internally create a cluster and stop it after the simulation is done.
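The ownership rule for a user-provided cluster can be sketched with the base parallel package. This is a minimal sketch under assumptions: it does not call eval_tibbles(), the seed 12345 and the worker count are arbitrary choices, and how eval_tibbles() forwards cluster_seed to the workers is not shown here. It only demonstrates that the caller creates the cluster, seeds L'Ecuyer-CMRG streams for reproducible parallel random numbers, and stops the cluster afterwards.

```r
library(parallel)

# The caller creates the cluster and therefore owns it.
cl <- makeCluster(2L)

# Give each worker its own L'Ecuyer-CMRG stream (seed chosen for illustration).
clusterSetRNGStream(cl, iseed = 12345)
first <- parSapply(cl, 1:4, function(i) rnorm(1))

# Resetting the streams with the same seed reproduces the same draws.
clusterSetRNGStream(cl, iseed = 12345)
second <- parSapply(cl, 1:4, function(i) rnorm(1))

# Nothing else stops a user-created cluster; the caller has to do it.
stopCluster(cl)

print(identical(first, second))
```

With simTool, the same pattern would pass `cl` through the cluster argument and still end with stopCluster(cl), since eval_tibbles() leaves a user-provided cluster running.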
Marsel Scheer
    rng <- function(data, ...) {
      ret <- range(data)
      names(ret) <- c("min", "max")
      ret
    }

    ### The following line is only necessary
    ### if the examples are not executed in the global
    ### environment, which for instance is the case when
    ### the online documentation
    ### http://marselscheer.github.io/simTool/reference/eval_tibbles.html
    ### is built. In such a case eval_tibbles() would search for the
    ### above defined function rng() in the global environment, where
    ### it does not exist!
    eval_tibbles <- purrr::partial(eval_tibbles, envir = environment())

    dg <- expand_tibble(fun = "rnorm", n = c(5L, 10L))
    pg <- expand_tibble(proc = c("rng", "median", "length"))

    eval_tibbles(dg, pg, rep = 2, simplify = FALSE)
    #> # A tibble: 12 × 5
    #>    fun       n replications proc   results
    #>    <chr> <int>        <int> <chr>  <list>
    #>  1 rnorm     5            1 rng    <dbl [2]>
    #>  2 rnorm     5            1 median <dbl [1]>
    #>  3 rnorm     5            1 length <int [1]>
    #>  4 rnorm     5            2 rng    <dbl [2]>
    #>  5 rnorm     5            2 median <dbl [1]>
    #>  6 rnorm     5            2 length <int [1]>
    #>  7 rnorm    10            1 rng    <dbl [2]>
    #>  8 rnorm    10            1 median <dbl [1]>
    #>  9 rnorm    10            1 length <int [1]>
    #> 10 rnorm    10            2 rng    <dbl [2]>
    #> 11 rnorm    10            2 median <dbl [1]>
    #> 12 rnorm    10            2 length <int [1]>
    #> Number of data generating functions: 2
    #> Number of analyzing procedures: 3
    #> Number of replications: 2
    #> Estimated replications per hour: 8858606
    #> Start of the simulation: 2021-09-06 18:48:45
    #> End of the simulation: 2021-09-06 18:48:45

    eval_tibbles(dg, pg, rep = 2)
    #> # A tibble: 16 × 5
    #>    fun       n replications proc   results
    #>    <chr> <int>        <int> <chr>    <dbl>
    #>  1 rnorm     5            1 rng     0.112
    #>  2 rnorm     5            1 rng     1.62
    #>  3 rnorm     5            1 median  0.244
    #>  4 rnorm     5            1 length  5
    #>  5 rnorm     5            2 rng    -1.91
    #>  6 rnorm     5            2 rng     1.07
    #>  7 rnorm     5            2 median -0.279
    #>  8 rnorm     5            2 length  5
    #>  9 rnorm    10            1 rng    -1.91
    #> 10 rnorm    10            1 rng     2.76
    #> 11 rnorm    10            1 median  0.0583
    #> 12 rnorm    10            1 length 10
    #> 13 rnorm    10            2 rng    -2.27
    #> 14 rnorm    10            2 rng     2.68
    #> 15 rnorm    10            2 median  0.0244
    #> 16 rnorm    10            2 length 10
    #> Number of data generating functions: 2
    #> Number of analyzing procedures: 3
    #> Number of replications: 2
    #> Estimated replications per hour: 16314958
    #> Start of the simulation: 2021-09-06 18:48:45
    #> End of the simulation: 2021-09-06 18:48:45

    #> # A tibble: 12 × 7
    #>    fun       n replications proc     min   max     V1
    #>    <chr> <int>        <int> <chr>  <dbl> <dbl>  <dbl>
    #>  1 rnorm     5            1 rng    -1.18  1.11 NA
    #>  2 rnorm     5            1 median NA    NA    -0.246
    #>  3 rnorm     5            1 length NA    NA     5
    #>  4 rnorm     5            2 rng    -1.70  1.07 NA
    #>  5 rnorm     5            2 median NA    NA     0.132
    #>  6 rnorm     5            2 length NA    NA     5
    #>  7 rnorm    10            1 rng    -1.47  1.34 NA
    #>  8 rnorm    10            1 median NA    NA     0.260
    #>  9 rnorm    10            1 length NA    NA    10
    #> 10 rnorm    10            2 rng    -2.61  1.92 NA
    #> 11 rnorm    10            2 median NA    NA     0.495
    #> 12 rnorm    10            2 length NA    NA    10
    #> Number of data generating functions: 2
    #> Number of analyzing procedures: 3
    #> Number of replications: 2
    #> Estimated replications per hour: 841455
    #> Start of the simulation: 2021-09-06 18:48:45
    #> End of the simulation: 2021-09-06 18:48:45

    #> # A tibble: 12 × 8
    #>    fun       n replications summary_fun proc       min    max  value
    #>    <chr> <int>        <int> <chr>       <chr>    <dbl>  <dbl>  <dbl>
    #>  1 rnorm     5            1 mean        rng    -0.196   1.37  NA
    #>  2 rnorm     5            1 mean        median NA      NA      0.716
    #>  3 rnorm     5            1 mean        length NA      NA      5
    #>  4 rnorm     5            1 sd          rng     0.224   0.431 NA
    #>  5 rnorm     5            1 sd          median NA      NA      0.325
    #>  6 rnorm     5            1 sd          length NA      NA      0
    #>  7 rnorm    10            1 mean        rng    -1.72    1.55  NA
    #>  8 rnorm    10            1 mean        median NA      NA     -0.185
    #>  9 rnorm    10            1 mean        length NA      NA     10
    #> 10 rnorm    10            1 sd          rng     0.0509  0.812 NA
    #> 11 rnorm    10            1 sd          median NA      NA      0.621
    #> 12 rnorm    10            1 sd          length NA      NA      0
    #> Number of data generating functions: 2
    #> Number of analyzing procedures: 3
    #> Number of replications: 2
    #> Estimated replications per hour: 99252
    #> Start of the simulation: 2021-09-06 18:48:45
    #> End of the simulation: 2021-09-06 18:48:45

    regData <- function(n, SD) {
      data.frame(
        x = seq(0, 1, length = n),
        y = rnorm(n, sd = SD)
      )
    }

    eg <- eval_tibbles(
      expand_tibble(fun = "regData", n = 5L, SD = 1:2),
      expand_tibble(proc = "lm", formula = c("y~x", "y~I(x^2)")),
      replications = 3
    )
    eg
    #> # A tibble: 12 × 7
    #>    fun         n    SD replications proc  formula  results
    #>    <chr>   <int> <int>        <int> <chr> <chr>    <list>
    #>  1 regData     5     1            1 lm    y~x      <lm>
    #>  2 regData     5     1            1 lm    y~I(x^2) <lm>
    #>  3 regData     5     1            2 lm    y~x      <lm>
    #>  4 regData     5     1            2 lm    y~I(x^2) <lm>
    #>  5 regData     5     1            3 lm    y~x      <lm>
    #>  6 regData     5     1            3 lm    y~I(x^2) <lm>
    #>  7 regData     5     2            1 lm    y~x      <lm>
    #>  8 regData     5     2            1 lm    y~I(x^2) <lm>
    #>  9 regData     5     2            2 lm    y~x      <lm>
    #> 10 regData     5     2            2 lm    y~I(x^2) <lm>
    #> 11 regData     5     2            3 lm    y~x      <lm>
    #> 12 regData     5     2            3 lm    y~I(x^2) <lm>
    #> Number of data generating functions: 2
    #> Number of analyzing procedures: 2
    #> Number of replications: 3
    #> Estimated replications per hour: 374120
    #> Start of the simulation: 2021-09-06 18:48:45
    #> End of the simulation: 2021-09-06 18:48:46

    presever_rownames <- function(mat) {
      rn <- rownames(mat)
      ret <- tibble::as_tibble(mat)
      ret$term <- rn
      ret
    }

    eg <- eval_tibbles(
      expand_tibble(fun = "regData", n = 5L, SD = 1:2),
      expand_tibble(proc = "lm", formula = c("y~x", "y~I(x^2)")),
      post_analyze = purrr::compose(presever_rownames, coef, summary),
      # post_analyze = broom::tidy, # is a nice out of the box alternative
      summary_fun = list(mean = mean, sd = sd),
      group_for_summary = "term",
      replications = 3
    )
    #> Warning: The `.dots` argument of `group_by()` is deprecated as of dplyr 1.0.0.
    eg$simulation
    #> # A tibble: 16 × 12
    #>    fun         n    SD replications summary_fun proc  formula  term        Estimate
    #>    <chr>   <int> <int>        <int> <chr>       <chr> <chr>    <chr>          <dbl>
    #>  1 regData     5     1            1 mean        lm    y~x      (Intercept)   -0.137
    #>  2 regData     5     1            1 mean        lm    y~x      x              0.148
    #>  3 regData     5     1            1 mean        lm    y~I(x^2) (Intercept)   -0.121
    #>  4 regData     5     1            1 mean        lm    y~I(x^2) I(x^2)         0.154
    #>  5 regData     5     1            1 sd          lm    y~x      (Intercept)    0.330
    #>  6 regData     5     1            1 sd          lm    y~x      x              0.298
    #>  7 regData     5     1            1 sd          lm    y~I(x^2) (Intercept)    0.240
    #>  8 regData     5     1            1 sd          lm    y~I(x^2) I(x^2)         0.913
    #>  9 regData     5     2            1 mean        lm    y~x      (Intercept)   -1.05
    #> 10 regData     5     2            1 mean        lm    y~x      x              2.58
    #> 11 regData     5     2            1 mean        lm    y~I(x^2) (Intercept)   -0.851
    #> 12 regData     5     2            1 mean        lm    y~I(x^2) I(x^2)         2.91
    #> 13 regData     5     2            1 sd          lm    y~x      (Intercept)    0.754
    #> 14 regData     5     2            1 sd          lm    y~x      x              0.492
    #> 15 regData     5     2            1 sd          lm    y~I(x^2) (Intercept)    0.667
    #> 16 regData     5     2            1 sd          lm    y~I(x^2) I(x^2)         0.655
    #> # … with 3 more variables: Std. Error <dbl>, t value <dbl>, Pr(>|t|) <dbl>

    dg <- expand_tibble(fun = "rexp", rate = c(10, 100), n = c(50L, 100L))
    pg <- expand_tibble(proc = c("t.test"), conf.level = c(0.8, 0.9, 0.95))
    et <- eval_tibbles(dg, pg,
      ncpus = 1,
      replications = 10^1,
      post_analyze = function(ttest, .truth) {
        mu <- 1 / .truth$rate
        ttest$conf.int[1] <= mu && mu <= ttest$conf.int[2]
      },
      summary_fun = list(mean = mean, sd = sd)
    )
    et
    #> # A tibble: 24 × 8
    #>    fun    rate     n replications summary_fun proc   conf.level value
    #>    <chr> <dbl> <int>        <int> <chr>       <chr>       <dbl> <dbl>
    #>  1 rexp     10    50            1 mean        t.test       0.8  0.9
    #>  2 rexp     10    50            1 mean        t.test       0.9  0.9
    #>  3 rexp     10    50            1 mean        t.test       0.95 0.9
    #>  4 rexp     10    50            1 sd          t.test       0.8  0.316
    #>  5 rexp     10    50            1 sd          t.test       0.9  0.316
    #>  6 rexp     10    50            1 sd          t.test       0.95 0.316
    #>  7 rexp    100    50            1 mean        t.test       0.8  0.6
    #>  8 rexp    100    50            1 mean        t.test       0.9  0.7
    #>  9 rexp    100    50            1 mean        t.test       0.95 0.7
    #> 10 rexp    100    50            1 sd          t.test       0.8  0.516
    #> # … with 14 more rows
    #> Number of data generating functions: 4
    #> Number of analyzing procedures: 3
    #> Number of replications: 10
    #> Estimated replications per hour: 216361
    #> Start of the simulation: 2021-09-06 18:48:46
    #> End of the simulation: 2021-09-06 18:48:46

    dg <- dplyr::bind_rows(
      expand_tibble(fun = "rexp", rate = 10, .truth = 1 / 10, n = c(50L, 100L)),
      expand_tibble(fun = "rnorm", .truth = 0, n = c(50L, 100L))
    )
    pg <- expand_tibble(proc = c("t.test"), conf.level = c(0.8, 0.9, 0.95))
    et <- eval_tibbles(dg, pg,
      ncpus = 1,
      replications = 10^1,
      post_analyze = function(ttest, .truth) {
        ttest$conf.int[1] <= .truth && .truth <= ttest$conf.int[2]
      },
      summary_fun = list(mean = mean, sd = sd)
    )
    et
    #> # A tibble: 24 × 9
    #>    fun    rate .truth     n replications summary_fun proc   conf.level value
    #>    <chr> <dbl>  <dbl> <int>        <int> <chr>       <chr>       <dbl> <dbl>
    #>  1 rexp     10    0.1    50            1 mean        t.test       0.8  0.9
    #>  2 rexp     10    0.1    50            1 mean        t.test       0.9  0.9
    #>  3 rexp     10    0.1    50            1 mean        t.test       0.95 1
    #>  4 rexp     10    0.1    50            1 sd          t.test       0.8  0.316
    #>  5 rexp     10    0.1    50            1 sd          t.test       0.9  0.316
    #>  6 rexp     10    0.1    50            1 sd          t.test       0.95 0
    #>  7 rexp     10    0.1   100            1 mean        t.test       0.8  0.6
    #>  8 rexp     10    0.1   100            1 mean        t.test       0.9  0.7
    #>  9 rexp     10    0.1   100            1 mean        t.test       0.95 0.8
    #> 10 rexp     10    0.1   100            1 sd          t.test       0.8  0.516
    #> # … with 14 more rows
    #> Number of data generating functions: 4
    #> Number of analyzing procedures: 3
    #> Number of replications: 10
    #> Estimated replications per hour: 204119
    #> Start of the simulation: 2021-09-06 18:48:46
    #> End of the simulation: 2021-09-06 18:48:46

    ### need to remove the locally adapted eval_tibbles()
    ### otherwise executing the examples would mask
    ### eval_tibbles from simTool-namespace.
    rm(eval_tibbles)