reproducible-research

knitr - error when importing python module

爱⌒轻易说出口, submitted on 2019-12-19 05:15:06
Question: I am having trouble running the Python engine in knitr. I can import some modules but not others; for example, I can import numpy but not pandas. The chunk {r, engine='python'} import pandas fails with the following error: Quitting from lines 50-51 (prepayment.Rmd) Error in (knit_engines$get(options$engine))(options) : Traceback (most recent call last): File "<string>", line 1, in <module> ImportError: No module named pandas Calls: <Anonymous> ... process_group.block -> call_block -> block_exec -> in_dir ->
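When numpy imports and pandas does not, one common cause is that knitr's Python chunk runs a different interpreter from the one where pandas was installed. That is an assumption, not something the question confirms, though the unquoted wording "No module named pandas" matches Python 2's message (Python 3 quotes the module name), which hints that the chunk is running Python 2. The diagnostic below is a minimal sketch that you can paste into the failing chunk. It is written to run under both Python 2 and 3, and it prints which interpreter is executing and whether each module can be imported. The helper name `importable` is hypothetical.

```python
import sys

def importable(name):
    """Return True if `name` can be imported by the running interpreter."""
    try:
        __import__(name)
    except ImportError:  # also catches Python 3's ModuleNotFoundError subclass
        return False
    return True

if __name__ == "__main__":
    # Which interpreter did knitr launch? Compare it with the one where pandas was installed.
    print("interpreter: " + sys.executable)
    print("version: " + sys.version.split()[0])
    for name in ("numpy", "pandas"):
        print(name + ": " + ("importable" if importable(name) else "NOT importable"))
```

If the printed interpreter is not the one you used for `pip install pandas`, install pandas into that interpreter or point the chunk at the right one.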

Fully reproducible parallel models using caret

北战南征, submitted on 2019-12-17 07:02:59
Question: When I run two random forests in caret, I get exactly the same results if I set a random seed: library(caret) library(doParallel) set.seed(42) myControl <- trainControl(method='cv', index=createFolds(iris$Species)) set.seed(42) model1 <- train(Species~., iris, method='rf', trControl=myControl) set.seed(42) model2 <- train(Species~., iris, method='rf', trControl=myControl) > all.equal(predict(model1, type='prob'), predict(model2, type='prob')) [1] TRUE However, if I register a parallel back-end to
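caret's `trainControl()` has a `seeds` argument intended for this case, where models are fit in parallel. The underlying idea is that each resampling task receives its own fixed seed up front, so the results no longer depend on which worker runs which task. The Python sketch below illustrates only that idea and is not caret's API. `fit_fold` is a hypothetical stand-in for training on one fold, and deriving the seeds as `base_seed + i` is an arbitrary choice made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fit_fold(fold, seed):
    """Hypothetical stand-in for training on one fold: consumes random numbers."""
    rng = random.Random(seed)  # private generator, unaffected by other workers
    return fold, sum(rng.random() for _ in range(1000))

def run_cv(n_folds, base_seed, workers):
    # One seed per fold, fixed before any work starts (similar in spirit to caret's `seeds`)
    seeds = [base_seed + i for i in range(n_folds)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fit_fold, range(n_folds), seeds))

if __name__ == "__main__":
    serial = run_cv(10, base_seed=42, workers=1)
    parallel = run_cv(10, base_seed=42, workers=4)
    print(serial == parallel)  # per-fold seeds make the worker count irrelevant
```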

Creating a reproducible example with the reprex package in R when a local file is read

喜夏-厌秋, submitted on 2019-12-14 03:08:06
Question: I often use reprex::reprex to create reproducible examples of R code so that others can help me fix errors in my code. Usually I create minimal examples using datasets like iris or mtcars, and that works well. But reprex always fails whenever I need to use my own data: the problem is so specific that I can't rely on the datasets package. In that case, I get the following error: # loading needed libraries library(ggplot2) library(cowplot) library(devtools) # reading
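A common remedy is to make the example carry its own data instead of reading a file that only exists on your machine. In R, `dput()` prints an object as code that recreates it. The Python sketch below applies the same idea by embedding a small table as text, so anyone can run the example as-is. The column names and values are made up for illustration, and `load_rows` is a hypothetical helper.

```python
import csv
import io

# Hypothetical sample data pasted inline instead of read from a local file
DATA = """id,group,value
1,a,2.5
2,a,3.0
3,b,4.5
"""

def load_rows(text):
    """Parse the inline CSV text into a list of dicts, as reading the file would."""
    return list(csv.DictReader(io.StringIO(text)))

if __name__ == "__main__":
    rows = load_rows(DATA)
    print(len(rows), rows[0]["group"])
```

Keep only as many rows and columns as needed to trigger the error.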

A Way in Knitr to Copy a Chunk?

走远了吗., submitted on 2019-12-13 13:16:07
Question: Knitr Mavens, Background: I am using knitr to produce a report with many embedded graphs. In the body of the report, only the graph is appropriate, not the code. For example: ```{r graph_XYZ_subset, echo = FALSE, message = TRUE, fig.cap = "Text that explains the graph"} graph.subset <- ggplot() + ... ``` This part works just fine. However, the key parts of the code (e.g., key statistical analyses and key graph generations) also need to be displayed, but in an Addendum. Which leads to

What are common sources of randomness in Machine Learning projects with Keras?

跟風遠走, submitted on 2019-12-13 08:33:38
Question: Reproducibility is important. In a closed-source machine learning project I'm currently working on, it is hard to achieve. What are the parts to look at? Answer 1: Setting seeds. Computers have pseudo-random number generators, which are initialized with a value called the seed. For machine learning, you might need to do the following: # I've heard the order here is important import random random.seed(0) import numpy as np np.random.seed(0) import tensorflow as tf tf.set_random_seed(0) session_conf
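The seeding steps in the answer can be collected into one helper. The sketch below seeds Python's `random` module and, if it is installed, NumPy. TensorFlow is left out because the call differs by version (`tf.set_random_seed` in 1.x, `tf.random.set_seed` in 2.x). The helper also sets `PYTHONHASHSEED`, but that variable only affects interpreters started afterwards, such as worker subprocesses; the running interpreter's hash seed is already fixed at startup. The name `set_seeds` is hypothetical and is not a Keras API.

```python
import os
import random

def set_seeds(seed=0):
    """Seed the common Python-level RNGs (hypothetical helper, not a Keras API)."""
    # Inherited by subprocesses only; the current interpreter's hash seed is already set
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed, so there is nothing to seed

def sample(n=5):
    """Draw a few numbers from the seeded stdlib RNG."""
    return [random.random() for _ in range(n)]

if __name__ == "__main__":
    set_seeds(0)
    first = sample()
    set_seeds(0)
    print(first == sample())  # same seed, same draws
```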

Transitioning research project to knitr-based setup

别等时光非礼了梦想., submitted on 2019-12-13 03:03:17
Question: Finally, I've decided to move my dissertation research closer to the goal of making it as reproducible as my circumstances allow. Since I currently don't use LaTeX for my dissertation report (though I'm considering this option), I believe that knitr is the best way to go. The software project implementing the empirical part of my dissertation research (data analysis) is being written in R. The project contains multiple files within a directory structure, which

Why does mlr give different results in different runs even when using set.seed()?

血红的双手。, submitted on 2019-12-12 00:14:12
Question: To publish reproducible results obtained with the mlr package, one should use the set.seed() function to control the randomness of the code. In my testing, however, this practice does not give the desired result: different runs of the code produce slightly different outputs, as reported in the source of this question and shown in the following code. Here's some reproducible code ## libraries library(mlr) library(parallel) library(parallelMap) ## options set.seed(1) cv.n <- 3 bag.n <- 3 ## data
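One plausible explanation is that `set.seed()` fixes only the starting point of a single random stream. This is an assumption, since the excerpt ends before any answer. When parallel workers draw from such a stream, which task receives which numbers depends on scheduling, and scheduling can vary from run to run. The Python sketch below simulates two fixed worker schedules. Drawing from one shared generator makes each task's result depend on the schedule, while giving every task its own seeded generator does not. Both function names and the `seed * 1000 + task` derivation are hypothetical.

```python
import random

def shared_stream(order, seed=1):
    """All tasks draw from one generator; each task's draw depends on its position in `order`."""
    rng = random.Random(seed)
    return {task: rng.random() for task in order}

def per_task_streams(order, seed=1):
    """Each task gets its own generator, seeded from (seed, task), so order is irrelevant."""
    # seed * 1000 + task is an arbitrary, hypothetical way to derive distinct task seeds
    return {task: random.Random(seed * 1000 + task).random() for task in order}

if __name__ == "__main__":
    run_a, run_b = [0, 1, 2], [2, 0, 1]  # two possible worker schedules
    print(shared_stream(run_a) == shared_stream(run_b))        # schedule leaks into results
    print(per_task_streams(run_a) == per_task_streams(run_b))  # schedule does not matter
```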

Simple TensorFlow computation not reproducible on different systems (macOS, Colab, Azure)

纵然是瞬间, submitted on 2019-12-11 19:49:18
Question: I am investigating the reproducibility of TensorFlow code on my macOS machine, on Google Colab, and on Azure with Docker. I understand that I can set a graph-level seed and an operation-level seed. I am using eager mode (so no parallelism optimization) and no GPUs. I draw a 100x100 sample from the unit normal and calculate its mean and standard deviation. The test code below verifies that I am not using the GPU, that I am using TensorFlow 1.12.0 or the preview of TensorFlow 2, that
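Seeds cannot remove one source of cross-platform differences: floating-point rounding. Addition is not associative. If a library sums the same numbers in a different order on each machine, for example because of different vectorized kernels or thread counts, the mean can differ in the last digits. Whether this explains the results above is an assumption. The sketch below shows the effect with plain Python floats and compares it with `math.fsum`, which on IEEE-754 platforms returns an accurately rounded sum that does not depend on the order of the inputs.

```python
import math

def left_to_right(xs):
    """Sum in the given order, as a naive loop would."""
    total = 0.0
    for x in xs:
        total += x
    return total

if __name__ == "__main__":
    xs = [0.1, 0.2, 0.3]
    print(left_to_right(xs), left_to_right(reversed(xs)))  # differ in the last digits
    print(math.fsum(xs), math.fsum(reversed(xs)))          # accurately rounded, order-independent
```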

How to only install versions of packages that were made under a specific R release?

≡放荡痞女, submitted on 2019-12-11 10:19:42
Question: I use the Revolution R Enterprise distribution, which is built on R 3.2.2. Hence, I want to use only package versions that are based on this R release as well. Checking packages like 'checkpoint' or the Revolution MRAN page, I only found ways to access CRAN snapshots by date. Is there a way to install the most recent package versions that are still compatible with a given R release? Answer 1: I found a heuristic solution to my own problem: Find out about the release date of the

Using knitr to produce complex dynamic documents

放肆的年华, submitted on 2019-12-11 03:48:52
Question: The minimal reproducible example (RE) below is my attempt to figure out how I can use knitr to generate complex dynamic documents, where "complex" refers not to the document's elements and their layout but to the non-linear logic of the underlying R code chunks. While the provided RE and its results show that a solution based on such an approach might work well, I would like to know: 1) is this a correct way of using knitr in such situations; 2) are there any optimizations that