reproducible-research

Does setting the seed in tf.random.set_seed also set the seed used by the glorot_uniform kernel_initializer when using a Conv2D layer in Keras?

Submitted by 点点圈 on 2020-05-15 02:58:45
Question: I'm currently training a convolutional neural network using a Conv2D layer defined like this: conv1 = tf.keras.layers.Conv2D(filters=64, kernel_size=(3,3), padding='SAME', activation='relu')(inputs). My understanding is that the default kernel_initializer is glorot_uniform, which has a default seed of None: tf.keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', …
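
A quick way to probe this empirically (a minimal sketch, assuming TensorFlow 2.x; the layer shape and seed value are arbitrary) is to re-seed, build the layer twice, and compare the initialized kernels:

    import numpy as np
    import tensorflow as tf

    def build_kernel():
        # Re-seed the global generator before each build so the unseeded
        # glorot_uniform initializer draws from the same random stream.
        tf.random.set_seed(42)
        layer = tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3),
                                       padding='same', activation='relu')
        layer.build(input_shape=(None, 32, 32, 3))  # triggers weight creation
        return layer.kernel.numpy()

    print("identical kernels:", np.allclose(build_kernel(), build_kernel()))

If the two kernels match in your session, the global seed is indeed driving the default initializer there.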

How can one use Binder (mybinder.org) with private GitHub repositories?

Submitted by 匆匆过客 on 2020-03-17 11:24:30
Question: After reviewing this exact issue (https://github.com/jupyterhub/binderhub/issues/237), it seems that the functionality for this has been implemented with this merged pull request (https://github.com/jupyterhub/binderhub/pull/671). However, I cannot seem to find guidance in the docs or elsewhere explaining what should go into the secrets.yml file, or whether there are other steps required to use Binder with private GitHub repos. (Apologies if I have missed the obvious -- complete Binder …

Setting seeds for a parallel random forest in caret for reproducible results

Submitted by 本小妞迷上赌 on 2020-01-13 19:57:05
Question: I wish to run random forest in parallel using the caret package, and I wish to set the seeds for a reproducible result, as in "Fully reproducible parallel models using caret". However, I don't understand line 9 in the following code taken from the caret help: why do we sample 22 integers (plus one for the last model in line 12, making 23) when only 12 values of the parameter k are evaluated? For information, I wish to run 5-fold CV to evaluate 584 values of the RF parameter 'mtry'. Any help is much appreciated. Thank you. ## …
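
For the 5-fold / 584-value setup described above, the seeds argument is a list with one integer vector per resample (its length equal to the number of tuning combinations) plus a single integer for the final model fit. A sketch, assuming caret's documented seeds structure (the seed values themselves are arbitrary):

    library(caret)

    set.seed(123)
    n_resamples <- 5     # 5-fold CV
    n_tune      <- 584   # number of mtry values to evaluate

    seeds <- vector(mode = "list", length = n_resamples + 1)
    for (i in seq_len(n_resamples)) seeds[[i]] <- sample.int(10^6, n_tune)
    seeds[[n_resamples + 1]] <- sample.int(10^6, 1)  # seed for the final model

    ctrl <- trainControl(method = "cv", number = n_resamples, seeds = seeds)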

knitr templating - Dynamic chunks issue

Submitted by 谁说胖子不能爱 on 2020-01-06 14:40:39
Question: The following code is a very simplified MRE for an issue I'm experiencing. I'm trying to avoid R templating packages such as brew and to use only knit_expand() to achieve my goals. The issue is twofold: generated chunks don't get parsed (this does not happen in my real code, but it does in the MRE), and instead of LaTeX \includegraphics, knitr (or rmarkdown, or pandoc) generates R Markdown syntax for inserting figures (![]). Regarding the former, I have a feeling that it might be related to my …
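
For reference, a minimal knit_expand() pattern looks like the sketch below (an R Markdown-flavoured illustration only; the template text and loop bounds are made up, and the original question may involve .Rnw/LaTeX instead):

    library(knitr)

    # A chunk template with {{i}} placeholders that knit_expand() fills in.
    template <- c("```{r plot-{{i}}, echo=FALSE}",
                  "plot(rnorm(100), main = 'Figure {{i}}')",
                  "```")

    expanded <- lapply(1:3, function(i) knit_expand(text = template, i = i))

    # Inside an Rmd chunk with results='asis' one would then typically do:
    # cat(knit_child(text = unlist(expanded), quiet = TRUE), sep = "\n")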

Getting a command to re-create an object that is simpler than with dput()

Submitted by 你离开我真会死。 on 2020-01-03 06:06:09
Question: The output of dput is usually much more complex than what a user would have typed to create the same object. I understand that this might be necessary to guarantee 100% reproducibility (including, for instance, when different users use different default settings). However, it doesn't make examples as readable as they could be, and I often spend some time simplifying the output. As an example, consider: dput(data.frame(a=1:10)) > structure(list(a = 1:10), .Names = "a", row.names = c(NA, -10L) …
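
One rough way to get a more readable constructor (a hand-rolled sketch that only works for plain columns without special attributes, not a general replacement for dput) is to deparse each column and stitch the data.frame() call back together yourself:

    df <- data.frame(a = 1:10, b = letters[1:10])

    dput(df)   # faithful but verbose

    # Deparse each column and rebuild a simpler constructor call.
    cols <- vapply(df, function(x) paste(deparse(x), collapse = ""), character(1))
    cat("data.frame(", paste(names(df), "=", cols, collapse = ", "), ")\n")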

Reproducible splitting of data into training and testing in R

Submitted by 本小妞迷上赌 on 2020-01-01 22:15:12
Question: A common way of sampling/splitting data in R is to use sample, e.g., on row numbers. For example: require(data.table) set.seed(1) population <- as.character(1e5:(1e6-1)) # some made-up ID names N <- 1e4 # sample size sample1 <- data.table(id = sort(sample(population, N))) # randomly sample N ids test <- sample(N-1, N/2, replace = F) test1 <- sample1[test, .(id)]. The problem is that this isn't very robust to changes in the data. For example, if we drop just one observation: sample2 <- sample1[ …
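
One common remedy (sketched below using the digest package; the hash scheme and the 50/50 threshold are assumptions, not from the question) is to assign each id to train or test based on a hash of the id itself, so that membership does not depend on which other rows happen to be present:

    library(data.table)
    library(digest)

    set.seed(1)
    population <- as.character(1e5:(1e6 - 1))
    sample1 <- data.table(id = sort(sample(population, 1e4)))

    # The first hex digit of the id's MD5 hash decides the split deterministically.
    hex1 <- vapply(sample1$id,
                   function(x) strtoi(substr(digest(x, algo = "md5"), 1, 1), 16L),
                   integer(1))
    sample1[, set := ifelse(hex1 < 8, "train", "test")]   # roughly 50/50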

If Keras results are not reproducible, what's the best practice for comparing models and choosing hyperparameters?

Submitted by 纵饮孤独 on 2019-12-31 04:58:08
Question: UPDATE: This question was for TensorFlow 1.x. I upgraded to 2.0 and (at least on the simple code below) the reproducibility issue seems fixed on 2.0. So that solves my problem; but I'm still curious about what "best practices" were used for this issue on 1.x. Training the exact same model/parameters/data on Keras/TensorFlow does not give reproducible results, and the loss is significantly different each time you train the model. There are many Stack Overflow questions about that (e.g., How to get …
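
On the "best practice" side, the usual workaround when individual runs are noisy is to compare hyperparameter settings by the distribution of scores over several seeds rather than by a single run. A minimal, self-contained sketch (the toy data and tiny model below are made up purely to keep it runnable):

    import numpy as np
    import tensorflow as tf

    # Toy data, only so the sketch runs end to end.
    x = np.random.rand(200, 8).astype("float32")
    y = np.random.randint(0, 2, size=(200, 1)).astype("float32")

    def build_model():
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
            tf.keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy")
        return model

    def score_once(seed):
        # One training run under a fixed seed; returns the final validation loss.
        np.random.seed(seed)
        tf.random.set_seed(seed)
        hist = build_model().fit(x, y, epochs=3, validation_split=0.2, verbose=0)
        return hist.history["val_loss"][-1]

    scores = [score_once(s) for s in range(5)]
    print("val_loss: mean %.4f, std %.4f" % (np.mean(scores), np.std(scores)))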

Limiting size of hierarchical data for reproducible example

Submitted by 南笙酒味 on 2019-12-23 12:46:23
Question: I am trying to come up with a reproducible example (RE) for this question: Errors related to data frame columns during merging. To qualify as having an RE, the question lacks only reproducible data. However, when I tried to use the pretty much standard approach of dput(head(myDataObj)), the output produced is a 14 MB file. The problem is that my data object is a list of data frames, so head()'s limit doesn't appear to be applied recursively. I haven't found any options for dput() and head() …
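
Since head() does not descend into list elements on its own, one workaround (a sketch with made-up stand-in data in place of the real myDataObj) is to truncate each data frame with lapply() before calling dput():

    # Stand-in for the real list of data frames.
    myDataObj <- list(df1 = data.frame(a = 1:100, b = rnorm(100)),
                      df2 = data.frame(x = letters, y = LETTERS))

    small <- lapply(myDataObj, head, n = 6)  # keep only the first rows of each
    dput(small)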

Why does stacking CNN layers wreck reproducibility (even with a seed & CPU)?

Submitted by 只谈情不闲聊 on 2019-12-22 09:49:18
Question: REPRODUCIBLE: ipt = Input(batch_shape=batch_shape) x = Conv2D(6, (8, 8), strides=(2, 2), activation='relu')(ipt) x = Flatten()(x) out = Dense(6, activation='softmax')(x) NOT REPRODUCIBLE: ipt = Input(batch_shape=batch_shape) x = Conv2D(6, (8, 8), strides=(2, 2), activation='relu')(ipt) x = Conv2D(6, (8, 8), strides=(2, 2), activation='relu')(x) x = Flatten()(x) out = Dense(6, activation='softmax')(x) The difference amplifies substantially when using a larger model, and actual data instead …
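
A small self-contained check (the batch shape and seed are placeholders; this only verifies weight initialization within one process, not the training-time drift the question is about) builds the stacked variant twice under the same seed and compares the resulting weights:

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.layers import Conv2D, Dense, Flatten, Input
    from tensorflow.keras.models import Model

    def build_stacked(seed=0, batch_shape=(32, 28, 28, 1)):
        # Fix the seeds, build the two-Conv2D model, and return its weights.
        np.random.seed(seed)
        tf.random.set_seed(seed)
        ipt = Input(batch_shape=batch_shape)
        x = Conv2D(6, (8, 8), strides=(2, 2), activation='relu')(ipt)
        x = Conv2D(6, (8, 8), strides=(2, 2), activation='relu')(x)
        x = Flatten()(x)
        out = Dense(6, activation='softmax')(x)
        return Model(ipt, out).get_weights()

    w1, w2 = build_stacked(), build_stacked()
    print(all(np.array_equal(a, b) for a, b in zip(w1, w2)))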