reproducible-research

Does setting the seed in tf.random.set_seed also set the seed used by the glorot_uniform kernel_initializer when using a Conv2D layer in Keras?

Submitted by 点点圈 on 2020-05-15 02:58:45
Question: I'm currently training a convolutional neural network using a Conv2D layer defined like this: conv1 = tf.keras.layers.Conv2D(filters=64, kernel_size=(3,3), padding='SAME', activation='relu')(inputs). My understanding is that the default kernel_initializer is glorot_uniform, which has a default seed of None: tf.keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', …
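
A quick way to probe this empirically (a minimal sketch, assuming TensorFlow 2.x; the layer shape and seed value are arbitrary) is to re-seed, build the layer twice, and compare the initialized kernels:

    import numpy as np
    import tensorflow as tf

    def build_kernel():
        # Re-seed the global generator before each build so the unseeded
        # glorot_uniform initializer draws from the same random stream.
        tf.random.set_seed(42)
        layer = tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3),
                                       padding='same', activation='relu')
        layer.build(input_shape=(None, 32, 32, 3))  # triggers weight creation
        return layer.kernel.numpy()

    print("identical kernels:", np.allclose(build_kernel(), build_kernel()))

If the two kernels match in your session, the global seed is indeed driving the default initializer there.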

How can one use Binder (mybinder.org) with private GitHub repositories?

Submitted by 匆匆过客 on 2020-03-17 11:24:30
Question: After reviewing this exact issue (https://github.com/jupyterhub/binderhub/issues/237), it seems that the functionality for this has been implemented with this merged pull request (https://github.com/jupyterhub/binderhub/pull/671). However, I cannot seem to find guidance in the docs or elsewhere explaining what should go into the secrets.yml file, or whether there are other steps required to use Binder with private GitHub repos. (Apologies if I have missed the obvious -- complete Binder …

Setting seeds for a parallel random forest in caret for reproducible results

Submitted by 本小妞迷上赌 on 2020-01-13 19:57:05
Question: I wish to run random forest in parallel using the caret package, and I wish to set the seeds for a reproducible result, as in "Fully reproducible parallel models using caret". However, I don't understand line 9 in the following code taken from the caret help: why do we sample 22 integers (plus one for the last model in line 12, making 23) when only 12 values of the parameter k are evaluated? For information, I wish to run 5-fold CV to evaluate 584 values of the RF parameter 'mtry'. Any help is much appreciated. Thank you. ## …
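
For the 5-fold / 584-value setup described above, the seeds argument is a list with one integer vector per resample (its length equal to the number of tuning combinations) plus a single integer for the final model fit. A sketch, assuming caret's documented seeds structure (the seed values themselves are arbitrary):

    library(caret)

    set.seed(123)
    n_resamples <- 5     # 5-fold CV
    n_tune      <- 584   # number of mtry values to evaluate

    seeds <- vector(mode = "list", length = n_resamples + 1)
    for (i in seq_len(n_resamples)) seeds[[i]] <- sample.int(10^6, n_tune)
    seeds[[n_resamples + 1]] <- sample.int(10^6, 1)  # seed for the final model

    ctrl <- trainControl(method = "cv", number = n_resamples, seeds = seeds)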

knitr templating - Dynamic chunks issue

Submitted by 谁说胖子不能爱 on 2020-01-06 14:40:39
Question: The following code is a very simplified MRE for an issue I'm experiencing. I'm trying to avoid R templating packages such as brew and to use only knit_expand() to achieve my goals. The issue is twofold: generated chunks don't get parsed (this does not happen in my real code, but it does in the MRE), and instead of LaTeX \includegraphics, knitr (or rmarkdown, or pandoc) generates R Markdown syntax for inserting figures (![]). Regarding the former, I have a feeling that it might be related to my …
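
For reference, a minimal knit_expand() pattern looks like the sketch below (an R Markdown-flavoured illustration only; the template text and loop bounds are made up, and the original question may involve .Rnw/LaTeX instead):

    library(knitr)

    # A chunk template with {{i}} placeholders that knit_expand() fills in.
    template <- c("```{r plot-{{i}}, echo=FALSE}",
                  "plot(rnorm(100), main = 'Figure {{i}}')",
                  "```")

    expanded <- lapply(1:3, function(i) knit_expand(text = template, i = i))

    # Inside an Rmd chunk with results='asis' one would then typically do:
    # cat(knit_child(text = unlist(expanded), quiet = TRUE), sep = "\n")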

Getting a command to re-create an object that is simpler than with dput()

Submitted by 你离开我真会死。 on 2020-01-03 06:06:09
Question: The output of dput is usually much more complex than what a user would have typed to create the same object. I understand that this might be necessary to guarantee 100% reproducibility (including, for instance, when different users use different default settings). However, it doesn't make examples as readable as they could be, and I often spend some time simplifying the output. As an example, consider: dput(data.frame(a=1:10)) > structure(list(a = 1:10), .Names = "a", row.names = c(NA, -10L) …
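
One rough way to get a more readable constructor (a hand-rolled sketch that only works for plain columns without special attributes, not a general replacement for dput) is to deparse each column and stitch the data.frame() call back together yourself:

    df <- data.frame(a = 1:10, b = letters[1:10])

    dput(df)   # faithful but verbose

    # Deparse each column and rebuild a simpler constructor call.
    cols <- vapply(df, function(x) paste(deparse(x), collapse = ""), character(1))
    cat("data.frame(", paste(names(df), "=", cols, collapse = ", "), ")\n")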

Reproducible splitting of data into training and testing in R

Submitted by 本小妞迷上赌 on 2020-01-01 22:15:12
Question: A common way of sampling/splitting data in R is to use sample, e.g., on row numbers. For example: require(data.table) set.seed(1) population <- as.character(1e5:(1e6-1)) # some made-up ID names N <- 1e4 # sample size sample1 <- data.table(id = sort(sample(population, N))) # randomly sample N ids test <- sample(N-1, N/2, replace = F) test1 <- sample1[test, .(id)]. The problem is that this isn't very robust to changes in the data. For example, if we drop just one observation: sample2 <- sample1[ …
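
One common remedy (sketched below using the digest package; the hash scheme and the 50/50 threshold are assumptions, not from the question) is to assign each id to train or test based on a hash of the id itself, so that membership does not depend on which other rows happen to be present:

    library(data.table)
    library(digest)

    set.seed(1)
    population <- as.character(1e5:(1e6 - 1))
    sample1 <- data.table(id = sort(sample(population, 1e4)))

    # The first hex digit of the id's MD5 hash decides the split deterministically.
    hex1 <- vapply(sample1$id,
                   function(x) strtoi(substr(digest(x, algo = "md5"), 1, 1), 16L),
                   integer(1))
    sample1[, set := ifelse(hex1 < 8, "train", "test")]   # roughly 50/50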

If Keras results are not reproducible, what's the best practice for comparing models and choosing hyperparameters?

Submitted by 纵饮孤独 on 2019-12-31 04:58:08
Question: UPDATE: This question was for TensorFlow 1.x. I upgraded to 2.0 and (at least on the simple code below) the reproducibility issue seems fixed on 2.0. So that solves my problem; but I'm still curious about what "best practices" were used for this issue on 1.x. Training the exact same model/parameters/data on Keras/TensorFlow does not give reproducible results, and the loss is significantly different each time you train the model. There are many Stack Overflow questions about that (e.g., How to get …
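
On the "best practice" side, the usual workaround when individual runs are noisy is to compare hyperparameter settings by the distribution of scores over several seeds rather than by a single run. A minimal, self-contained sketch (the toy data and tiny model below are made up purely to keep it runnable):

    import numpy as np
    import tensorflow as tf

    # Toy data, only so the sketch runs end to end.
    x = np.random.rand(200, 8).astype("float32")
    y = np.random.randint(0, 2, size=(200, 1)).astype("float32")

    def build_model():
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
            tf.keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy")
        return model

    def score_once(seed):
        # One training run under a fixed seed; returns the final validation loss.
        np.random.seed(seed)
        tf.random.set_seed(seed)
        hist = build_model().fit(x, y, epochs=3, validation_split=0.2, verbose=0)
        return hist.history["val_loss"][-1]

    scores = [score_once(s) for s in range(5)]
    print("val_loss: mean %.4f, std %.4f" % (np.mean(scores), np.std(scores)))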

Limiting size of hierarchical data for reproducible example

Submitted by 南笙酒味 on 2019-12-23 12:46:23
Question: I am trying to come up with a reproducible example (RE) for this question: Errors related to data frame columns during merging. To qualify as having an RE, the question lacks only reproducible data. However, when I tried to use the pretty much standard approach of dput(head(myDataObj)), the output produced is a 14 MB file. The problem is that my data object is a list of data frames, so head()'s limit doesn't appear to be applied recursively. I haven't found any options for dput() and head() …
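
Since head() does not descend into list elements on its own, one workaround (a sketch with made-up stand-in data in place of the real myDataObj) is to truncate each data frame with lapply() before calling dput():

    # Stand-in for the real list of data frames.
    myDataObj <- list(df1 = data.frame(a = 1:100, b = rnorm(100)),
                      df2 = data.frame(x = letters, y = LETTERS))

    small <- lapply(myDataObj, head, n = 6)  # keep only the first rows of each
    dput(small)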

Why does stacking CNN layers wreck reproducibility (even with a seed & CPU)?

Submitted by 只谈情不闲聊 on 2019-12-22 09:49:18
Question: REPRODUCIBLE: ipt = Input(batch_shape=batch_shape) x = Conv2D(6, (8, 8), strides=(2, 2), activation='relu')(ipt) x = Flatten()(x) out = Dense(6, activation='softmax')(x) NOT REPRODUCIBLE: ipt = Input(batch_shape=batch_shape) x = Conv2D(6, (8, 8), strides=(2, 2), activation='relu')(ipt) x = Conv2D(6, (8, 8), strides=(2, 2), activation='relu')(x) x = Flatten()(x) out = Dense(6, activation='softmax')(x) The difference amplifies substantially when using a larger model, and actual data instead …
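
A small self-contained check (the batch shape and seed are placeholders; this only verifies weight initialization within one process, not the training-time drift the question is about) builds the stacked variant twice under the same seed and compares the resulting weights:

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.layers import Conv2D, Dense, Flatten, Input
    from tensorflow.keras.models import Model

    def build_stacked(seed=0, batch_shape=(32, 28, 28, 1)):
        # Fix the seeds, build the two-Conv2D model, and return its weights.
        np.random.seed(seed)
        tf.random.set_seed(seed)
        ipt = Input(batch_shape=batch_shape)
        x = Conv2D(6, (8, 8), strides=(2, 2), activation='relu')(ipt)
        x = Conv2D(6, (8, 8), strides=(2, 2), activation='relu')(x)
        x = Flatten()(x)
        out = Dense(6, activation='softmax')(x)
        return Model(ipt, out).get_weights()

    w1, w2 = build_stacked(), build_stacked()
    print(all(np.array_equal(a, b) for a, b in zip(w1, w2)))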