backpropagation

Neural Network learning rate and batch weight update

爷,独闯天下 submitted on 2019-11-28 17:39:55
I have programmed a Neural Network in Java and am now working on the back-propagation algorithm. I've read that batch updates of the weights give a more stable gradient search than online weight updates. As a test I've created a time-series function of 100 points, such that x = [0..99] and y = f(x). I've created a Neural Network with one input, one output, and 2 hidden layers of 10 neurons for testing. What I am struggling with is the learning rate of the back-propagation algorithm when tackling this problem. I have 100 input points, so when I calculate the weight change dw_
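One common way to keep the step size independent of the number of training points is to average the accumulated gradient over the batch before applying it. A minimal sketch, assuming a grad_fn that returns the per-sample gradient dE/dw (the name and signature are illustrative, not from the question):

    import numpy as np

    def batch_update(weights, inputs, targets, grad_fn, lr=0.01):
        # Sum the per-sample gradients over the whole batch...
        total_grad = np.zeros_like(weights)
        for x, t in zip(inputs, targets):
            total_grad += grad_fn(weights, x, t)
        # ...then divide by the number of samples so the effective step
        # stays comparable to an online (per-sample) update.
        return weights - lr * total_grad / len(inputs)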

How to use k-fold cross validation in a neural network

好久不见. submitted on 2019-11-28 15:44:38
We are writing a small ANN which is supposed to categorize 7000 products into 7 classes based on 10 input variables. In order to do this we have to use k-fold cross validation, but we are kind of confused. We have this excerpt from the presentation slide: What exactly are the validation and test sets? From what we understand, we run through the 3 training sets and adjust the weights (single epoch). Then what do we do with the validation set? Because from what I understand, the test set is used to get the error of the network. What happens next is also confusing to me. When does the
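For reference, the usual k-fold loop trains a fresh model on k-1 folds and validates on the remaining fold, while a separate test set stays untouched until the very end. A minimal sketch using scikit-learn's KFold; build_model is a placeholder for whatever constructs the ANN (assumed here to expose scikit-learn-style fit/score methods):

    import numpy as np
    from sklearn.model_selection import KFold

    def cross_validate(X, y, build_model, k=5):
        scores = []
        for train_idx, val_idx in KFold(n_splits=k, shuffle=True).split(X):
            model = build_model()                  # fresh weights for every fold
            model.fit(X[train_idx], y[train_idx])  # train on k-1 folds
            scores.append(model.score(X[val_idx], y[val_idx]))  # validate on the held-out fold
        return np.mean(scores)  # average validation score, used for model selection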

Understanding Neural Network Backpropagation

我与影子孤独终老i submitted on 2019-11-28 14:36:27
Question: Update: a better formulation of the issue. I'm trying to understand the backpropagation algorithm with an XOR neural network as an example. For this case there are 2 input neurons + 1 bias, 2 neurons in the hidden layer + 1 bias, and 1 output neuron. The XOR truth table with -1/1 encoding is: (1, 1) -> -1; (1, -1) -> 1; (-1, 1) -> 1; (-1, -1) -> -1. (source: wikimedia.org) I'm using stochastic backpropagation. After reading a bit more I have found out that the error of the output unit is propagated to the hidden layers... initially this was confusing
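A sketch of a single stochastic backpropagation step for a 2-2-1 tanh network (matching the -1/1 encoding) that shows how the output delta is pushed back through the output weights to form the hidden deltas; all variable names are illustrative:

    import numpy as np

    def backprop_step(x, target, W1, b1, W2, b2, lr=0.1):
        # forward pass with tanh activations
        h = np.tanh(W1 @ x + b1)              # hidden activations
        y = np.tanh(W2 @ h + b2)              # network output

        # output delta: error times the derivative of tanh
        delta_out = (y - target) * (1 - y ** 2)
        # hidden deltas: output delta propagated back through W2
        delta_hid = (W2.T @ delta_out) * (1 - h ** 2)

        # stochastic (per-sample) weight updates
        W2 -= lr * np.outer(delta_out, h)
        b2 -= lr * delta_out
        W1 -= lr * np.outer(delta_hid, x)
        b1 -= lr * delta_hid
        return y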

Training feedforward neural network for OCR [closed]

允我心安 submitted on 2019-11-28 06:03:44
Currently I'm learning about neural networks and I'm trying to create an application that can be trained to recognize handwritten characters. For this problem I use a feed-forward neural network, and it seems to work when I train it to recognize 1, 2 or 3 different characters. But when I try to make the network learn more than 3 characters it stagnates at an error percentage around 40-60%. I tried with multiple layers and fewer/more neurons but I can't seem to get it right, so now I'm wondering whether a feedforward neural network is capable of recognizing that much information. Some statistics

Custom macro for recall in keras

北城余情 submitted on 2019-11-28 05:24:51
Question: I am trying to create a custom macro for recall = (recall of class1 + recall of class2)/2. I came up with the following code but I am not sure how to calculate the true positives of class 0.

    def unweightedRecall():
        def recall(y_true, y_pred):
            # recall of class 1
            true_positives1 = K.sum(K.round(K.clip(y_pred * y_true, 0, 1)))
            possible_positives1 = K.sum(K.round(K.clip(y_true, 0, 1)))
            recall1 = true_positives1 / (possible_positives1 + K.epsilon())
            # --- get true positive of class 0 in true
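One way to get the true positives of class 0 is to flip both labels and predictions so that class 0 becomes the positive class. A sketch under the assumption of binary 0/1 labels, mirroring the style of the posted code:

    import keras.backend as K

    def unweighted_recall(y_true, y_pred):
        # recall of class 1 (as in the question)
        tp1 = K.sum(K.round(K.clip(y_pred * y_true, 0, 1)))
        pos1 = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall1 = tp1 / (pos1 + K.epsilon())
        # recall of class 0: flip labels and predictions
        tp0 = K.sum(K.round(K.clip((1 - y_pred) * (1 - y_true), 0, 1)))
        pos0 = K.sum(K.round(K.clip(1 - y_true, 0, 1)))
        recall0 = tp0 / (pos0 + K.epsilon())
        return (recall1 + recall0) / 2  # unweighted (macro-averaged) recall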

Difference on performance between numpy and matlab

|▌冷眼眸甩不掉的悲伤 submitted on 2019-11-28 04:44:09
I am computing the backpropagation algorithm for a sparse autoencoder. I have implemented it in python using numpy and in matlab. The code is almost the same, but the performance is very different. The time matlab takes to complete the task is 0.252454 seconds while numpy takes 0.973672151566, that is, almost four times more. I will call this code several times later in a minimization problem, so this difference leads to several minutes of delay between the implementations. Is this normal behaviour? How could I improve the performance in numpy? Numpy implementation: Sparse.rho is a tuning parameter
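Differences of this size usually come from per-sample Python loops, which MATLAB's JIT tolerates but numpy does not; replacing them with whole-batch matrix products typically closes the gap. A generic sketch (not the poster's actual code) of the same weight gradient computed with a loop and with a single matrix product:

    import numpy as np

    # Loop version (slow in numpy): accumulate the weight gradient column by column.
    def grad_loop(delta, activations):
        W_grad = np.zeros((delta.shape[0], activations.shape[0]))
        for i in range(delta.shape[1]):          # one column per training example
            W_grad += np.outer(delta[:, i], activations[:, i])
        return W_grad / delta.shape[1]

    # Vectorized version: one matrix product over the whole batch.
    def grad_vectorized(delta, activations):
        return (delta @ activations.T) / delta.shape[1]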

Why is the Cross Entropy method preferred over Mean Squared Error? In what cases does this not hold up? [closed]

末鹿安然 submitted on 2019-11-28 04:22:23
Although both of the above methods provide a better score for closer predictions, cross-entropy is still preferred. Is this the case in every scenario, or are there peculiar scenarios where we prefer cross-entropy over MSE? Cross-entropy is preferred for classification, while mean squared error is one of the best choices for regression. This comes directly from the statement of the problems themselves: in classification you work with a very particular set of possible output values, so MSE is badly defined (as it does not have this kind of knowledge and thus penalizes errors in an incompatible way). To
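The standard argument can be seen on a single sigmoid output unit: with MSE the gradient is multiplied by p(1-p) and therefore vanishes exactly when the prediction is confidently wrong, while the cross-entropy gradient stays proportional to the error. A small worked example:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    y_true, z = 1.0, -4.0        # target is 1, but the pre-activation is confidently negative
    p = sigmoid(z)               # ~0.018: a badly wrong prediction

    # Gradients of each loss with respect to z for a sigmoid output unit:
    grad_mse = (p - y_true) * p * (1 - p)   # MSE: damped by p(1-p), so learning stalls
    grad_ce = p - y_true                    # cross-entropy: stays large while the error is large
    print(grad_mse, grad_ce)                # roughly -0.017 vs -0.98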

Pytorch: How to create an update rule that doesn't come from derivatives?

元气小坏坏 submitted on 2019-11-28 04:08:16
Question: I want to implement the following algorithm, taken from this book, section 13.6: I don't understand how to implement the update rule in pytorch (the rule for w is quite similar to that of theta). As far as I know, torch requires a loss for loss.backward(). This form does not seem to apply to the quoted algorithm. I'm still certain there is a correct way of implementing such update rules in pytorch. Would greatly appreciate a code snippet of how the w weights should be updated, given that V
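One common pattern for such rules is to call backward() on the quantity whose gradient the update needs (here the value estimate v(S, w)) and then combine the resulting .grad tensors with the extra scalar factors inside torch.no_grad(). A hedged sketch; value_net, delta and alpha follow the actor-critic pseudocode but are assumptions, not the book's code:

    import torch

    def apply_value_update(value_net, state, delta, alpha):
        # value_net is any torch.nn.Module producing v(S, w); delta is the TD error
        # computed outside the graph; alpha is the step size from the algorithm.
        value_net.zero_grad()
        v = value_net(state)
        v.sum().backward()                     # fills p.grad with grad_w v(S, w)
        with torch.no_grad():
            for p in value_net.parameters():
                p += alpha * delta * p.grad    # w <- w + alpha * delta * grad_w v(S, w)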

How does the back-propagation algorithm deal with non-differentiable activation functions?

你说的曾经没有我的故事 submitted on 2019-11-27 10:54:17
Question: While digging through the topic of neural networks and how to train them efficiently, I came across the method of using very simple activation functions, such as the rectified linear unit (ReLU), instead of the classic smooth sigmoids. The ReLU function is not differentiable at the origin, so according to my understanding the backpropagation algorithm (BPA) is not suitable for training a neural network with ReLUs, since the chain rule of multivariable calculus refers to smooth functions only.
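In practice the issue is sidestepped by using a subgradient: backpropagation treats the ReLU derivative as 1 for positive inputs and 0 otherwise, with the value at exactly 0 being an arbitrary but harmless convention. A tiny sketch:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def relu_grad(z):
        # Subgradient convention: 1 for z > 0, 0 for z <= 0.
        # At z == 0 any value in [0, 1] is a valid subgradient; 0 is a common choice.
        return (z > 0).astype(float)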
