backpropagation

Matrix dimensions not matching in backpropagation

不羁的心 submitted on 2019-12-12 15:30:23

Question: Here I'm attempting to implement a neural network with a single hidden layer to classify two training examples. The network uses the sigmoid activation function. The layer dimensions and weights are as follows: X: 2×4, w1: 2×3, l1: 4×3, w2: 2×4, Y: 2×3. I'm running into an issue in backpropagation where the matrix dimensions are not correct. This code:

```python
import numpy as np
M = 2
learning_rate = 0.0001
X_train = np.asarray([[1,1,1,1] , [0,0,0,0]])
Y_train = np.asarray([[1,1,1] , [0,0,0]…
```
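The dimension list above is internally inconsistent (a 2×4 X cannot be multiplied by a 2×3 w1), which is the usual source of this error. Below is a minimal sketch, not the asker's code, of a one-hidden-layer sigmoid network where every shape lines up: with X of shape M×4 and Y of shape M×3, the weights must be 4×H and H×3, and each gradient then has exactly the shape of the weight it updates. The hidden size, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

M, n_in, n_hidden, n_out = 2, 4, 3, 3   # illustrative sizes
lr = 0.1
rng = np.random.default_rng(0)

X = np.asarray([[1., 1., 1., 1.], [0., 0., 0., 0.]])   # M x n_in
Y = np.asarray([[1., 1., 1.], [0., 0., 0.]])           # M x n_out
w1 = rng.standard_normal((n_in, n_hidden))             # n_in x n_hidden
w2 = rng.standard_normal((n_hidden, n_out))            # n_hidden x n_out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(1000):
    # forward pass
    l1 = sigmoid(X @ w1)        # M x n_hidden
    out = sigmoid(l1 @ w2)      # M x n_out

    # backward pass: each delta has the shape of its layer's activations
    d_out = (out - Y) * out * (1 - out)     # M x n_out
    d_l1 = (d_out @ w2.T) * l1 * (1 - l1)   # M x n_hidden

    # each gradient has the shape of the weight matrix it updates
    w2 -= lr * (l1.T @ d_out)   # n_hidden x n_out
    w1 -= lr * (X.T @ d_l1)     # n_in x n_hidden
```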

When will the computation graph be freed if I only do forward for some samples?

拥有回忆 submitted on 2019-12-12 14:19:52

Question: I have a use case where I run a forward pass for each sample in a batch and only accumulate the loss for some of the samples, based on a condition on the model's output for that sample. Here is an illustrative snippet:

```python
for batch_idx, (data, target) in enumerate(train_loader):
    optimizer.zero_grad()
    total_loss = 0
    loss_count_local = 0
    for i in range(len(target)):
        im = Variable(data[i].unsqueeze(0).cuda())
        y = Variable(torch.FloatTensor([target[i]]).cuda())
        out = model(im)
        # if out satisfies some condition, we will…
```
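In PyTorch, a computation graph lives only as long as some tensor still references it. The graph built for a sample whose loss is never accumulated is freed by ordinary reference counting as soon as `out` is rebound on the next iteration; calling `backward()` on the accumulated loss then frees the remaining graphs. A minimal sketch of that pattern, reusing the question's `train_loader`, `model`, and `optimizer`; the threshold condition, `criterion`, and the tensor-based API in place of the deprecated `Variable` are assumptions:

```python
import torch

for batch_idx, (data, target) in enumerate(train_loader):
    optimizer.zero_grad()
    selected_losses = []
    for i in range(len(target)):
        im = data[i].unsqueeze(0).cuda()
        y = target[i].unsqueeze(0).float().cuda()
        out = model(im)
        if out.item() > 0.5:                        # hypothetical condition
            selected_losses.append(criterion(out, y))
        # else: this sample's graph is freed once `out` is overwritten
    if selected_losses:
        total_loss = torch.stack(selected_losses).sum()
        total_loss.backward()                       # frees the surviving graphs
        optimizer.step()
```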

RNN: Back-propagation through time when output is taken only at final timestep

血红的双手。 submitted on 2019-12-12 05:59:28

Question: In this blog on Recurrent Neural Networks by Denny Britz, the author states: "The above diagram has outputs at each time step, but depending on the task this may not be necessary. For example, when predicting the sentiment of a sentence we may only care about the final output, not the sentiment after each word. Similarly, we may not need inputs at each time step." In the case where we take the output only at the final timestep: how will backpropagation change if there are no outputs at each…
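When the loss is computed only from the final timestep, backpropagation through time changes only in where the gradient enters: instead of a loss term at every step, the single final-step gradient flows backward through the chain of hidden states h_T → h_{T-1} → … → h_1, so the recurrent weights still receive updates from every timestep. A minimal PyTorch sketch; the sizes and the linear classification head are assumptions:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 2)

x = torch.randn(4, 10, 8)                 # 4 sequences, 10 timesteps each
outputs, h_n = rnn(x)                     # outputs: (4, 10, 16)

logits = head(outputs[:, -1, :])          # use ONLY the final timestep
loss = nn.functional.cross_entropy(logits, torch.tensor([0, 1, 0, 1]))
loss.backward()                           # gradient still reaches every timestep
                                          # through the hidden-state chain
```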

How does backpropagation work in a Convolutional Neural Network (CNN)?

筅森魡賤 submitted on 2019-12-12 04:07:53

Question: I have a few questions regarding CNNs. In the figure below, a 5×5 kernel is used between layer S2 and layer C3.
Q1. How many kernels are used there? Is each of these kernels connected to each of the feature maps in layer S2?
Q2. When using max-pooling, while backpropagating the error, how does a max-pooling feature/neuron know/determine from which feature map/neuron in its immediately preceding layer it got the max value?
Q3. If we want to train a kernel, we initialize it with random values; is…
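The usual answer to Q2 is that max-pooling records, during the forward pass, the position of the winner in each pooling window; the backward pass then routes the incoming gradient to exactly that position and sends zero everywhere else. A minimal numpy sketch of a 2×2 max-pool with this bookkeeping (even input dimensions assumed for brevity):

```python
import numpy as np

def maxpool2x2_forward(x):
    """Forward pass: pool, and remember WHERE each max came from."""
    H, W = x.shape                      # assumes H and W are even
    out = np.zeros((H // 2, W // 2))
    argmax = np.zeros((H // 2, W // 2, 2), dtype=int)
    for i in range(0, H, 2):
        for j in range(0, W, 2):
            window = x[i:i+2, j:j+2]
            r, c = np.unravel_index(np.argmax(window), (2, 2))
            out[i // 2, j // 2] = window[r, c]
            argmax[i // 2, j // 2] = (i + r, j + c)   # remember the winner
    return out, argmax

def maxpool2x2_backward(grad_out, argmax, input_shape):
    """Backward pass: route each gradient only to its recorded winner."""
    grad_in = np.zeros(input_shape)
    for i in range(grad_out.shape[0]):
        for j in range(grad_out.shape[1]):
            r, c = argmax[i, j]
            grad_in[r, c] += grad_out[i, j]   # all other positions get zero
    return grad_in
```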

Convolutional neural network not converging

╄→гoц情女王★ submitted on 2019-12-11 11:16:37

Question: I've been watching some videos on deep learning / convolutional neural networks, like here and here, and I tried to implement my own in C++. I tried to keep the input data fairly simple for my first attempt, so the idea is to differentiate between a cross and a circle. I have a small data set of around 25 of each (64×64 images); they look like this: [images omitted]. The network itself is five layers:

- Convolution (5 filters, size 3, stride 1, with a ReLU)
- MaxPool (size 2)
- Convolution (1 filter, size 3, stride 1,…
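As a sanity check for a hand-rolled implementation like this, it can help to mirror the stack in a reference framework and compare layer-by-layer outputs on the same input. A hypothetical PyTorch equivalent of the layers listed before the excerpt cuts off, with grayscale input assumed:

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 5, kernel_size=3, stride=1),   # Convolution: 5 filters, size 3
    nn.ReLU(),
    nn.MaxPool2d(2),                            # MaxPool: size 2
    nn.Conv2d(5, 1, kernel_size=3, stride=1),   # Convolution: 1 filter, size 3
    # remaining layers are truncated in the question
)
x = torch.randn(1, 1, 64, 64)                   # one 64x64 image
print(net(x).shape)                             # torch.Size([1, 1, 29, 29])
```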

Artificial Neural Network RELU Activation Function and Gradients

▼魔方 西西 submitted on 2019-12-11 07:05:58

Question: I have a question. I watched a really detailed tutorial on implementing an artificial neural network in C++, and now I have more than a basic understanding of how a neural network works and how to actually program and train one. In the tutorial, the hyperbolic tangent was used for calculating outputs, and obviously its derivative for calculating gradients. However, I want to move on to a different function, specifically Leaky ReLU (to avoid dying neurons). My question is: it specifies that…
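For reference, Leaky ReLU and its derivative are simple piecewise functions: the derivative is 1 on the positive side and a small constant slope α on the negative side, which is exactly what keeps neurons from "dying". A minimal numpy sketch; α = 0.01 is a common default, not taken from the question:

```python
import numpy as np

ALPHA = 0.01  # negative-side slope (assumed)

def leaky_relu(x):
    return np.where(x > 0, x, ALPHA * x)

def leaky_relu_grad(x):
    # 1 on the positive side, ALPHA on the negative side: the gradient is
    # never exactly zero, so a neuron cannot permanently "die".
    return np.where(x > 0, 1.0, ALPHA)
```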

How is input dataset fed into neural network?

六月ゝ 毕业季﹏ submitted on 2019-12-11 00:47:25

Question: If I have 1000 observations in my dataset with 15 features and 1 label, how is the data fed to the input neurons for the forward pass and backpropagation? Is it fed row-wise, one observation at a time, with the weights updated after each observation, or is the full data given as an input matrix so that the network learns the weight values over a number of epochs? Also, if it is fed one at a time, what is an epoch in that case? Thanks.

Answer 1: Assuming that the data is formatted into…
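Both feeding schemes the question describes are special cases of mini-batch training: batch size 1 gives per-observation ("online"/stochastic) updates, batch size 1000 gives one update per full pass, and either way one epoch means one complete pass over all 1000 observations. A minimal sketch of the loop; the dataset sizes follow the question, while the batch size and epoch count are assumptions:

```python
import numpy as np

X = np.random.randn(1000, 15)    # 1000 observations, 15 features
y = np.random.randn(1000, 1)     # 1 label per observation

batch_size = 32                  # set to 1 for per-observation updates
n_epochs = 10

for epoch in range(n_epochs):    # one epoch = one full pass over all rows
    order = np.random.permutation(len(X))          # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]  # the slice actually fed to the input layer
        # forward pass, loss, backpropagation, weight update go here
```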

Neural Network not fitting XOR

寵の児 submitted on 2019-12-10 14:16:15

Question: I created an Octave script for training a neural network with one hidden layer using backpropagation, but it cannot seem to fit the XOR function.

- x: input, 4×2 matrix [0 0; 0 1; 1 0; 1 1]
- y: output, 4×1 matrix [0; 1; 1; 0]
- theta: hidden / output layer weights
- z: weighted sums
- a: activation function applied to weighted sums
- m: sample count (4 here)

My weights are initialized as follows:

```matlab
epsilon_init = 0.12;
theta1 = rand(hiddenCount, inputCount + 1) * 2 * epsilon_init * epsilon_init;
theta2 = rand…
```
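One thing worth noting in the excerpt: the initialization multiplies by epsilon_init twice, which yields only small positive weights in [0, 2·epsilon_init²). The standard symmetric initialization subtracts epsilon_init instead, giving weights in [-epsilon_init, +epsilon_init), and the lack of sign diversity alone can make XOR hard to fit. A sketch of the usual formula, written in numpy here rather than Octave, with layer sizes assumed:

```python
import numpy as np

epsilon_init = 0.12
hidden_count, input_count, output_count = 2, 2, 1   # typical XOR sizes (assumed)

# symmetric initialization in [-epsilon_init, +epsilon_init)
theta1 = np.random.rand(hidden_count, input_count + 1) * 2 * epsilon_init - epsilon_init
theta2 = np.random.rand(output_count, hidden_count + 1) * 2 * epsilon_init - epsilon_init
```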

How is a multiple-output deep learning model trained?

非 Y 不嫁゛ submitted on 2019-12-09 06:07:25

Question: I think I do not understand multiple-output networks. Although I understand how the implementation is made and I successfully trained one such model, I don't understand how a multiple-output deep learning network is trained. I mean, what is happening inside the network during training? Take, for example, this network from the Keras functional API guide: [figure omitted] You can see the two outputs (aux_output and main_output). How does the backpropagation work? My intuition was that the model does…
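What happens inside is that Keras reduces the per-output losses to a single scalar, a weighted sum, and runs one ordinary backward pass on it; layers shared by both branches receive the sum of the gradients flowing down from each head. A hypothetical miniature of a two-output functional model: the layer sizes are assumptions, while the output names and the 1.0/0.2 loss weights follow the functional API guide's example.

```python
from tensorflow import keras
from tensorflow.keras import layers

inp = keras.Input(shape=(16,), name="input")
shared = layers.Dense(32, activation="relu")(inp)    # trunk shared by both heads
aux_output = layers.Dense(1, activation="sigmoid", name="aux_output")(shared)
main = layers.Dense(32, activation="relu")(shared)
main_output = layers.Dense(1, activation="sigmoid", name="main_output")(main)
model = keras.Model(inp, [main_output, aux_output])

# During training Keras optimizes ONE scalar:
#   total_loss = 1.0 * loss(main_output) + 0.2 * loss(aux_output)
# and a single backward pass propagates it through the whole graph.
model.compile(
    optimizer="rmsprop",
    loss={"main_output": "binary_crossentropy",
          "aux_output": "binary_crossentropy"},
    loss_weights={"main_output": 1.0, "aux_output": 0.2},
)
```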

Understanding when to use python list in Pytorch

柔情痞子 submitted on 2019-12-08 13:52:27

Question: Basically, as this thread discusses here, you cannot use a Python list to wrap your sub-modules (for example, your layers); otherwise PyTorch will not update the parameters of the sub-modules inside the list. Instead, you should use nn.ModuleList to wrap your sub-modules to make sure their parameters get updated. Now I have also seen code like the following, where the author uses a Python list to collect the losses and then calls loss.backward() to do the update (in the REINFORCE algorithm…
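The distinction is that module registration and autograd are separate mechanisms: nn.ModuleList exists so that sub-module parameters are registered and reach the optimizer, while a plain Python list of loss tensors is fine because autograd tracks tensors through the computation graph regardless of how they are stored. A minimal sketch of the registration half:

```python
import torch
import torch.nn as nn

class Bad(nn.Module):
    def __init__(self):
        super().__init__()
        # plain Python list: these layers are NOT registered as sub-modules
        self.layers = [nn.Linear(4, 4) for _ in range(3)]

class Good(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: parameters are registered and reach the optimizer
        self.layers = nn.ModuleList(nn.Linear(4, 4) for _ in range(3))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

print(len(list(Bad().parameters())))    # 0 -- an optimizer would see nothing
print(len(list(Good().parameters())))   # 6 -- weight and bias of each layer
```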