backpropagation

Matrix dimensions not matching in backpropagation

不羁的心 submitted on 2019-12-12 15:30:23

Question: Here I'm attempting to implement a neural network with a single hidden layer to classify two training examples. The network uses the sigmoid activation function. The layer dimensions and weights are as follows: X: 2×4, w1: 2×3, l1: 4×3, w2: 2×4, Y: 2×3. I'm running into an issue in backpropagation where the matrix dimensions are not correct. This code:

```python
import numpy as np
M = 2
learning_rate = 0.0001
X_train = np.asarray([[1,1,1,1] , [0,0,0,0]])
Y_train = np.asarray([[1,1,1] , [0,0,0]…
```
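The dimension list above is internally inconsistent (a 2×4 X cannot be multiplied by a 2×3 w1), which is the usual source of this error. Below is a minimal sketch, not the asker's code, of a one-hidden-layer sigmoid network where every shape lines up: with X of shape M×4 and Y of shape M×3, the weights must be 4×H and H×3, and each gradient then has exactly the shape of the weight it updates. The hidden size, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

M, n_in, n_hidden, n_out = 2, 4, 3, 3   # illustrative sizes
lr = 0.1
rng = np.random.default_rng(0)

X = np.asarray([[1., 1., 1., 1.], [0., 0., 0., 0.]])   # M x n_in
Y = np.asarray([[1., 1., 1.], [0., 0., 0.]])           # M x n_out
w1 = rng.standard_normal((n_in, n_hidden))             # n_in x n_hidden
w2 = rng.standard_normal((n_hidden, n_out))            # n_hidden x n_out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(1000):
    # forward pass
    l1 = sigmoid(X @ w1)        # M x n_hidden
    out = sigmoid(l1 @ w2)      # M x n_out

    # backward pass: each delta has the shape of its layer's activations
    d_out = (out - Y) * out * (1 - out)     # M x n_out
    d_l1 = (d_out @ w2.T) * l1 * (1 - l1)   # M x n_hidden

    # each gradient has the shape of the weight matrix it updates
    w2 -= lr * (l1.T @ d_out)   # n_hidden x n_out
    w1 -= lr * (X.T @ d_l1)     # n_in x n_hidden
```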

When will the computation graph be freed if I only do forward for some samples?

拥有回忆 submitted on 2019-12-12 14:19:52

Question: I have a use case where I run a forward pass for each sample in a batch and only accumulate the loss for some of the samples, based on a condition on the model's output for that sample. Here is an illustrative snippet:

```python
for batch_idx, (data, target) in enumerate(train_loader):
    optimizer.zero_grad()
    total_loss = 0
    loss_count_local = 0
    for i in range(len(target)):
        im = Variable(data[i].unsqueeze(0).cuda())
        y = Variable(torch.FloatTensor([target[i]]).cuda())
        out = model(im)
        # if out satisfies some condition, we will…
```
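In PyTorch, a computation graph lives only as long as some tensor still references it. The graph built for a sample whose loss is never accumulated is freed by ordinary reference counting as soon as `out` is rebound on the next iteration; calling `backward()` on the accumulated loss then frees the remaining graphs. A minimal sketch of that pattern, reusing the question's `train_loader`, `model`, and `optimizer`; the threshold condition, `criterion`, and the tensor-based API in place of the deprecated `Variable` are assumptions:

```python
import torch

for batch_idx, (data, target) in enumerate(train_loader):
    optimizer.zero_grad()
    selected_losses = []
    for i in range(len(target)):
        im = data[i].unsqueeze(0).cuda()
        y = target[i].unsqueeze(0).float().cuda()
        out = model(im)
        if out.item() > 0.5:                        # hypothetical condition
            selected_losses.append(criterion(out, y))
        # else: this sample's graph is freed once `out` is overwritten
    if selected_losses:
        total_loss = torch.stack(selected_losses).sum()
        total_loss.backward()                       # frees the surviving graphs
        optimizer.step()
```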

RNN: Back-propagation through time when output is taken only at final timestep

血红的双手。 submitted on 2019-12-12 05:59:28

Question: In this blog on Recurrent Neural Networks by Denny Britz, the author states: "The above diagram has outputs at each time step, but depending on the task this may not be necessary. For example, when predicting the sentiment of a sentence we may only care about the final output, not the sentiment after each word. Similarly, we may not need inputs at each time step." In the case where we take the output only at the final timestep: how will backpropagation change if there are no outputs at each…
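When the loss is computed only from the final timestep, backpropagation through time changes only in where the gradient enters: instead of a loss term at every step, the single final-step gradient flows backward through the chain of hidden states h_T → h_{T-1} → … → h_1, so the recurrent weights still receive updates from every timestep. A minimal PyTorch sketch; the sizes and the linear classification head are assumptions:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 2)

x = torch.randn(4, 10, 8)                 # 4 sequences, 10 timesteps each
outputs, h_n = rnn(x)                     # outputs: (4, 10, 16)

logits = head(outputs[:, -1, :])          # use ONLY the final timestep
loss = nn.functional.cross_entropy(logits, torch.tensor([0, 1, 0, 1]))
loss.backward()                           # gradient still reaches every timestep
                                          # through the hidden-state chain
```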

How does backpropagation work in a Convolutional Neural Network (CNN)?

筅森魡賤 submitted on 2019-12-12 04:07:53

Question: I have a few questions regarding CNNs. In the figure below, a 5×5 kernel is used between layer S2 and layer C3.
Q1. How many kernels are used there? Is each of these kernels connected to each of the feature maps in layer S2?
Q2. When using max-pooling, while backpropagating the error, how does a max-pooling feature/neuron know/determine from which feature map/neuron in its immediately preceding layer it got the max value?
Q3. If we want to train a kernel, we initialize it with random values; is…
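The usual answer to Q2 is that max-pooling records, during the forward pass, the position of the winner in each pooling window; the backward pass then routes the incoming gradient to exactly that position and sends zero everywhere else. A minimal numpy sketch of a 2×2 max-pool with this bookkeeping (even input dimensions assumed for brevity):

```python
import numpy as np

def maxpool2x2_forward(x):
    """Forward pass: pool, and remember WHERE each max came from."""
    H, W = x.shape                      # assumes H and W are even
    out = np.zeros((H // 2, W // 2))
    argmax = np.zeros((H // 2, W // 2, 2), dtype=int)
    for i in range(0, H, 2):
        for j in range(0, W, 2):
            window = x[i:i+2, j:j+2]
            r, c = np.unravel_index(np.argmax(window), (2, 2))
            out[i // 2, j // 2] = window[r, c]
            argmax[i // 2, j // 2] = (i + r, j + c)   # remember the winner
    return out, argmax

def maxpool2x2_backward(grad_out, argmax, input_shape):
    """Backward pass: route each gradient only to its recorded winner."""
    grad_in = np.zeros(input_shape)
    for i in range(grad_out.shape[0]):
        for j in range(grad_out.shape[1]):
            r, c = argmax[i, j]
            grad_in[r, c] += grad_out[i, j]   # all other positions get zero
    return grad_in
```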

Convolutional neural network not converging

╄→гoц情女王★ submitted on 2019-12-11 11:16:37

Question: I've been watching some videos on deep learning / convolutional neural networks, like here and here, and I tried to implement my own in C++. I tried to keep the input data fairly simple for my first attempt, so the idea is to differentiate between a cross and a circle. I have a small data set of around 25 of each (64×64 images); they look like this: [images omitted]. The network itself is five layers:

- Convolution (5 filters, size 3, stride 1, with a ReLU)
- MaxPool (size 2)
- Convolution (1 filter, size 3, stride 1,…
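As a sanity check for a hand-rolled implementation like this, it can help to mirror the stack in a reference framework and compare layer-by-layer outputs on the same input. A hypothetical PyTorch equivalent of the layers listed before the excerpt cuts off, with grayscale input assumed:

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 5, kernel_size=3, stride=1),   # Convolution: 5 filters, size 3
    nn.ReLU(),
    nn.MaxPool2d(2),                            # MaxPool: size 2
    nn.Conv2d(5, 1, kernel_size=3, stride=1),   # Convolution: 1 filter, size 3
    # remaining layers are truncated in the question
)
x = torch.randn(1, 1, 64, 64)                   # one 64x64 image
print(net(x).shape)                             # torch.Size([1, 1, 29, 29])
```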

Artificial Neural Network RELU Activation Function and Gradients

▼魔方 西西 submitted on 2019-12-11 07:05:58

Question: I have a question. I watched a really detailed tutorial on implementing an artificial neural network in C++, and now I have more than a basic understanding of how a neural network works and how to actually program and train one. In the tutorial, the hyperbolic tangent was used for calculating outputs, and obviously its derivative for calculating gradients. However, I want to move on to a different function, specifically Leaky ReLU (to avoid dying neurons). My question is: it specifies that…
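For reference, Leaky ReLU and its derivative are simple piecewise functions: the derivative is 1 on the positive side and a small constant slope α on the negative side, which is exactly what keeps neurons from "dying". A minimal numpy sketch; α = 0.01 is a common default, not taken from the question:

```python
import numpy as np

ALPHA = 0.01  # negative-side slope (assumed)

def leaky_relu(x):
    return np.where(x > 0, x, ALPHA * x)

def leaky_relu_grad(x):
    # 1 on the positive side, ALPHA on the negative side: the gradient is
    # never exactly zero, so a neuron cannot permanently "die".
    return np.where(x > 0, 1.0, ALPHA)
```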

How is input dataset fed into neural network?

六月ゝ 毕业季﹏ submitted on 2019-12-11 00:47:25

Question: If I have 1000 observations in my dataset with 15 features and 1 label, how is the data fed to the input neurons for the forward pass and backpropagation? Is it fed row-wise, one observation at a time, with the weights updated after each observation, or is the full data given as an input matrix so that the network learns the weight values over a number of epochs? Also, if it is fed one at a time, what is an epoch in that case? Thanks.

Answer 1: Assuming that the data is formatted into…
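Both feeding schemes the question describes are special cases of mini-batch training: batch size 1 gives per-observation ("online"/stochastic) updates, batch size 1000 gives one update per full pass, and either way one epoch means one complete pass over all 1000 observations. A minimal sketch of the loop; the dataset sizes follow the question, while the batch size and epoch count are assumptions:

```python
import numpy as np

X = np.random.randn(1000, 15)    # 1000 observations, 15 features
y = np.random.randn(1000, 1)     # 1 label per observation

batch_size = 32                  # set to 1 for per-observation updates
n_epochs = 10

for epoch in range(n_epochs):    # one epoch = one full pass over all rows
    order = np.random.permutation(len(X))          # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]  # the slice actually fed to the input layer
        # forward pass, loss, backpropagation, weight update go here
```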

Neural Network not fitting XOR

寵の児 submitted on 2019-12-10 14:16:15

Question: I created an Octave script for training a neural network with one hidden layer using backpropagation, but it cannot seem to fit the XOR function.

- x: input, 4×2 matrix [0 0; 0 1; 1 0; 1 1]
- y: output, 4×1 matrix [0; 1; 1; 0]
- theta: hidden / output layer weights
- z: weighted sums
- a: activation function applied to weighted sums
- m: sample count (4 here)

My weights are initialized as follows:

```matlab
epsilon_init = 0.12;
theta1 = rand(hiddenCount, inputCount + 1) * 2 * epsilon_init * epsilon_init;
theta2 = rand…
```
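One thing worth noting in the excerpt: the initialization multiplies by epsilon_init twice, which yields only small positive weights in [0, 2·epsilon_init²). The standard symmetric initialization subtracts epsilon_init instead, giving weights in [-epsilon_init, +epsilon_init), and the lack of sign diversity alone can make XOR hard to fit. A sketch of the usual formula, written in numpy here rather than Octave, with layer sizes assumed:

```python
import numpy as np

epsilon_init = 0.12
hidden_count, input_count, output_count = 2, 2, 1   # typical XOR sizes (assumed)

# symmetric initialization in [-epsilon_init, +epsilon_init)
theta1 = np.random.rand(hidden_count, input_count + 1) * 2 * epsilon_init - epsilon_init
theta2 = np.random.rand(output_count, hidden_count + 1) * 2 * epsilon_init - epsilon_init
```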

How is a multiple-output deep learning model trained?

非 Y 不嫁゛ submitted on 2019-12-09 06:07:25

Question: I think I do not understand multiple-output networks. Although I understand how the implementation is made and I successfully trained one such model, I don't understand how a multiple-output deep learning network is trained. I mean, what is happening inside the network during training? Take, for example, this network from the Keras functional API guide: [figure omitted] You can see the two outputs (aux_output and main_output). How does the backpropagation work? My intuition was that the model does…
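What happens inside is that Keras reduces the per-output losses to a single scalar, a weighted sum, and runs one ordinary backward pass on it; layers shared by both branches receive the sum of the gradients flowing down from each head. A hypothetical miniature of a two-output functional model: the layer sizes are assumptions, while the output names and the 1.0/0.2 loss weights follow the functional API guide's example.

```python
from tensorflow import keras
from tensorflow.keras import layers

inp = keras.Input(shape=(16,), name="input")
shared = layers.Dense(32, activation="relu")(inp)    # trunk shared by both heads
aux_output = layers.Dense(1, activation="sigmoid", name="aux_output")(shared)
main = layers.Dense(32, activation="relu")(shared)
main_output = layers.Dense(1, activation="sigmoid", name="main_output")(main)
model = keras.Model(inp, [main_output, aux_output])

# During training Keras optimizes ONE scalar:
#   total_loss = 1.0 * loss(main_output) + 0.2 * loss(aux_output)
# and a single backward pass propagates it through the whole graph.
model.compile(
    optimizer="rmsprop",
    loss={"main_output": "binary_crossentropy",
          "aux_output": "binary_crossentropy"},
    loss_weights={"main_output": 1.0, "aux_output": 0.2},
)
```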

Understanding when to use python list in Pytorch

柔情痞子 submitted on 2019-12-08 13:52:27

Question: Basically, as this thread discusses here, you cannot use a Python list to wrap your sub-modules (for example, your layers); otherwise PyTorch will not update the parameters of the sub-modules inside the list. Instead, you should use nn.ModuleList to wrap your sub-modules to make sure their parameters get updated. Now I have also seen code like the following, where the author uses a Python list to collect the losses and then calls loss.backward() to do the update (in the REINFORCE algorithm…
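The distinction is that module registration and autograd are separate mechanisms: nn.ModuleList exists so that sub-module parameters are registered and reach the optimizer, while a plain Python list of loss tensors is fine because autograd tracks tensors through the computation graph regardless of how they are stored. A minimal sketch of the registration half:

```python
import torch
import torch.nn as nn

class Bad(nn.Module):
    def __init__(self):
        super().__init__()
        # plain Python list: these layers are NOT registered as sub-modules
        self.layers = [nn.Linear(4, 4) for _ in range(3)]

class Good(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: parameters are registered and reach the optimizer
        self.layers = nn.ModuleList(nn.Linear(4, 4) for _ in range(3))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

print(len(list(Bad().parameters())))    # 0 -- an optimizer would see nothing
print(len(list(Good().parameters())))   # 6 -- weight and bias of each layer
```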