conv-neural-network | 易学教程

Should I gray scale the image?

Question: I'm classifying 30 types of clothes in images using the R-CNN object detection library from TensorFlow (https://github.com/tensorflow/models/tree/master/research/object_detection). Does color matter when we collect images for training and testing? If I include only purple and blue shirts, I guess it won't recognize red shirts? Should I convert all images to grayscale to detect the types of clothes? :)

Answer 1: Yes, colour does matter. The underlying visual feature extraction is based on a convolutional neural

How to apply a pre-trained model of 3 channel images on single channel images?

Question: I tried to use a pre-trained model that was trained on three-channel color images, but I get an error because of the shape mismatch. Could someone tell me how to tackle this issue? One user suggested using a Tile layer, but I could not find any relevant documentation or help for this layer, or any other solution. I really appreciate your help.

Answer 1: There is not much information in caffe.proto about the Tile layer. If you look at the code, it just copies data tiles times
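The channel-copying idea in the answer can be illustrated outside Caffe. The following is only a minimal NumPy sketch, not the Tile layer itself: the helper name `tile_channels` and the default NCHW layout (channel axis 1) are assumptions made for illustration.

```python
import numpy as np

def tile_channels(batch, channel_axis=1, copies=3):
    """Repeat a single-channel batch along its channel axis.

    Hypothetical helper: assumes the channel axis currently has size 1,
    e.g. (N, 1, H, W) for an NCHW layout.
    """
    batch = np.asarray(batch)
    if batch.shape[channel_axis] != 1:
        raise ValueError("expected exactly one channel on the channel axis")
    # Every copy holds the same grayscale values.
    return np.repeat(batch, copies, axis=channel_axis)

gray = np.random.rand(2, 1, 4, 4)      # two single-channel 4x4 images
rgb_like = tile_channels(gray)
print(rgb_like.shape)                  # (2, 3, 4, 4)
```

For a channels-last layout, the same helper could be called with `channel_axis=-1` on an input of shape (N, H, W, 1).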

How to use tf.nn.ctc_loss in cnn+ctc network

Question: Recently I have been trying to use TensorFlow to implement a CNN+CTC network based on the article Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks. I feed batched spectrogram data (shape: (10, 120, 155, 3), batch size 10) into 10 convolutional layers and 3 fully connected layers, so the output before the CTC layer is 2-D data (shape: (10, 1024)). Here is my problem: I want to use the tf.nn.ctc_loss function from the TensorFlow library, but it raises ValueError: Dimension must

TensorFlow feed an integer

I am trying to do a convolution over variable input sizes. To achieve that, I am using a batch size of 1. However, one of the nodes is a max pooling node, which needs the shape of the input as the list ksize:

    pooled = tf.nn.max_pool(
        h,
        ksize=[1, self.input_size - filter_size + 1, 1, 1],
        strides=[1, 1, 1, 1],
        padding='VALID',
        name="pool")

Now, clearly input_size can be inferred from the input (which is a placeholder):

    self.input_x = tf.placeholder(tf.int32, [None, None], name="input_x")

But I can't use self.input_x.get_shape()[0] because the shape is dynamic. So I intend to pass the input
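Because the pooling window in the question spans the whole length of the convolution output, the pooling amounts to a maximum over that axis, which does not need the length ahead of time. The NumPy sketch below only illustrates that equivalence. The (batch, length, width, channels) layout and the helper name `global_max_pool` are assumptions. In TensorFlow, a reduction such as tf.reduce_max over axis 1 would play the same role.

```python
import numpy as np

def global_max_pool(h):
    """Max over the full length axis of a (batch, length, width, channels) array.

    Hypothetical helper: the length is read from the array at run time,
    so no fixed window size has to be supplied.
    """
    return h.max(axis=1, keepdims=True)

rng = np.random.default_rng(0)
for length in (7, 12):                       # two different input lengths
    h = rng.normal(size=(1, length, 1, 8))   # stand-in for a conv output
    pooled = global_max_pool(h)
    print(length, pooled.shape)              # shape is (1, 1, 1, 8) in both cases
```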

the number of neurons in AlexNet

Question: In AlexNet, the image data is 3*224*224. The first convolutional layer filters the image with 96 kernels of size 11*11*3 with a stride of 4 pixels. I have a doubt about the number of output neurons in the first layer. In my opinion, the input is 224*224*3=150528, so the output should be 55*55*96=290400. But the paper states that the output is 253440. I don't know how to calculate the number of neurons in this layer. Could someone help me? Thank you!

Answer 1: It seems like the input size is
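The arithmetic in the question can be checked with the usual output-size formula for a convolution without dilation, (input + 2*padding - kernel) / stride + 1. This is a small sketch only: the function name is hypothetical, and the 227 input side is an assumption used for illustration, not a value taken from the excerpt.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Output side length of a convolution along one dimension.

    Raises ValueError when the kernel positions do not fit the input evenly.
    """
    span = input_size + 2 * padding - kernel_size
    if span < 0 or span % stride != 0:
        raise ValueError("kernel and stride do not fit the input evenly")
    return span // stride + 1

print(224 * 224 * 3)                 # 150528 input values, as in the question

# A 224-pixel side with an 11-pixel kernel and stride 4 does not divide evenly.
try:
    conv_output_size(224, 11, 4)
except ValueError as err:
    print("224:", err)

# Assumed 227-pixel side: (227 - 11) / 4 + 1 = 55.
side = conv_output_size(227, 11, 4)
print(side, side * side * 96)        # 55 290400
```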

How to apply CNN to Short-time Fourier Transform?

So I have code that returns the Short-Time Fourier Transform spectrum of a .wav file. I want to be able to take, say, a millisecond of the spectrum and train a CNN on it. I'm not quite sure how I would implement that. I know how to format the image data to feed into the CNN and how to train the network, but I'm lost on how to take the FFT data and divide it into small time frames. The FFT code (sorry for the ultra-long code):

    rate, audio = wavfile.read('scale_a_lydian.wav')
    audio = np.mean(audio, axis=1)
    N = audio.shape[0]
    L = N / rate
    M = 1024
    # Audio is 44.1 kHz, or ~44100 samples / second
    #
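One way to divide a spectrum into small time frames is plain array slicing. The sketch below assumes the spectrum is already available as a 2-D array of shape (time_steps, freq_bins). The helper name `split_into_frames`, the layout, and the frame sizes are illustrative assumptions and are not taken from the asker's code.

```python
import numpy as np

def split_into_frames(spectrum, frame_len, hop=None):
    """Cut a (time_steps, freq_bins) array into fixed-length time frames.

    Returns an array of shape (num_frames, frame_len, freq_bins).
    Trailing steps that do not fill a whole frame are dropped.
    """
    hop = frame_len if hop is None else hop   # default: non-overlapping frames
    n_steps, n_bins = spectrum.shape
    if n_steps < frame_len:
        return np.empty((0, frame_len, n_bins), dtype=spectrum.dtype)
    starts = range(0, n_steps - frame_len + 1, hop)
    return np.stack([spectrum[s:s + frame_len] for s in starts])

spectrum = np.random.rand(10, 4)              # 10 time steps, 4 frequency bins
print(split_into_frames(spectrum, 3).shape)         # (3, 3, 4)
print(split_into_frames(spectrum, 3, hop=1).shape)  # (8, 3, 4)
```

Each frame can then be treated as a small single-channel image for the network. Overlapping frames (a hop smaller than the frame length) yield more training samples from the same recording.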

Keras Functional model giving high validation accuracy but incorrect prediction

Question: I am trying to do transfer learning with the VGG16 architecture, using 'ImageNet' pretrained weights, on the PASCAL VOC 2012 dataset. PASCAL VOC is a multi-label image dataset with 20 classes, so I have modified the built-in VGG16 model like this:

    def VGG16_modified():
        base_model = vgg16.VGG16(include_top=True,weights='imagenet',input_shape=(224,224,3))
        print(base_model.summary())
        x = base_model.get_layer('block5_pool').output
        x = (GlobalAveragePooling2D())(x)
        predictions = Dense(20,activation=

How to train mix of image and data in CNN using ImageAugmentation in TFlearn

Question: I would like to train a convolutional neural network in TFLearn/TensorFlow using a mix of images (pixel info) and data. Because I have a small number of images, I need to use image augmentation to increase the number of image samples that I pass to the network. But that means that I can only pass image data as input data and have to add the non-image data at a later stage, presumably before the fully connected layer. I can't work out how to do this, since it seems that I can only tell the

How to use L2 pooling in Tensorflow?

Question: I am trying to implement a CNN architecture that uses L2 pooling. The reference paper specifically argues that L2 pooling works better than max pooling, so I would like to try L2 pooling after the tanh activation function. However, TensorFlow seems to provide only tf.nn.avg_pool / tf.nn.max_pool / tf.nn.max_pool_with_argmax. Is there a way to implement L2 pooling in TensorFlow?

    conv = tf.....
    h = tf.nn.tanh(conv)
    p = tf.pow(tf.nn.avg_pool(tf.pow(h, 2)), 0.5)

Will this be equivalent? Will this
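The relationship between the two formulations can be checked numerically. This is a NumPy sketch under stated assumptions: L2 pooling is taken here to mean the square root of the sum of squares over each window, windows are non-overlapping k x k blocks on a single 2-D map, and both helper names are hypothetical. Under those assumptions, the square root of the average of squares differs from the L2 pool only by a constant factor of k.

```python
import numpy as np

def _windows(x, k):
    """Reshape an (H, W) map into non-overlapping k x k windows."""
    h, w = x.shape
    if h % k or w % k:
        raise ValueError("H and W must be divisible by k")
    return x.reshape(h // k, k, w // k, k)

def l2_pool_2d(x, k):
    """Square root of the sum of squares in each k x k window."""
    return np.sqrt((_windows(x, k) ** 2).sum(axis=(1, 3)))

def rms_pool_2d(x, k):
    """Square root of the mean of squares, i.e. sqrt(avg_pool(x**2))."""
    return np.sqrt((_windows(x, k) ** 2).mean(axis=(1, 3)))

x = np.array([[3.0, 4.0],
              [0.0, 0.0]])
print(l2_pool_2d(x, 2))    # [[5.]]
print(rms_pool_2d(x, 2))   # [[2.5]]
print(np.allclose(l2_pool_2d(x, 2), 2 * rms_pool_2d(x, 2)))  # True
```

Since each window holds k*k values, the mean is the sum divided by k*k, so the two results differ by exactly k after the square root.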

Python/Tensorflow - I have trained the convolutional neural network, how to test it?

Question: I have trained a convolutional neural network (CNN) with the following data, which I had in a binary file (label, filename, data (pixels)): [array([2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0]), array(['10_c.jpg', '10_m.jpg', '10_n.jpg', '1_c.jpg', '1_m.jpg', '1_n.jpg', '2_c.jpg', '2_m.jpg', '2_n.jpg', '3_c.jpg', '3_m.jpg', '3_n.jpg', '4_c.jpg', '4_m.jpg', '4_n.jpg', '5_c.jpg', '5_m.jpg', '5_n.jpg', '6_c.jpg', '6_m.jpg', '6_n.jpg', '7_c.jpg', '7_m