How to load an image-file dataset into TensorFlow in a Jupyter Notebook

徘徊边缘 submitted on 2019-12-06 14:23:35

Question


I'm trying to build a model that classifies some plants, mainly so I can learn how to use TensorFlow. The problem is that every good example I can find to use as a reference loads a .csv dataset, while I want to load a .jpeg dataset (it could also be .png or .jpg).

Some of those examples even use a built-in dataset, like:

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

My dataset is organized into folders named after each flower's label, and the images inside each folder are numbered.


Answer 1:


Let me assume that your folder structure is as follows:

├── testfiles
|   ├── BougainvilleaGlabra
|   |   ├── BougainvilleaGlabra_001.jpeg
|   |   ├── *.jpeg
|   ├── HandroanthusChrysotrichus
|   |   ├── HandroanthusChrysotrichus_001.jpeg
|   |   ├── *.jpeg
|   ├── SpathodeaVenusta
|   |   ├── SpathodeaVenusta_001.jpeg
|   |   ├── *.jpeg
|   ├── TibouchinaMutabilis
|   |   ├── TibouchinaMutabilis_001.jpeg
|   |   ├── *.jpeg
├── test.py

First you need to get all the image paths.

import glob
import os

path = 'testfiles/'
# one level of class folders, each containing .jpeg files
files = glob.glob(path + "*/*.jpeg")
print(files)

['testfiles/HandroanthusChrysotrichus/HandroanthusChrysotrichus_002.jpeg', 'testfiles/HandroanthusChrysotrichus/HandroanthusChrysotrichus_001.jpeg', ...]
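The question mentions that the files could also be .png or .jpg, and glob results are not guaranteed to come back sorted. A minimal sketch, assuming the same one-folder-per-class layout, that collects all three extensions with pathlib and sorts the paths so the order is reproducible. The helper name `collect_image_paths` and its `exts` parameter are hypothetical, not part of the answer.

```python
from pathlib import Path

def collect_image_paths(root, exts=(".jpeg", ".jpg", ".png")):
    """Return sorted paths of image files stored in root/<class_folder>/."""
    root = Path(root)
    paths = [
        str(p)
        for p in root.glob("*/*")  # one level of class folders, then their entries
        # keep only regular files whose extension (case-insensitive) is allowed
        if p.is_file() and p.suffix.lower() in exts
    ]
    # sorting makes the file order independent of the file system
    return sorted(paths)
```

Calling `collect_image_paths('testfiles')` would return the same kind of list as `files` above, but also picking up `.jpg` and `.png` files and ignoring `test.py` at the top level.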

Then you need to encode every class as a number.

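# map each class (folder) name to an integer id
label_map = {'BougainvilleaGlabra': 0,
             'HandroanthusChrysotrichus': 1,
             'SpathodeaVenusta': 2,
             'TibouchinaMutabilis': 3}
# the class name is the part of the file name before the first '_'
label = [label_map[os.path.basename(file).split('_')[0]] for file in files]
print(label)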

[1, 1, 1, 0, 0, 0, 2, 2, 2, 3, 3, 3]
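Taking the class from the file-name prefix only works for names like `BougainvilleaGlabra_001.jpeg`; the question says the images inside each folder are simply numbered, in which case `split('_')[0]` would not yield the class. A minimal sketch, assuming the class is always the name of the folder that directly contains the image, that derives both `label_map` and the labels from the folder names instead of hard-coding them. `labels_from_folders` is a hypothetical helper.

```python
import os

def labels_from_folders(files):
    """Return (label_map, labels), using each file's parent folder as its class."""
    # class name = name of the directory that directly contains the file
    class_names = [os.path.basename(os.path.dirname(f)) for f in files]
    # sorting makes the class -> integer assignment reproducible across runs
    label_map = {name: i for i, name in enumerate(sorted(set(class_names)))}
    labels = [label_map[name] for name in class_names]
    return label_map, labels
```

For the four folders above, sorting the names happens to reproduce the same 0 to 3 assignment as the hand-written `label_map`.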

Then you can use tf.data.Dataset. You need a function that reads each image and resizes it, because all images in a batch must have the same shape.

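import tensorflow as tf  # TensorFlow 1.x API

def read_image(filename, label):
    # read the raw file bytes and decode them into a uint8 image tensor;
    # channels is not set, so the channel count comes from each file
    # (shown as None below); channels=3 would force RGB
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string)
    # resize to 28x28 (the result is float32) so images can be stacked into batches
    image_resized = tf.image.resize_images(image_decoded, [28, 28])
    return image_resized, label

dataset = tf.data.Dataset.from_tensor_slices((files, label))
# shuffle() draws from a 1000-element buffer; batch() sets the batch size.
# The batch dimension is None because the last batch may be smaller.
dataset = dataset.map(read_image).shuffle(1000).batch(2)
print(dataset.output_shapes)
print(dataset.output_types)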

(TensorShape([Dimension(None), Dimension(28), Dimension(28), Dimension(None)]), TensorShape([Dimension(None)]))
(tf.float32, tf.int32)
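`shuffle(1000)` does not shuffle the whole dataset in one go: according to the tf.data documentation, it fills a buffer with `buffer_size` elements, randomly samples from that buffer, and replaces each sampled element with the next one from the input. Since the buffer (1000) is larger than the 12 files here, the result is a full shuffle. The pure-Python sketch below only illustrates those semantics together with `batch()`; it is not TensorFlow's implementation, and `buffered_shuffle` and `batch` are hypothetical names.

```python
import random

def buffered_shuffle(items, buffer_size, rng=random):
    """Yield items in the order a fixed-size shuffle buffer would produce them."""
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # buffer full: emit a random element; the next input takes its place
            i = rng.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            yield buffer.pop()
    # input exhausted: drain the remaining elements in random order
    rng.shuffle(buffer)
    yield from buffer

def batch(items, batch_size, drop_remainder=False):
    """Group items into lists of batch_size; the last list may be shorter."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current and not drop_remainder:
        yield current
```

A buffer of size 1 leaves the order unchanged, which is why `buffer_size` should be at least as large as the dataset when a full shuffle is wanted. The possibly shorter final batch is also why the batch dimension shows up as `None` in `output_shapes`; TensorFlow's `Dataset.batch` accepts `drop_remainder=True` to discard it.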

Finally, define an iterator to fetch batches of data.

iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    # initialize once; running the initializer again restarts (and reshuffles) the dataset
    sess.run(iterator.initializer)
    for _ in range(2):
        batch_image, batch_label = sess.run(next_element)
        print(batch_image.shape)
        print(batch_label.shape)

(2, 28, 28, 4)
(2,)
(2, 28, 28, 4)
(2,)
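The loop above fetches only two batches. With the 12 images listed earlier and a batch size of 2, one full pass yields 6 batches; in TF 1.x, `sess.run(next_element)` raises `tf.errors.OutOfRangeError` once a pass is exhausted, and running `iterator.initializer` again starts the next epoch. The stdlib-only sketch below simulates that control flow, with `StopIteration` standing in for `OutOfRangeError`; `steps_per_epoch` and `train_loop` are hypothetical names, and no TensorFlow calls are made.

```python
import math
import random

def steps_per_epoch(num_examples, batch_size, drop_remainder=False):
    """Number of batches produced by one full pass over the data."""
    if drop_remainder:
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)

def train_loop(examples, batch_size, num_epochs, seed=0):
    """Simulate re-initializing a shuffled, batched iterator once per epoch.

    Returns a list of (epoch, batch) pairs in the order they were consumed.
    """
    rng = random.Random(seed)
    consumed = []
    for epoch in range(num_epochs):
        # like sess.run(iterator.initializer): start a fresh, reshuffled pass
        order = list(examples)
        rng.shuffle(order)
        batches = iter([order[i:i + batch_size]
                        for i in range(0, len(order), batch_size)])
        while True:
            try:
                current = next(batches)  # like sess.run(next_element)
            except StopIteration:  # TF 1.x raises tf.errors.OutOfRangeError here
                break
            consumed.append((epoch, current))
    return consumed
```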


Source: https://stackoverflow.com/questions/56130320/how-to-load-image-files-dataset-to-tensorflow-jupyter-notebook
