Tensorflow: How to encode and read bmp images?

隐身守侯 提交于 2019-12-20 04:21:43

问题


I am trying to read .bmp images, do some augmentation on these, save them to a .tfrecords file and then open the .tfrecords files and use the images for image classification. I know that there is a tf.image.encode_jpeg() and a tf.image.encode_png() function, but there is no tf.image.encode_bmp() function. I know that .bmp images are uncompressed, so I've tried to simply base64-encode, np.tostring() and np.tobytes() the images, but I get the following error when trying to decode these formats:

tensorflow.python.framework.errors_impl.InvalidArgumentError: channels attribute 3 does not match bits per pixel from file <some long number>

My take is that tensorflow, in its encoding to jpeg or png, does something extra with the byte encoding of the images; saving information about array dimensionality, etc. However, I am quite clueless about this, so any help would be great!

Some code to show what it is I am trying to achieve:

with tf.gfile.FastGFile(filename, 'rb') as f:
    image_data = f.read()
    bmp_data = tf.placeholder(dtype=tf.string)
    decode_bmp = tf.image.decode_bmp(self._decode_bmp_data, channels=3)
    augmented_bmp = <do some augmentation on decode_bmp>
    sess = tf.Session()
    np_img = sess.run(augmented_bmp, feed_dict={bmp_data: image_data})
    byte_img = np_img.tostring()

    # Write byte_img to file using tf.train.Example
    writer = tf.python_io.TFRecordWriter(<output_tfrecords_filename>)
    example = tf.train.Example(features=tf.train.Features(feature={
        'encoded_img': tf.train.Feature(bytes_list=tf.train.BytesList(value=[byte_img])}))
    writer.write(example.SerializeToString())

    # Read img from file
    dataset = tf.data.TFRecordDataset(<img_file>)
    dataset = dataset.map(parse_img_fn)

The parse_img_fn may be condensed to the following:

def parse_img_fn(serialized_example):
    features = tf.parse_single_example(serialized_example, feature_map)
    image = features['encoded_img']
    image = tf.image.decode_bmp(image, channels=3) # This is where the decoding fails
    features['encoded_img']

    return features

回答1:


in your comment, surely you mean encode instead of encrypt

The BMP file format is quite simplistic, consisting of a bunch of headers and pretty much raw pixel data. This is why BMP images are so big. I suppose this is also why TensorFlow developers did not bother to write a function to encode arrays (representing images) into this format. Few people still use it. It is recommended to use PNG instead, which performs lossless compression of the image. Or, if you can deal with lossy compression, use JPG.

TensorFlow doesn't do anything special for encoding images. It just returns the bytes that represent the image in that format, similar to what matplotlib does when you do save_fig (except MPL also writes the bytes to a file).

Suppose you produce a numpy array where the top rows are 0 and the bottom rows are 255. This is an array of numbers which, if you think it as a picture, would represent 2 horizontal bands, the top one black and the bottom one white.

If you want to see this picture in another program (GIMP) you need to encode this information in a standard format, such as PNG. Encoding means adding some headers and metadata and, optionally, compressing the data.


Now that it is a bit more clear what encoding is, I recommend you work with PNG images.

with tf.gfile.FastGFile('image.png', 'rb') as f:
    # get the bytes representing the image
    # this is a 1D array (string) which includes header and stuff
    raw_png = f.read()

    # decode the raw representation into an array
    # so we have 2D array representing the image (3D if colour) 
    image = tf.image.decode_png(raw_png)

    # augment the image using e.g.
    augmented_img = tf.image.random_brightness(image)

    # convert the array back into a compressed representation
    # by encoding it into png
    # we now end up with a string again
    augmented_png = tf.image.encode_png(augmented_img, compression=9) 

    # Write augmented_png to file using tf.train.Example
    writer = tf.python_io.TFRecordWriter(<output_tfrecords_filename>)
    example = tf.train.Example(features=tf.train.Features(feature={
        'encoded_img': tf.train.Feature(bytes_list=tf.train.BytesList(value=[augmented_png])}))
    writer.write(example.SerializeToString())

    # Read img from file
    dataset = tf.data.TFRecordDataset(<img_file>)
    dataset = dataset.map(parse_img_fn)

There are a few important pieces of advice:

  • don't use numpy.tostring. This returns a HUUGE representation because each pixel is represented as a float, and they are all concatenated. No compression, nothing. Try and check the file size :)

  • no need to pass back into python by using tf.Session. You can perform all the ops on TF side. This way you have an input graph which you can reuse as part of an input pipeline.



来源:https://stackoverflow.com/questions/50871281/tensorflow-how-to-encode-and-read-bmp-images

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!