TensorFlow tfrecords: tostring() changes dimension of image


I may have some answers to your problem.

First, it's perfectly normal that your image is 8x longer after the .tostring() method. That method converts your array to bytes (it's badly named: in Python 3 bytes and strings are distinct types, though they were the same in Python 2). Your image is presumably stored as int64, so each element is encoded with 8 bytes (64 bits). In your example, the 162867 pixels of your image are therefore encoded in 162867 * 8 = 1302936 bytes.
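To make that concrete, here is a minimal NumPy sketch; the 233x233x3 shape is an assumption I picked only because its element count matches your numbers:

    import numpy as np

    # Hypothetical image with 162867 elements (the 233x233x3 shape is an
    # assumption chosen so the sizes match the question).
    image = np.zeros((233, 233, 3), dtype=np.int64)

    raw = image.tostring()  # same as image.tobytes()
    print(image.size)       # 162867 elements
    print(len(raw))         # 1302936 bytes = 162867 * 8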

Concerning your error during parsing, I think it comes from the fact that you write your data as int64 (integers encoded with 64 bits, i.e. 8 bytes each) but read them back as uint8 (unsigned integers encoded with 8 bits, i.e. 1 byte each). The same integer has a different byte sequence depending on whether it's stored as int64 or uint8. Writing your image as bytes is good practice when using tfrecord files, but you need to read it back with the same type it was written with.
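Here is a small sketch of that reinterpretation, again assuming the shape above:

    import numpy as np

    image = np.zeros((233, 233, 3), dtype=np.int64)  # shape assumed as above
    raw = image.tostring()

    # Matching dtype: the original 162867 values come back.
    as_int64 = np.frombuffer(raw, dtype=np.int64)
    # Mismatched dtype: the same bytes are reinterpreted as 1302936 values.
    as_uint8 = np.frombuffer(raw, dtype=np.uint8)

    print(as_int64.size)  # 162867
    print(as_uint8.size)  # 1302936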

For your code, try image = tf.decode_raw(features['image_raw'], tf.int64) instead.

The bug seems to be here.

    # Convert from a scalar string tensor to a uint8 tensor.
    # Decoded this way, the tensor has 1302936 values.
    image = tf.decode_raw(features['image_raw'], tf.uint8)
    # self.input_rows * self.input_cols * self.num_filters equals 162867, right?
    image.set_shape([self.input_rows * self.input_cols * self.num_filters])
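Here is a sketch of the corrected parsing, assuming the image was serialized from an int64 array; the feature key and shape attributes come from your snippet:

    # Decode with the dtype the data was written in (int64, 8 bytes per value),
    # so the tensor has 162867 values again.
    image = tf.decode_raw(features['image_raw'], tf.int64)
    image.set_shape([self.input_rows * self.input_cols * self.num_filters])

If you need uint8 pixel values afterwards, cast after decoding with tf.cast(image, tf.uint8) rather than changing the dtype passed to decode_raw.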

That's all guesswork on my part, since the code you provided is quite limited.
