Question
I have a simple Dense U-Net implemented in TensorFlow 2.0 (with tf.keras) to do semantic segmentation. At the end of each dense block, I want to randomly rotate and/or flip the tensors (just for experimentation). The initial image size is 256x256. At the points where I want to perform the flipping/rotation, the tensors always have C=5 channels (that is, shape=[?,256,256,5] (NHWC) at the start, with H and W shrinking as pooling is performed, but C is always 5 and the batch size is N=8). Thus, the tensors are not really big.
For clarification: note that I am not aiming to do augmentation, but to analyze the network's behavior when the intermediate feature maps undergo a slight orientation change. Similar ideas have been studied, either flipping/rotating the kernels or the feature maps (Gao et al. 2017, Efficient and Invariant CNNs for Dense Prediction, arXiv:1711.09064).
My first attempt was to implement it as a function:
import tensorflow as tf

def random_rotation_flip(x):
    # Random left-right flip, then a rotation by k*90 degrees, k in {1, 2, 3}
    x = tf.image.random_flip_left_right(x)
    x = tf.image.rot90(x, k=tf.random.uniform(shape=[], minval=1,
                                              maxval=4, dtype=tf.int32))
    return x
This raises an error:
OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
I am not really sure what is happening, but if I include @tf.function at the beginning of the function definition:
@tf.function
def random_rotation_flip(x):
    x = tf.image.random_flip_left_right(x)
    x = tf.image.rot90(x, k=tf.random.uniform(shape=[], minval=1,
                                              maxval=4, dtype=tf.int32))
    return x
I get:
TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
  @tf.function
  def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
      added = my_constant * 2
The graph tensor has name: conv1_block1_2_conv/Identity:0

During handling of the above exception, another exception occurred:

_SymbolicException                        Traceback (most recent call last)
9 frames
/tensorflow-2.0.0/python3.6/tensorflow_core/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     73       raise core._SymbolicException(
     74           "Inputs to eager execution function cannot be Keras symbolic "
---> 75           "tensors, but found {}".format(keras_symbolic_tensors))
     76     raise e
     77   # pylint: enable=protected-access

_SymbolicException: Inputs to eager execution function cannot be Keras symbolic tensors, but found [<tf.Tensor 'conv1_block1_2_conv/Identity:0' shape=(None, 256, 256, 5) dtype=float32>]
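Note that the ops themselves run fine when given a concrete (eager) tensor of the same shape, so the failure seems specific to Keras symbolic tensors (a quick sanity check):

# Eager sanity check: the same function works on a concrete tensor
x = tf.random.normal([8, 256, 256, 5])  # same NHWC shape as in the model
y = random_rotation_flip(x)
print(y.shape)  # (8, 256, 256, 5)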
My first questions are:
- Can someone explain what is happening and how to solve this?
- Is it OK to do these two operations (flipping and rotation) within a function?
- Should I do it in a different way? (A possible alternative is sketched right after this list.)
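One workaround along those lines is to wrap the function in a tf.keras.layers.Lambda layer, so the ops are built into the Keras graph rather than executed eagerly on symbolic tensors. A minimal sketch (the Conv2D here is only a stand-in for one of my dense blocks):

from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(256, 256, 5))
x = layers.Conv2D(5, 3, padding='same', activation='relu')(inputs)  # stand-in for a dense block
x = layers.Lambda(random_rotation_flip)(x)  # ops are now part of the Keras graph
model = tf.keras.Model(inputs, x)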
Now the second part of the question!
In an attempt to overcome this, I implemented it in a Layer:
class MyRandomRotationFlipLayer(layers.Layer):
    def __init__(self):
        super(MyRandomRotationFlipLayer, self).__init__()

    def call(self, input):
        x = tf.image.random_flip_left_right(input)
        return tf.image.rot90(x, k=tf.random.uniform(shape=[], minval=1,
                                                     maxval=4, dtype=tf.int32))
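A related variant (just a sketch, following the standard Keras custom-layer pattern with the training argument; the class name is made up) applies the flip/rotation only during training and acts as the identity at inference:

class MyTrainingOnlyRotationFlipLayer(layers.Layer):
    def call(self, inputs, training=None):
        if training:
            x = tf.image.random_flip_left_right(inputs)
            return tf.image.rot90(x, k=tf.random.uniform(shape=[], minval=1,
                                                         maxval=4, dtype=tf.int32))
        return inputs  # identity at inference time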
Now it works! However, I run out of memory while training.
Note: I am using Google Colab with GPU. The GPU has 12 GB of memory. However, the memory problem is not on the GPU but in the system RAM, which is 25 GB and is shown in the upper-right corner of the Colab notebook.
I noticed that RAM usage increases slowly but continuously until it exceeds the 25 GB and the session crashes.
I have considerably reduced the depth and the number of feature maps of my convolutions (C=2) to make it work, but even then memory is eventually exhausted (it may run fine for 100-200 iterations...).
This suggests that I am doing something wrong. Is it because new tensors are being created and the old ones are never freed? How should this be handled to avoid memory exhaustion?
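To quantify the growth, a small diagnostic callback can log the resident set size (RSS) of the process during training (a sketch; psutil is available in Colab by default):

import os
import psutil
import tensorflow as tf

class MemoryLogger(tf.keras.callbacks.Callback):
    """Print the process RSS every 100 training batches."""
    def on_train_batch_end(self, batch, logs=None):
        if batch % 100 == 0:
            rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1e9
            print(' - batch {}: RSS {:.2f} GB'.format(batch, rss_gb))

# model.fit(..., callbacks=[MemoryLogger()])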
Any input on how this should properly be done is highly welcome.
Thanks in advance!
Answer 1:
I've seen cases where people use the preprocessing functionality that Keras provides to do data augmentation (flipping and the like). Here is the reference: https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator
Hopefully this will save you some effort or at least provide a working point of reference.
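For example, a minimal sketch of input-level flips and rotations with that API (the images array and the parameter values are illustrative):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random rotations within +/-90 degrees plus horizontal/vertical flips on the inputs
datagen = ImageDataGenerator(rotation_range=90,
                             horizontal_flip=True,
                             vertical_flip=True)

for x_batch in datagen.flow(images, batch_size=8, seed=42):
    break  # x_batch is one augmented batch, shape (8, 256, 256, C)

Note this augments the network inputs rather than the intermediate feature maps; for segmentation, the masks can be augmented in lockstep with a second generator that shares the same seed.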
Source: https://stackoverflow.com/questions/58595506/flipping-and-rotating-a-tensor-within-a-u-net-in-tensorflow-2-0-exhaust-the-me