I've run into an issue trying to use TensorFlow's feature_column mappings inside a function passed to the Dataset map method. It happens when trying to one-hot encode categorical string features of a Dataset as part of the input pipeline using Dataset.map. The error I'm getting is:

tensorflow.python.framework.errors_impl.FailedPreconditionError: Table already initialized.
The following code is a basic example that recreates the problem:
import numpy as np
import tensorflow as tf
from tensorflow.contrib.lookup import index_table_from_tensor

# generate tfrecords with two string categorical features and write to file
vlists = dict(season=['Spring', 'Summer', 'Fall', 'Winter'],
              day=['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'])

writer = tf.python_io.TFRecordWriter('test.tfr')
for s, d in zip(np.random.choice(vlists['season'], 50),
                np.random.choice(vlists['day'], 50)):
    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                'season': tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[s.encode()])),
                'day': tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[d.encode()]))
            }
        )
    )
    serialized = example.SerializeToString()
    writer.write(serialized)
writer.close()
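As a quick, optional sanity check that the file was written, counting the records with the TF 1.x record iterator should give 50:

# optional sanity check: confirm test.tfr contains the 50 records
n_records = sum(1 for _ in tf.python_io.tf_record_iterator('test.tfr'))
print(n_records)  # 50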
Now there's a tfrecord file in the cwd called test.tfr with 50 records, each consisting of two string features, 'season' and 'day'. The following creates a Dataset that parses the tfrecords and produces batches of size 4:
def parse_record(element):
    feats = {
        'season': tf.FixedLenFeature((), tf.string),
        'day': tf.FixedLenFeature((), tf.string)
    }
    return tf.parse_example(element, feats)

fname = tf.placeholder(tf.string, [])
ds = tf.data.TFRecordDataset(fname)
ds = ds.batch(4).map(parse_record)
At this point, if you create an iterator and call get_next on it several times, it works as expected, and you would see output like this on each run:
iterator = ds.make_initializable_iterator()
nxt = iterator.get_next()

sess = tf.Session()
sess.run(tf.tables_initializer())
sess.run(iterator.initializer, feed_dict={fname: 'test.tfr'})
sess.run(nxt)

# output of run(nxt) looks like:
# {'day': array([b'Sat', b'Thu', b'Fri', b'Thu'], dtype=object),
#  'season': array([b'Winter', b'Winter', b'Fall', b'Summer'], dtype=object)}
However, if I use feature_columns to one-hot encode those categoricals as a Dataset transformation using map, it runs once and produces correct output, but every subsequent call to run(nxt) gives the Table already initialized error, e.g.:
# using the same Dataset ds from above
season_enc = tf.feature_column.categorical_column_with_vocabulary_list(
    key='season', vocabulary_list=vlists['season'])
season_col = tf.feature_column.indicator_column(season_enc)
day_enc = tf.feature_column.categorical_column_with_vocabulary_list(
    key='day', vocabulary_list=vlists['day'])
day_col = tf.feature_column.indicator_column(day_enc)
cols = [season_col, day_col]

def _encode(element, feat_cols=cols):
    return tf.feature_column.input_layer(element, feat_cols)

ds1 = ds.map(_encode)
iterator = ds1.make_initializable_iterator()
nxt = iterator.get_next()

sess.run(tf.tables_initializer())
sess.run(iterator.initializer, feed_dict={fname: 'test.tfr'})
sess.run(nxt)  # first run produces correct one-hot encoded output
sess.run(nxt)  # second run fails

The second call to run(nxt) generates:

W tensorflow/core/framework/op_kernel.cc:1192] Failed precondition: Table already initialized.
2018-01-25 19:29:55.802358: W tensorflow/core/framework/op_kernel.cc:1192] Failed precondition: Table already initialized.
2018-01-25 19:29:55.802612: W tensorflow/core/framework/op_kernel.cc:1192] Failed precondition: Table already initialized.

tensorflow.python.framework.errors_impl.FailedPreconditionError: Table already initialized.
However, if I do the one-hot encoding manually, without feature_columns, as below, it only works if the tables are created before the map function; otherwise it gives the same error as above:
# using the same original Dataset ds
tables = dict(season=index_table_from_tensor(vlists['season']),
              day=index_table_from_tensor(vlists['day']))

def to_dummy(element):
    s = tables['season'].lookup(element['season'])
    d = tables['day'].lookup(element['day'])
    return (tf.one_hot(s, depth=len(vlists['season']), axis=-1),
            tf.one_hot(d, depth=len(vlists['day']), axis=-1))

ds2 = ds.map(to_dummy)
iterator = ds2.make_initializable_iterator()
nxt = iterator.get_next()

sess.run(tf.tables_initializer())
sess.run(iterator.initializer, feed_dict={fname: 'test.tfr'})
sess.run(nxt)
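To be explicit, the variant that fails is the one where the tables are built inside the mapped function, i.e. something like this (ds3 is just an illustrative name):

def to_dummy_inline(element):
    # tables constructed inside the mapped function rather than before map
    s_table = index_table_from_tensor(vlists['season'])
    d_table = index_table_from_tensor(vlists['day'])
    s = s_table.lookup(element['season'])
    d = d_table.lookup(element['day'])
    return (tf.one_hot(s, depth=len(vlists['season']), axis=-1),
            tf.one_hot(d, depth=len(vlists['day']), axis=-1))

ds3 = ds.map(to_dummy_inline)
# iterating ds3 the same way as above fails with
# FailedPreconditionError: Table already initialized.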
It seems as if it has something to do with the scope or namespace of the index lookup tables created by feature_columns, but I'm not sure how to figure out what's happening here. I've tried changing where and when the feature_column objects are defined, but it hasn't made a difference.
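For what it's worth, the only way I've found to poke at this so far is to dump the table-related ops from the graph and compare the two approaches (a rough diagnostic sketch; the substring match on the op type is just a heuristic). The tables built before map show up in the default graph, whereas anything created inside the function passed to map seems to live in a separate function graph and doesn't appear here:

# rough diagnostic: list table-related ops in the default graph
# (heuristic: match 'table' in the op type, e.g. HashTableV2, LookupTableFindV2)
for op in tf.get_default_graph().get_operations():
    if 'table' in op.type.lower():
        print(op.name, '->', op.type)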