When I use TensorFlow to decode a `csv` file, how can I apply `tf.map_fn` to a SparseTensor?

Anonymous (unverified), submitted 2019-12-03 08:30:34

Question:

When I run the following code:

import tensorflow as tf

# def input_pipeline(filenames, batch_size):
#     # Define a `tf.contrib.data.Dataset` for iterating over one epoch of the data.
#     dataset = (tf.contrib.data.TextLineDataset(filenames)
#                .map(lambda line: tf.decode_csv(
#                     line, record_defaults=[['1'], ['1'], ['1']], field_delim='-'))
#                .shuffle(buffer_size=10)  # Equivalent to min_after_dequeue=10.
#                .batch(batch_size))
#     # Return an *initializable* iterator over the dataset, which will allow us to
#     # re-initialize it at the beginning of each epoch.
#     return dataset.make_initializable_iterator()

def decode_func(line):
    record_defaults = [['1'], ['1'], ['1']]
    line = tf.decode_csv(line, record_defaults=record_defaults, field_delim='-')
    str_to_int = lambda r: tf.string_to_number(r, tf.int32)
    query = tf.string_split(line[:1], ",").values
    title = tf.string_split(line[1:2], ",").values
    query = tf.map_fn(str_to_int, query, dtype=tf.int32)
    title = tf.map_fn(str_to_int, title, dtype=tf.int32)
    label = line[2]
    return query, title, label

def input_pipeline(filenames, batch_size):
    # Define a `tf.contrib.data.Dataset` for iterating over one epoch of the data.
    dataset = tf.contrib.data.TextLineDataset(filenames)
    dataset = dataset.map(decode_func)
    dataset = dataset.shuffle(buffer_size=10)  # Equivalent to min_after_dequeue=10.
    dataset = dataset.batch(batch_size)
    # Return an *initializable* iterator over the dataset, which will allow us to
    # re-initialize it at the beginning of each epoch.
    return dataset.make_initializable_iterator()

filenames = ['2.txt']
batch_size = 3
num_epochs = 10
iterator = input_pipeline(filenames, batch_size)

# `a1`, `a2`, and `a3` represent the next element to be retrieved from the iterator.
a1, a2, a3 = iterator.get_next()

with tf.Session() as sess:
    for _ in range(num_epochs):
        print(_)
        # Resets the iterator at the beginning of an epoch.
        sess.run(iterator.initializer)
        try:
            while True:
                a, b, c = sess.run([a1, a2, a3])
                print(type(a[0]), b, c)
        except tf.errors.OutOfRangeError:
            # This will be raised when you reach the end of an epoch (i.e. the
            # iterator has no more elements).
            print('stop')
        # Perform any end-of-epoch computation here.
        print('Done training, epoch reached')

The script doesn't return any results; it hangs when it reaches `a, b, c = sess.run([a1, a2, a3])`. But when I commented out

query = tf.map_fn(str_to_int, query, dtype=tf.int32)
title = tf.map_fn(str_to_int, title, dtype=tf.int32)

it works and returns the results.

In 2.txt, the data format looks like this:

1,2,3-4,5-0
1-2,3,4-1
4,5,6,7,8-9-0
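To make the intended record layout concrete, here is a plain-Python sketch (no TensorFlow) of how each line of 2.txt is meant to be parsed: fields are separated by `-`, and the first two fields are comma-separated integer lists. The helper name `parse_line` is just for illustration and is not part of the pipeline above.

```python
def parse_line(line):
    """Parse one record like '1,2,3-4,5-0' into (query, title, label)."""
    query, title, label = line.split("-")
    return ([int(x) for x in query.split(",")],   # comma-separated query ids
            [int(x) for x in title.split(",")],   # comma-separated title ids
            int(label))                           # single integer label

print(parse_line("1,2,3-4,5-0"))  # -> ([1, 2, 3], [4, 5], 0)
```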

In addition, why are the returned results bytes-like objects rather than str?

Answer 1:

I had a look and it appears that if you replace:

query = tf.map_fn(str_to_int, query, dtype=tf.int32)
title = tf.map_fn(str_to_int, title, dtype=tf.int32)
label = line[2]

with

query = tf.string_to_number(query, out_type=tf.int32)
title = tf.string_to_number(title, out_type=tf.int32)
label = tf.string_to_number(line[2], out_type=tf.int32)

it works just fine.

It appears that nesting two TensorFlow mapping functions (the `tf.map_fn` inside the `Dataset.map`) just doesn't work. Luckily, it was overcomplicated anyway: `tf.string_to_number` already converts every element of a string tensor, so the explicit per-element `tf.map_fn` is unnecessary.
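As a rough analogy for why the per-element map is unneeded: elementwise ops on tensors behave like NumPy's vectorized conversions, which handle a whole array of strings in one call. This NumPy sketch (not TensorFlow, just an analogy) shows the idea:

```python
import numpy as np

# Stand-in for the 1-D string tensor produced by tf.string_split(...).values.
query_strings = np.array(["1", "2", "3"])

# One vectorized call converts every element -- no per-element map needed.
query_ints = query_strings.astype(np.int32)
print(query_ints.tolist())  # -> [1, 2, 3]
```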

Regarding your second question, I got this as output:

[(array([4, 5, 6, 7, 8], dtype=int32), array([9], dtype=int32), 0)]
<type 'numpy.ndarray'>
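As for why string results come back as bytes-like objects: in Python 3, TensorFlow string tensors are fetched as `bytes`, so you decode them yourself when you need a `str`. A minimal plain-Python sketch of that last step (the `b"0"` stands in for a fetched label field):

```python
raw = b"0"                    # a string tensor fetched via sess.run comes back as bytes
text = raw.decode("utf-8")    # explicit decode yields a Python str
print(type(text).__name__, text)
```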

