tf.data.Dataset

How to use tf.data.Dataset with kedro?

烈酒焚心 submitted on 2021-02-11 14:58:23
Question: I am using tf.data.Dataset to prepare a streaming dataset which is used to train a tf.keras model. With kedro, is there a way to create a node and return the created tf.data.Dataset so that it can be used in the next training node? The MemoryDataset will probably not work, because a tf.data.Dataset cannot be pickled (a deepcopy isn't possible); see also this SO question. According to issue #91, the deep copy in MemoryDataset is done to avoid the data being modified by some other node. Can someone please elaborate…
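One commonly suggested workaround (a minimal sketch, assuming a Kedro version from around the time of the question, where MemoryDataSet accepts a copy_mode argument; the dataset name "streaming_ds" and both node functions are illustrative, not from the question) is to register the intermediate entry with copy_mode="assign", so Kedro passes the tf.data.Dataset between nodes by reference instead of deep-copying it:

    import tensorflow as tf
    from kedro.io import DataCatalog, MemoryDataSet

    def make_dataset() -> tf.data.Dataset:
        # Build the streaming pipeline; nothing is materialised here.
        return tf.data.Dataset.range(100).batch(10)

    def train(ds: tf.data.Dataset) -> None:
        for batch in ds:
            pass  # feed batches to a tf.keras model here

    # copy_mode="assign" skips the deepcopy that tf.data.Dataset cannot survive.
    catalog = DataCatalog({"streaming_ds": MemoryDataSet(copy_mode="assign")})

The same effect can be had declaratively by giving the catalog.yml entry type MemoryDataSet with copy_mode: assign.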

Extracting a NumPy value from a TensorFlow object during transformation

旧时模样 submitted on 2021-02-10 05:13:14
Question: I am trying to get word embeddings using TensorFlow, and I have created adjacent word lists from my corpus. The number of unique words in my vocabulary is 8,000, and the number of adjacent word lists is around 1.6 million. [Word lists sample photo] Since the data is very large, I am trying to write the word lists in batches to a TFRecords file:

    def save_tfrecords_wordlist(toprocess_word_lists, path):
        writer = tf.io.TFRecordWriter(path)
        for word_list in toprocess_word_lists:
            features = tf.train.Features(feature…
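The excerpt cuts off inside the Features constructor. For context, a complete version of such a writer might look like the sketch below (the feature key "word_list" and the helper _int64_feature are illustrative assumptions; the original question does not show them):

    import tensorflow as tf

    def _int64_feature(values):
        # Wrap a list of ints as a tf.train.Feature (the standard TFRecord idiom).
        return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

    def save_tfrecords_wordlist(toprocess_word_lists, path):
        with tf.io.TFRecordWriter(path) as writer:
            for word_list in toprocess_word_lists:
                features = tf.train.Features(
                    feature={"word_list": _int64_feature(word_list)}
                )
                example = tf.train.Example(features=features)
                writer.write(example.SerializeToString())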

How can I modify sequential data using the map, filter, or reduce methods on tf.data.Dataset objects?

牧云@^-^@ submitted on 2020-11-25 04:05:20
Question: I have a Python data generator:

    import numpy as np
    import tensorflow as tf

    vocab_size = 5

    def create_generator():
        'generates sequences of varying lengths (5 to 7) with random numbers from 0 to vocab_size-1'
        count = 0
        while count < 5:
            sequence_len = np.random.randint(5, 8)  # length varies from 5 to 7
            seq = np.random.randint(0, vocab_size, (sequence_len))
            yield seq
            count += 1

    gen = tf.data.Dataset.from_generator(create_generator, args=[],
                                         output_types=tf.int32,
                                         output_shapes=(None,))

    for g in…
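The excerpt cuts off before the transformations themselves. A minimal sketch of how map, filter, and reduce apply to such a variable-length dataset follows (the specific transformations, shifting token ids, dropping short sequences, and summing lengths, are illustrative assumptions, not from the question):

    import numpy as np
    import tensorflow as tf

    vocab_size = 5

    def create_generator():
        # Yields 5 sequences of varying lengths (5 to 7) with values in [0, vocab_size).
        for _ in range(5):
            sequence_len = np.random.randint(5, 8)
            yield np.random.randint(0, vocab_size, (sequence_len,))

    ds = tf.data.Dataset.from_generator(
        create_generator, output_types=tf.int32, output_shapes=(None,)
    )

    # map: element-wise transformation, e.g. shift every token id by 1.
    ds = ds.map(lambda seq: seq + 1)

    # filter: keep only sequences longer than 5 elements.
    ds = ds.filter(lambda seq: tf.shape(seq)[0] > 5)

    # reduce: fold the remaining elements, e.g. the total number of tokens.
    total_len = ds.reduce(0, lambda acc, seq: acc + tf.shape(seq)[0])
    print(total_len.numpy())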