How to use tf.data.Dataset with kedro?


Question


I am using tf.data.Dataset to prepare a streaming dataset for training a tf.keras model. With kedro, is there a way to create a node that returns the created tf.data.Dataset, so it can be used in the next (training) node?
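For context, a minimal sketch of the pipeline I have in mind (the node functions, catalog names, column names, and model are simplified placeholders, not my actual code):

```python
import pandas as pd
import tensorflow as tf
from kedro.pipeline import Pipeline, node

def make_streaming_dataset(df: pd.DataFrame) -> tf.data.Dataset:
    """Node: turn a DataFrame into a shuffled, batched tf.data.Dataset."""
    features = df.drop(columns=["label"]).values
    labels = df["label"].values
    ds = tf.data.Dataset.from_tensor_slices((features, labels))
    # On TF < 2.4, use tf.data.experimental.AUTOTUNE instead.
    return ds.shuffle(1024).batch(32).prefetch(tf.data.AUTOTUNE)

def train_model(ds: tf.data.Dataset) -> tf.keras.Model:
    """Node: fit a small Keras model on the streaming dataset."""
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(ds, epochs=5)
    return model

# "raw_table", "train_dataset" and "model" are placeholder catalog names.
pipeline = Pipeline([
    node(make_streaming_dataset, inputs="raw_table", outputs="train_dataset"),
    node(train_model, inputs="train_dataset", outputs="model"),
])
```

The intermediate "train_dataset" is what would end up in a MemoryDataset, which leads to the pickling problem below.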

The MemoryDataset will probably not work, because a tf.data.Dataset cannot be pickled (deepcopy is not possible); see also this SO question. According to issue #91, the deep copy in MemoryDataset is done to avoid the data being modified by some other node. Can someone please elaborate a bit more on why/how this concurrent modification could happen?

From the docs, there seems to be a copy_mode = "assign" option. Would it be possible to use this option when the data is not picklable?
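If copy_mode = "assign" is the way to go, I assume the catalog entry would look roughly like this (the dataset name is a placeholder, and the exact class/type name may differ between kedro versions):

```python
from kedro.io import DataCatalog, MemoryDataSet

# "train_dataset" matches the placeholder name from the pipeline sketch above.
# copy_mode="assign" should hand the object to the next node by reference,
# instead of deep-copying (and thus effectively pickling) it.
catalog = DataCatalog({
    "train_dataset": MemoryDataSet(copy_mode="assign"),
})

# The equivalent catalog.yml entry (type path may vary by kedro version):
#
# train_dataset:
#   type: MemoryDataSet
#   copy_mode: assign
```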

Another solution (also mentioned in issue 91) is to use just a function that generates the streaming tf.data.Dataset inside the training node, without the preceding dataset-generation node (see the sketch below). However, I am not sure what the drawbacks of this approach would be (if any). It would be great if someone could give some examples.
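As far as I understand it, that alternative would look roughly like this (the CSV schema, parsing logic, and model are placeholders):

```python
import tensorflow as tf

def train_model(csv_pattern: str) -> tf.keras.Model:
    """Single node: build the streaming dataset and train, all in one place."""
    def parse_line(line: tf.Tensor):
        # Placeholder schema: each CSV row holds two floats, feature and label.
        fields = tf.io.decode_csv(line, record_defaults=[0.0, 0.0])
        return fields[0][..., tf.newaxis], fields[1]

    ds = (
        tf.data.Dataset.list_files(csv_pattern)
        .interleave(tf.data.TextLineDataset, cycle_length=4)
        .map(parse_line)
        .batch(32)
        .prefetch(tf.data.AUTOTUNE)  # tf.data.experimental.AUTOTUNE on TF < 2.4
    )
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(ds, epochs=3)
    return model
```

With this approach, only the file pattern (a plain string) crosses the node boundary, so the pickling issue disappears, but the dataset construction is no longer visible as its own step in the pipeline.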

Also, I would like to avoid storing the complete output of the streaming dataset, for example with TFRecords or tf.data.experimental.save, as these options would use a lot of disk storage.

Is there a way to pass just the created tf.data.Dataset object directly to the training node?

Source: https://stackoverflow.com/questions/63730066/how-to-use-tf-data-dataset-with-kedro
