Is there a way to partition a tf.Dataset with TensorFlow’s Dataset API? (Not a partition of a simple np.array)

Posted by 好久不见 on 2019-12-11 17:12:06

Question


I checked the docs but could not find a method for it. I want to do cross-validation, so I need it.

Note that I'm not asking how to split a tensor; I know that TensorFlow provides an API for that, and it has been answered in another question. I'm asking how to partition a tf.Dataset (which is an abstraction).


Answer 1:


You could either:

1) Use the shard transformation to partition the dataset into multiple "shards". Note that for best performance, sharding should be applied to the data sources (e.g. filenames).

2) As of TensorFlow 1.12, you can also use the window transformation to build a dataset of datasets.
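Both options can be sketched as follows. This is only an illustration, assuming TensorFlow 2.x with eager execution (in 1.x graph mode you would need an iterator and a session instead of as_numpy_iterator). The helper name kfold_split and the toy Dataset.range(10) input are hypothetical and not part of the original answer. It also assumes the dataset yields elements in the same order on every pass; a shuffle placed before shard that reshuffles on each iteration would make the folds overlap.

```python
import tensorflow as tf


def kfold_split(dataset, num_folds, fold_index):
    """Hypothetical helper: build (train, validation) for one fold via shard.

    Requires num_folds >= 2; shard i keeps every num_folds-th element,
    starting at position i.
    """
    validation = dataset.shard(num_shards=num_folds, index=fold_index)
    train = None
    for i in range(num_folds):
        if i == fold_index:
            continue
        part = dataset.shard(num_shards=num_folds, index=i)
        # Chain the remaining shards together to form the training set.
        train = part if train is None else train.concatenate(part)
    return train, validation


# Toy dataset standing in for real data.
dataset = tf.data.Dataset.range(10)

# Option 1: shard-based fold (fold 0 of 5).
train, validation = kfold_split(dataset, num_folds=5, fold_index=0)
validation_values = [int(x) for x in validation.as_numpy_iterator()]
train_values = [int(x) for x in train.as_numpy_iterator()]

# Option 2: window builds a dataset whose elements are themselves datasets.
windows = [
    [int(x) for x in window.as_numpy_iterator()]
    for window in dataset.window(5)
]
```

Note that shard interleaves elements (fold 0 holds positions 0, 5, ...), whereas window produces contiguous blocks, so the two approaches give differently shaped partitions.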




Answer 2:


I am afraid you cannot. The dataset API is a way to efficiently stream inputs to your net at run time. It is not a set of tools to manipulate datasets as a whole; in that regard the name might be a bit of a misnomer.

Also, if you could, this would probably be a bad idea. You would rather have this train/test split done once and for all, for two reasons:

  • it lets you review those sets offline
  • if the split is redone each time you run an experiment, there is a risk that samples start swapping sets unless you are extremely careful (e.g. when you add more data to your existing dataset)
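One common way to keep samples from swapping sets is to derive each sample's set from a hash of a stable identifier, so that the assignment never depends on dataset size or ordering. The following is a minimal sketch of that idea, not something the original answer prescribes; it assumes every sample has a unique string ID, and the function name assign_split and the 20% validation fraction are illustrative choices.

```python
import hashlib


def assign_split(sample_id, validation_fraction=0.2):
    """Hypothetical helper: map a sample ID to 'train' or 'validation'.

    The result depends only on the ID, so it stays the same across runs
    and when more samples are added later.
    """
    digest = hashlib.sha256(sample_id.encode("utf-8")).hexdigest()
    # Reduce the hash to a number in [0, 1).
    bucket = (int(digest, 16) % 10_000) / 10_000
    return "validation" if bucket < validation_fraction else "train"


# Illustrative IDs; real IDs might be filenames or database keys.
original_ids = [f"sample_{i}" for i in range(100)]
before = {sid: assign_split(sid) for sid in original_ids}

# Adding more data later does not move existing samples between sets.
extended_ids = original_ids + [f"sample_{i}" for i in range(100, 200)]
after = {sid: assign_split(sid) for sid in extended_ids}
unchanged = all(before[sid] == after[sid] for sid in original_ids)
```

The resulting assignment can be written to disk once, which also makes it possible to review the sets offline.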

See also a related question about how to split a set into training & testing in tensorflow.



Source: https://stackoverflow.com/questions/50204609/is-there-a-way-to-partition-a-tf-dataset-with-tensorflow-s-dataset-api-not-a-p
