Split .tfrecords file into many .tfrecords files

长情又很酷 2020-12-09 20:37

Is there a way to split a .tfrecords file into several .tfrecords files directly, without parsing and writing back each Dataset example?

4 answers
  •  忘掉有多难
    2020-12-09 21:03

    In TensorFlow 2.0.0, this works, and it copies the raw serialized records without parsing them:

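    import tensorflow as tf
    
    # Read the serialized records as raw byte strings; no parsing is needed.
    raw_dataset = tf.data.TFRecordDataset("input_file.tfrecord")
    
    shards = 10  # number of output files
    
    for i in range(shards):
        # shard(shards, i) keeps the records whose position modulo `shards` equals i.
        writer = tf.data.experimental.TFRecordWriter(f"output_file-part-{i}.tfrecord")
        writer.write(raw_dataset.shard(shards, i))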
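To see which records end up in which output file, here is a minimal plain-Python sketch of the round-robin rule that `Dataset.shard(num_shards, index)` applies: it keeps the elements whose position modulo `num_shards` equals `index`. The helper name `shard_indices` and the record count of 25 are hypothetical, chosen only for illustration, and no TensorFlow is needed to run it.

```python
def shard_indices(num_records, num_shards, index):
    """Positions of the records that land in shard `index` (round-robin rule)."""
    return [j for j in range(num_records) if j % num_shards == index]


num_records = 25  # hypothetical number of records in the input file
shards = 10       # same shard count as in the answer above

# Map each shard number to the record positions it would receive.
assignment = {i: shard_indices(num_records, shards, i) for i in range(shards)}

for i, positions in assignment.items():
    print(f"output_file-part-{i}.tfrecord <- records {positions}")
```

So the records are interleaved across the output files rather than cut into contiguous chunks, and every record lands in exactly one file. If contiguous chunks are needed, a different splitting rule would have to be used.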
    
