Dataset API, Iterators and tf.contrib.data.rejection_resample

半阙折子戏 2021-01-03 06:28

[Edit #1 after @mrry comment] I am using the (great & amazing) Dataset API along with tf.contrib.data.rejection_resample to set a specific distribution

2 answers
  •  春和景丽
    2021-01-03 07:01

    Below is a simple example demonstrating the usage of sample_from_datasets (thanks @Agade for the idea).

    import math
    import tensorflow as tf
    import numpy as np
    
    
    def print_dataset(name, dataset):
        elems = np.array([v.numpy() for v in dataset])
        print("Dataset {} contains {} elements :".format(name, len(elems)))
        print(elems)
    
    
    def combine_datasets_balanced(dataset_smaller, size_smaller, dataset_bigger, size_bigger, batch_size):
        # Repeat the smaller dataset so that the two datasets are about the same size.
        ds_smaller_repeated = dataset_smaller.repeat(count=int(math.ceil(size_bigger / size_smaller)))
        # Each element of the result is taken from the smaller dataset with probability 0.5
        # or from the bigger one with probability 0.5; each source is consumed in order,
        # so no element of a source is drawn twice.
        balanced_dataset = tf.data.experimental.sample_from_datasets([ds_smaller_repeated, dataset_bigger], weights=[0.5, 0.5])
        # Keep 2 * size_bigger elements, then group them into batches.
        balanced_dataset = balanced_dataset.take(2 * size_bigger).batch(batch_size)
        return balanced_dataset
    
    
    N, M = 3, 10
    even = tf.data.Dataset.range(0, 2 * N, 2).repeat(count=int(math.ceil(M / N)))
    odd = tf.data.Dataset.range(1, 2 * M, 2)
    even_odd = combine_datasets_balanced(even, N, odd, M, 2)
    
    print_dataset("even", even)
    print_dataset("odd", odd)
    print_dataset("even_odd", even_odd)
    
    Output:
    
    Dataset even contains 12 elements :  # 12 = 4 x N  (because of .repeat)
    [0 2 4 0 2 4 0 2 4 0 2 4]
    Dataset odd contains 10 elements :
    [ 1  3  5  7  9 11 13 15 17 19]
    Dataset even_odd contains 10 elements :  # 10 = 2 x M / 2  (2xM because of .take(2 * M) and /2 because of .batch(2))
    [[ 0  2]
     [ 1  4]
     [ 0  2]
     [ 3  4]
     [ 0  2]
     [ 4  0]
     [ 5  2]
     [ 7  4]
     [ 0  9]
     [ 2 11]] 
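
To make the mechanics easier to follow, here is a minimal pure-Python sketch of the same repeat / sample / take / batch pipeline. It does not use TensorFlow and is not how the library is implemented: `sample_from_sources` and `batch` are hypothetical helper names introduced only for this illustration, and dropping an exhausted source so the others keep going is an assumption of the sketch.

```python
import math
import random


def sample_from_sources(sources, weights, seed=None):
    """Yield items by picking a source at random for each draw.

    Pure-Python stand-in (not TensorFlow code): each source is consumed
    in order, and an exhausted source is dropped so the others keep going.
    """
    rng = random.Random(seed)
    iterators = [iter(s) for s in sources]
    weights = list(weights)
    while iterators:
        idx = rng.choices(range(len(iterators)), weights=weights)[0]
        try:
            yield next(iterators[idx])
        except StopIteration:
            # This source is empty: remove it and continue with the rest.
            del iterators[idx]
            del weights[idx]


def batch(items, batch_size):
    """Group a list into consecutive batches (the last one may be shorter)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


N, M = 3, 10
repeats = math.ceil(M / N)                   # 4 passes over the smaller source
even = list(range(0, 2 * N, 2)) * repeats    # 12 elements
odd = list(range(1, 2 * M, 2))               # 10 elements

mixed = []
for item in sample_from_sources([even, odd], weights=[0.5, 0.5], seed=0):
    mixed.append(item)
    if len(mixed) == 2 * M:                  # same role as .take(2 * M)
        break

batches = batch(mixed, 2)
print(len(batches))  # 10 batches: 2 * M elements / batch size 2
```

The exact mix of even and odd values depends on the seed, but the number of batches does not: 2 * M = 20 elements are kept and grouped in pairs, which is the same arithmetic as in the output above.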
    
