TensorFlow One-Hot Encoder?

一生所求 2020-12-04 13:46

Does TensorFlow have something similar to scikit-learn's OneHotEncoder for processing categorical data? Would a placeholder of type tf.string behave as categorical data?

15 answers
  •  半阙折子戏
    2020-12-04 14:30

    As mentioned above by @dga, TensorFlow now has tf.one_hot:

    import tensorflow as tf

    labels = tf.constant([5, 3, 2, 4, 1])
    highest_label = tf.reduce_max(labels)
    # depth = highest label + 1, so that label 5 gets a column of its own
    labels_one_hot = tf.one_hot(labels, highest_label + 1)
    
    array([[ 0.,  0.,  0.,  0.,  0.,  1.],
           [ 0.,  0.,  0.,  1.,  0.,  0.],
           [ 0.,  0.,  1.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  0.,  1.,  0.],
           [ 0.,  1.,  0.,  0.,  0.,  0.]], dtype=float32)
    

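    You need to pass a depth that covers every label (highest_label + 1 here). With a smaller depth you get a pruned one-hot tensor: any label at or beyond depth comes out as an all-zero row.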
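To make the effect of depth concrete, here is a minimal sketch assuming TensorFlow 2.x with eager execution. The names `full` and `narrow` are only illustrative and do not come from the answer above.

```python
import tensorflow as tf

labels = tf.constant([5, 3, 2, 4, 1])

# depth = 6 covers labels 0..5, so every row has exactly one 1.
full = tf.one_hot(labels, depth=6)

# depth = 4 only covers labels 0..3; labels 4 and 5 fall outside the
# range and are encoded as all-zero rows.
narrow = tf.one_hot(labels, depth=4)

print(full.numpy())
print(narrow.numpy())
```

If the number of classes is known in advance, passing it as a plain Python integer is probably safer than deriving it from the batch with tf.reduce_max, since a batch that happens to lack the highest class would otherwise produce a narrower encoding.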

    If you would rather build it manually:

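    labels = tf.constant([5, 3, 2, 4, 1])
    size = tf.shape(labels)[0]
    highest_label = tf.reduce_max(labels)
    # Column vectors of labels and of row numbers 0..size-1
    labels_t = tf.reshape(labels, [-1, 1])
    indices = tf.reshape(tf.range(size), [-1, 1])
    # Each row becomes a (row, label) coordinate, e.g. [0, 5]
    idx_with_labels = tf.concat([indices, labels_t], 1)
    # Write 1.0 at every coordinate; all other entries default to 0
    # (tf.sparse_to_dense is a TensorFlow 1.x API)
    labels_one_hot = tf.sparse_to_dense(idx_with_labels, [size, highest_label + 1], 1.0)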
    
    array([[ 0.,  0.,  0.,  0.,  0.,  1.],
           [ 0.,  0.,  0.,  1.,  0.,  0.],
           [ 0.,  0.,  1.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  0.,  1.,  0.],
           [ 0.,  1.,  0.,  0.,  0.,  0.]], dtype=float32)
    

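    Note the argument order in tf.concat(): since TensorFlow 1.0 it is tf.concat(values, axis), whereas earlier releases took the axis first, as tf.concat(concat_dim, values).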
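The same pairing of row numbers with labels can also be sketched without tf.sparse_to_dense, which as far as I know is not exposed at the top level of TensorFlow 2.x. The version below is a substitute under stated assumptions rather than the answer's own method: it swaps tf.concat for tf.stack and tf.sparse_to_dense for tf.scatter_nd, and assumes eager execution.

```python
import tensorflow as tf

labels = tf.constant([5, 3, 2, 4, 1])
size = tf.shape(labels)[0]
depth = tf.reduce_max(labels) + 1

# Pair each row number with its label: [[0, 5], [1, 3], [2, 2], ...]
coords = tf.stack([tf.range(size), labels], axis=1)

# One 1.0 per row, scattered into a [size, depth] tensor of zeros.
updates = tf.ones_like(labels, dtype=tf.float32)
labels_one_hot = tf.scatter_nd(coords, updates, tf.stack([size, depth]))

print(labels_one_hot.numpy())
```

tf.stack with axis=1 builds the coordinate pairs directly from the two 1-D tensors, so the reshape step and the question of tf.concat argument order do not arise.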
