Does TensorFlow have something similar to scikit-learn's one-hot encoder for processing categorical data? Would using a placeholder of tf.string behave as categorical data?
TensorFlow 2.0 compatible answer: you can do this efficiently using TensorFlow Transform. Code for performing one-hot encoding with TensorFlow Transform is shown below:
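import tensorflow as tf

# NUMERIC_FEATURE_KEYS and CATEGORICAL_FEATURE_KEYS are lists of feature names
# defined elsewhere in the pipeline.

def get_feature_columns(tf_transform_output):
    """Returns the FeatureColumns for the model.

    Args:
        tf_transform_output: A `TFTransformOutput` object.

    Returns:
        A list of FeatureColumns.
    """
    # Wrap scalars as real valued columns.
    real_valued_columns = [tf.feature_column.numeric_column(key, shape=())
                           for key in NUMERIC_FEATURE_KEYS]

    # Wrap categorical columns, looking up each feature's vocabulary file by name.
    # Linear models accept these columns directly; for a dense one-hot input
    # (e.g. a DNN), wrap each one in tf.feature_column.indicator_column(...).
    one_hot_columns = [
        tf.feature_column.categorical_column_with_vocabulary_file(
            key=key,
            vocabulary_file=tf_transform_output.vocabulary_file_by_name(
                vocab_filename=key))
        for key in CATEGORICAL_FEATURE_KEYS]

    return real_valued_columns + one_hot_columns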
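To make the idea behind the vocabulary-based categorical column above more concrete, here is a minimal conceptual sketch in plain Python. It is not the TensorFlow API; the function names `build_index` and `one_hot` and the example vocabulary are hypothetical and chosen only for illustration. It shows why a raw string on its own is not yet categorical data: each string first has to be mapped to an index through a vocabulary, and only then can it be expanded into a one-hot vector. Mapping unknown values to all zeros is just one common convention assumed here.

```python
# Conceptual sketch only: plain Python, not the TensorFlow API.
# `vocabulary` stands in for the contents of a vocabulary file (hypothetical values).

def build_index(vocabulary):
    """Map each vocabulary entry to its position in the list."""
    return {token: i for i, token in enumerate(vocabulary)}


def one_hot(value, index):
    """Return a one-hot list for `value`; all zeros if it is out of vocabulary."""
    vector = [0] * len(index)
    position = index.get(value)
    if position is not None:
        vector[position] = 1
    return vector


vocabulary = ["red", "green", "blue"]  # hypothetical categories
index = build_index(vocabulary)

# "purple" is not in the vocabulary, so it encodes to all zeros here.
encoded = [one_hot(v, index) for v in ["green", "blue", "purple"]]
print(encoded)  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
```

The width of each vector equals the vocabulary size, which is why the vocabulary has to be fixed before training.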
For more information, refer to the TensorFlow Transform tutorial.