I am trying to encode a number of columns containing categorical data ("Yes" and "No") in a large pandas DataFrame. The complete dataframe contains
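As the following code shows, you can encode multiple columns at once by applying LabelEncoder to the DataFrame with df.apply. Note that this encodes every column, including the numeric column A. Also note that the single encoder is refit on each column in turn, so afterwards le.classes_ only holds the classes of the last column; the class information for the other columns is not retained.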
import pandas as pd
from sklearn.preprocessing import LabelEncoder
df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ["Yes", "No", "Yes", "Yes"],
                   'C': ["Yes", "No", "No", "Yes"],
                   'D': ["No", "Yes", "No", "Yes"]})
print(df)
#    A    B    C    D
# 0  1  Yes  Yes   No
# 1  2   No   No  Yes
# 2  3  Yes   No   No
# 3  4  Yes  Yes  Yes
# LabelEncoder
le = LabelEncoder()
# apply "le.fit_transform"
df_encoded = df.apply(le.fit_transform)
print(df_encoded)
#    A  B  C  D
# 0  0  1  1  0
# 1  1  0  0  1
# 2  2  1  0  0
# 3  3  1  1  1
# Note: le was refit on every column, so classes_ only reflects the last column (D).
print(le.classes_)
# ['No' 'Yes']
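
If you do need the classes for every column, one possible workaround (a minimal sketch, not part of the approach above) is to fit a separate LabelEncoder per column and keep the encoders in a dictionary. The names `cat_cols`, `encoders`, and `decoded_B` are illustrative assumptions, as is the choice to leave the numeric column A untouched.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ["Yes", "No", "Yes", "Yes"],
                   'C': ["Yes", "No", "No", "Yes"],
                   'D': ["No", "Yes", "No", "Yes"]})

# Assumption: only these columns are categorical; 'A' is left as-is.
cat_cols = ['B', 'C', 'D']

# One fitted encoder per column, so each column's classes stay available.
encoders = {}
df_encoded = df.copy()
for col in cat_cols:
    le = LabelEncoder()
    df_encoded[col] = le.fit_transform(df[col])
    encoders[col] = le

# Class information is now retained for every encoded column.
for col, le in encoders.items():
    print(col, le.classes_)

# The stored encoder can also map the integers back to the original labels.
decoded_B = encoders['B'].inverse_transform(df_encoded['B'])
print(decoded_B)
```

Keeping the fitted encoders around also means the same mapping can be reused later via `encoders[col].transform(...)` on new data, provided that data contains only labels the encoder has already seen.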