I\'m trying to use scikit-learn\'s LabelEncoder to encode a pandas DataFrame of string labels. As the dataframe has many (50+) columns, I want to a
We don't need a LabelEncoder.
You can convert the columns to categoricals and then get their codes. I used a dictionary comprehension below to apply this process to every column and wrap the result back into a dataframe of the same shape with identical indices and column names.
>>> pd.DataFrame({col: df[col].astype('category').cat.codes for col in df}, index=df.index)
location owner pets
0 1 1 0
1 0 2 1
2 0 0 0
3 1 1 2
4 1 3 1
5 0 2 1
To create a mapping dictionary, you can just enumerate the categories using a dictionary comprehension:
>>> {col: {n: cat for n, cat in enumerate(df[col].astype('category').cat.categories)}
for col in df}
{'location': {0: 'New_York', 1: 'San_Diego'},
'owner': {0: 'Brick', 1: 'Champ', 2: 'Ron', 3: 'Veronica'},
'pets': {0: 'cat', 1: 'dog', 2: 'monkey'}}