I have a dataframe with this type of data (too many columns):
col1 int64
col2 int64
col3 category
col4 category
col5 categ
If your only concern is that you are creating an extra column and deleting it later, just don't create a new column in the first place.
import pandas as pd
dataframe = pd.DataFrame({'col1':[1,2,3,4,5], 'col2':list('abcab'), 'col3':list('ababb')})
dataframe.col3 = pd.Categorical.from_array(dataframe.col3).codes
You are done. Since Categorical.from_array is deprecated (and no longer available in recent pandas releases), use Categorical directly:
dataframe.col3 = pd.Categorical(dataframe.col3).codes
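Since the question mentions many category columns, the same idea can be applied to all of them at once. The following is only a minimal sketch, assuming the columns already have the category dtype; the name cat_columns is made up for illustration.

```python
import pandas as pd

# Toy frame mirroring the example above; col2 and col3 are cast to
# category to imitate the dtypes listed in the question.
dataframe = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5],
    "col2": list("abcab"),
    "col3": list("ababb"),
})
dataframe["col2"] = dataframe["col2"].astype("category")
dataframe["col3"] = dataframe["col3"].astype("category")

# cat_columns is a hypothetical helper name: every column of category dtype.
cat_columns = dataframe.select_dtypes(include="category").columns

# Overwrite each categorical column with its integer codes, in place,
# so no temporary column is ever created.
dataframe[cat_columns] = dataframe[cat_columns].apply(lambda s: s.cat.codes)

print(dataframe)
```

Non-categorical columns such as col1 are left untouched because select_dtypes only picks up the category dtype.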
If you also need the mapping back from index to label, there is an even better way:
dataframe.col3, mapping_index = pd.Series(dataframe.col3).factorize()
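Check the result below (this assumes col3 still held the original string labels when factorize was called; get_loc needs a label that actually occurs in the column):
print(dataframe)
print(mapping_index.get_loc("b"))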
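To make the two directions of the mapping explicit, here is a small self-contained sketch of factorize on a fresh Series. The variable names (labels, codes, restored) are illustrative and not part of the original answer.

```python
import pandas as pd

# Original string labels, same values as col3 in the example.
labels = pd.Series(list("ababb"))

# factorize returns the integer codes and an Index of the unique labels.
codes, mapping_index = labels.factorize()

# Label -> code: position of the label inside the Index.
code_for_b = mapping_index.get_loc("b")

# Code -> label: use the codes to index back into the unique labels.
restored = mapping_index.take(codes)

print(codes)
print(mapping_index)
print(code_for_b)
print(list(restored))
```

Note that factorize numbers labels in order of first appearance, whereas Categorical codes follow the sorted order of the categories, so the two approaches can assign different integers to the same label.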