pandas.factorize encodes input values as an enumerated type or categorical variable.
But how can I easily and efficiently convert many columns of a data frame?
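One way to approach the many-columns case is to apply pd.factorize column by column. The sketch below is only an illustration: the frame, the 'Proto' and 'Bytes' columns, and the `cols` list are made up for the example, and each column gets its own independent set of codes.

```python
import pandas as pd

# Hypothetical example frame; only 'SrcIP' comes from the post,
# the other columns are invented for illustration.
df = pd.DataFrame({
    "SrcIP": ["192.168.1.112", "192.168.4.118", "192.168.1.112"],
    "Proto": ["tcp", "udp", "tcp"],
    "Bytes": [120, 64, 300],
})

cols = ["SrcIP", "Proto"]  # columns to encode (an assumption for this sketch)

# Factorize each selected column independently.
# pd.factorize returns (codes, uniques); [0] keeps only the integer codes.
encoded = df[cols].apply(lambda s: pd.factorize(s)[0])
```

Here `encoded` has the same columns as `df[cols]`, with integer codes in place of the original values. If the same value must receive the same code across several columns, a shared mapping is needed instead of per-column factorizing.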
I also found this answer quite helpful: https://stackoverflow.com/a/20051631/4643212
I was trying to take the values from an existing column in a pandas DataFrame (a column of IP addresses named 'SrcIP') and map each distinct value to an integer in a new column (named 'ID' in this example).
Solution:
df['ID'] = pd.factorize(df.SrcIP)[0]
Result:
SrcIP | ID
192.168.1.112 | 0
192.168.1.112 | 0
192.168.4.118 | 1
192.168.1.112 | 0
192.168.4.118 | 1
192.168.5.122 | 2
192.168.5.122 | 2
...
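For anyone who wants to reproduce the result, here is a minimal self-contained sketch using the seven rows shown in the table. The `codes`, `uniques`, and `restored` names are just for illustration. It also keeps the second return value of pd.factorize, which is handy for mapping an ID back to its IP address.

```python
import pandas as pd

# Sample data copied from the result table above
df = pd.DataFrame({"SrcIP": [
    "192.168.1.112", "192.168.1.112", "192.168.4.118", "192.168.1.112",
    "192.168.4.118", "192.168.5.122", "192.168.5.122",
]})

# factorize returns (codes, uniques); codes are numbered
# in order of first appearance, starting at 0
codes, uniques = pd.factorize(df["SrcIP"])
df["ID"] = codes

# uniques holds the distinct values, so indexing it with the codes
# recovers the original column
restored = uniques[codes]

print(df)
```

Because the codes follow order of first appearance, the numbering depends on row order. Passing sort=True to pd.factorize assigns codes by sorted value instead.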