I\'d like to take a dataset with a bunch of different unique individuals, each with multiple entries, and assign each individual a unique id for all of their entries. Here\'
This method allow the 'id' column name to be defined with a variable. Plus I find it a little easier to read compared to the assign or groupby methods.
# Create Dataframe
df = pd.DataFrame(
{'FirstName': ['Tom','Tom','David','Alex','Alex'],
'LastName': ['Jones','Jones','Smith','Thompson','Thompson'],
})
newIdName = 'id' # Set new name here.
df[newIdName] = (df['LastName'] + '_' + df['FirstName']).astype('category').cat.codes
Output:
>>> df
FirstName LastName id
0 Tom Jones 0
1 Tom Jones 0
2 David Smith 1
3 Alex Thompson 2
4 Alex Thompson 2