Q: [Pandas] How to efficiently assign unique ID to individuals with multiple entries based on name in very large df

粉色の甜心 2020-12-02 17:45

I'd like to take a dataset with a bunch of different unique individuals, each with multiple entries, and assign each individual a unique id for all of their entries. Here'

3 answers
  •  春和景丽
    2020-12-02 17:59

    This method allows the 'id' column name to be set from a variable. I also find it a little easier to read than the assign or groupby methods.

    import pandas as pd

    # Create the DataFrame
    df = pd.DataFrame(
        {'FirstName': ['Tom', 'Tom', 'David', 'Alex', 'Alex'],
         'LastName': ['Jones', 'Jones', 'Smith', 'Thompson', 'Thompson'],
        })

    newIdName = 'id'   # Set the new column name here.

    # Join last and first name into one key, then use its category codes as the id
    df[newIdName] = (df['LastName'] + '_' + df['FirstName']).astype('category').cat.codes
    

    Output:

    >>> df
      FirstName  LastName  id
    0       Tom     Jones   0
    1       Tom     Jones   0
    2     David     Smith   1
    3      Alex  Thompson   2
    4      Alex  Thompson   2
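
    For comparison, the groupby approach mentioned above might look like the sketch below. It is only an illustration, not part of the original answer: it groups on both name columns directly and numbers the groups with `ngroup()`, so no concatenated string key is built. That may matter on a very large frame, and it also avoids accidental collisions if a name itself contains `_`. The variable name `new_id_name` is made up for this example.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame(
    {'FirstName': ['Tom', 'Tom', 'David', 'Alex', 'Alex'],
     'LastName': ['Jones', 'Jones', 'Smith', 'Thompson', 'Thompson']})

new_id_name = 'id'  # hypothetical variable holding the new column name

# ngroup() labels every row with the number of its (LastName, FirstName) group.
# With the default sort=True, groups are numbered in sorted key order.
df[new_id_name] = df.groupby(['LastName', 'FirstName']).ngroup()

print(df)
```

    On this sample data the resulting ids match the category-code version (0, 0, 1, 2, 2). Rows with a missing first or last name would need separate handling in either approach.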
    
