Anonymizing data / replacing names

一整个雨季 2021-01-24 06:13

Normally I anonymize my data with hashlib, hashing each value via the .apply(hash) function.
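
For reference, the hashing approach mentioned above might look roughly like the sketch below. The sample rows, the `hash_name` helper, and the choice of SHA-256 are assumptions for illustration; the question does not show the actual hash function or data.

```python
import hashlib

import pandas as pd

# Hypothetical sample data; the real frame is not shown in the question.
data = pd.DataFrame({"name": ["Alice", "Bob", "Alice"], "amount": [10, 20, 30]})


def hash_name(value: str) -> str:
    """Return the SHA-256 hex digest of a name (hypothetical helper)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Replace each name with its digest; identical names get identical digests.
hashed = data.assign(name=data["name"].apply(hash_name))
```

Note that an unsalted hash of a short name can often be reversed by hashing a list of candidate names, so this is pseudonymization rather than strong anonymization.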

Now I'm trying a new approach. Imagine I have the following df called 'data':

3 Answers
  •  忘了有多久
    2021-01-24 06:43

    Maybe try creating a lookup data frame called "index" for this operation and keep the unique name values in it?

    Then generate a mask for each unique name from its row index, and merge the resulting data frame index with data.

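    import pandas as pd

    # Lookup table: one row per unique name.
    index = pd.DataFrame()
    index['name'] = data['name'].unique()
    # Label each unique name by its row position: person1, person2, ...
    index['mask'] = ['person' + str(i + 1) for i in index.index]

    # Attach the masks to the original rows and drop the real names.
    data.merge(index, on='name', how='left')[['mask', 'amount']]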
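
As a possible alternative to building a lookup frame and merging, the same numbering can be sketched with `pd.factorize`, which assigns each distinct value an integer code in order of first appearance. This is a swapped-in technique, not what the answer above uses, and the sample rows are made up for illustration; only the `name` and `amount` column names come from the answer.

```python
import pandas as pd

# Hypothetical sample data using the column names from the answer.
data = pd.DataFrame(
    {"name": ["Alice", "Bob", "Alice", "Carol"], "amount": [10, 20, 30, 40]}
)

# codes[i] is the 0-based position of data["name"][i] among the unique names.
codes, uniques = pd.factorize(data["name"])

# Build the masked frame: person1, person2, ... in order of first appearance.
masked = pd.DataFrame(
    {
        "mask": ["person" + str(code + 1) for code in codes],
        "amount": data["amount"],
    }
)
```

With this sample, both rows for Alice become person1. Missing names get the code -1 from `factorize` and would come out as person0, so they may need separate handling.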
    
