Reverse the Multi label binarizer in pandas

Submitted by 眉间皱痕 on 2020-01-01 22:28:10

Question


I have a pandas DataFrame like this:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()

# load sample data 
df = pd.DataFrame( {'user_id':['1','1','2','2','2','3'], 'fruits':['banana','orange','orange','apple','banana','mango']})

I collect all the fruits for each user with the code below:

# collect fruits for each user 
transformed_df= df.groupby('user_id').agg({'fruits':lambda x: list(x)}).reset_index()

print(transformed_df)
  user_id                   fruits
0       1         [banana, orange]
1       2  [orange, apple, banana]
2       3                  [mango]

Once I have these lists, I apply MultiLabelBinarizer to convert each list into a row of ones and zeros:

# perform MultiLabelBinarizer
final_df = transformed_df.join(pd.DataFrame(mlb.fit_transform(transformed_df.pop('fruits')),columns=mlb.classes_,index=transformed_df.index))

print(final_df)
  user_id  apple  banana  mango  orange
0       1      0       1      0       1
1       2      1       1      0       1
2       3      0       0      1       0

Now I have a requirement where the input DataFrame given to me is final_df, and I need to get back transformed_df, which contains the list of fruits for each user.

How can I get transformed_df back, given that I have final_df as the input DataFrame?

I am trying to get this working

# Trying to get this working
# .as_matrix() was removed in pandas 1.0; .to_numpy() returns the same array
inverse_df = final_df.join(pd.DataFrame(mlb.inverse_transform(final_df.loc[:, final_df.columns != 'user_id'].to_numpy())))

inverse_df
  user_id  apple  banana  mango  orange       0       1       2
0       1      0       1      0       1  banana  orange    None
1       2      1       1      0       1   apple  banana  orange
2       3      0       0      1       0   mango    None    None

But it doesn't give me the list back; each fruit ends up in its own column instead.


Answer 1:


The inverse_transform() method should help. It takes the 0/1 indicator matrix and returns one tuple of labels per row. Here's the documentation: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html#sklearn.preprocessing.MultiLabelBinarizer.inverse_transform
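
As a rough sketch of how this could be applied to the question's data: the attempt in the question wraps the result of inverse_transform() in pd.DataFrame(...), which spreads each tuple across columns 0, 1, 2. Storing the tuples as a single column instead should give one list per user. The names indicator, label_cols, restored and restored_no_mlb below are illustrative and not from the original post. Option 1 assumes the same fitted mlb is still available; Option 2 assumes every column other than user_id is a label column.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Rebuild the question's data so the example is self-contained.
df = pd.DataFrame({
    'user_id': ['1', '1', '2', '2', '2', '3'],
    'fruits': ['banana', 'orange', 'orange', 'apple', 'banana', 'mango'],
})
transformed_df = df.groupby('user_id')['fruits'].agg(list).reset_index()

mlb = MultiLabelBinarizer()
final_df = transformed_df[['user_id']].join(
    pd.DataFrame(mlb.fit_transform(transformed_df['fruits']),
                 columns=mlb.classes_, index=transformed_df.index)
)

# Option 1 (assumes the fitted mlb is still around).
# inverse_transform returns a list with one tuple of labels per row.
indicator = final_df[list(mlb.classes_)].to_numpy()
restored = final_df[['user_id']].copy()
restored['fruits'] = pd.Series(
    [list(labels) for labels in mlb.inverse_transform(indicator)],
    index=final_df.index,
)

# Option 2 (assumes only final_df is available, no fitted mlb).
# For each row, keep the names of the columns whose flag is 1.
label_cols = [c for c in final_df.columns if c != 'user_id']
restored_no_mlb = final_df[['user_id']].copy()
restored_no_mlb['fruits'] = pd.Series(
    [[col for col, flag in zip(label_cols, row) if flag == 1]
     for row in final_df[label_cols].to_numpy()],
    index=final_df.index,
)

print(restored)
print(restored_no_mlb)
```

Either way, the labels come back in column order (alphabetical here, since mlb.classes_ is sorted), so user 2 gets [apple, banana, orange] rather than the original [orange, apple, banana]. The binarized matrix does not record the original order, so it cannot be recovered from final_df.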



Source: https://stackoverflow.com/questions/55764055/reverse-the-multi-label-binarizer-in-pandas
