Pandas: map column using a dictionary on multiple columns

南楼画角 提交于 2020-08-10 17:42:07

问题


I have a dataframe with None values in one column. I would like to replace this None values with the maximum value of the "category" for the same combination of other columns.

Example: pandas dataframe

import pandas as pd
d = {'company': ['Company1', 'Company1', 'Company1', 'Company1', 'Company2', 'Company2'], 'product': ['Product A', 'Product A', 'Product F', 'Product A', 'Product F', 'Product F'], 'category': ['1', None, '3', '2', None, '5']}

df = pd.DataFrame(d)

   company   product       category
0  Company1  Product A        1
1  Company1  Product A     None
2  Company1  Product F        3
3  Company1  Product A        2
4  Company2  Product F     None
5  Company2  Product F        5

I want to replace the None values in 3. column with the max() value of this column for the unique combination (group of 1. + 2. column). The expected result would look like this:

   company   product       category
0  Company1  Product A        1
1  Company1  Product A        **2**
2  Company1  Product F        3
3  Company1  Product A        2
4  Company2  Product F        **5**
5  Company2  Product F        5

What I've tried: I've grouped the 1. + 2. column ("company + product") + get the max() of the 3. column + build a dictionary 'category'. (based on the idea of GroupBy results to dictionary of lists)

df_dict = df[~df['category'].isna()].groupby(['company','product'])['category'].max().apply(list).to_dict()

and I get this dict df_dict (shows the maximum category value for each combination):

{('Company1', 'Product A'): ['2'], ('Company1', 'Product F'): ['1'], ('Company2', 'Product F'): ['5']}

Now I want to replace None with the max values for each combination from my dcitionary

df[df['category'].isna()]

   ompany    product       category
1  Company1  Product A     None
4  Company2  Product F     None

The question, how can I do it? I tried with set_index()

df[df['category'].isna()].set_index(['company', 'product']).index

results in

MultiIndex([('Company1', 'Product A'),
        ('Company2', 'Product F')],
       names=['company', 'product'])

which would fit the entry in the dictionary and use it together map()

    df['category'] = df[df['category'].isna()].set_index(['company', 'product']).in
dex.map(df_dict)

... but I get an ValueError

ValueError: Length of values (2) does not match length of index (6)

回答1:


two steps,

first lets turn those None values to NaNs so we can use numeric operations.

df['category'] = pd.to_numeric(df['category']) # add arg ,errors='coerce' if needed.

2nd, lets use groupby transform and max to fill only the NaN values.

df["category"] = df["category"].fillna(
    df.groupby(["company", "product"])["category"].transform("max")
)
print(df)

    company    product  category
0  Company1  Product A       1.0
1  Company1  Product A       2.0
2  Company1  Product F       3.0
3  Company1  Product A       2.0
4  Company2  Product F       5.0
5  Company2  Product F       5.0


来源:https://stackoverflow.com/questions/63280810/pandas-map-column-using-a-dictionary-on-multiple-columns

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!