Find a value of a dictionary in dataframe column and modify it

Question

I am working with DataFrames and dictionaries and have run into a problem. I have a dictionary, Fruits:

{'BN': 'Banana', 'LM': 'Lemon', 'AP': 'Apple', ...}

And a DataFrame, Stock:

   Fruit             Price
0  Sweet Mango           1
1  Green Apple           2
2  Few blue Banana       0
3  Black Banana          5

I want to replace the values in Stock['Fruit'] with values from Fruits.values(): if a value from Fruits appears in a Stock['Fruit'] row, the whole entry should be replaced by that value, like this:

Few blue Banana ---> Banana

Black Banana ---> Banana

The DataFrame Stock should then look like this:

   Fruit             Price
0  Sweet Mango           1
1  Green Apple           2
2  Banana                0
3  Banana                5

I found various snippets for replacing values, or for checking whether values from the dictionary appear in the DataFrame:

Stock['Fruit'] = Stock.Fruit.map(Fruits)

if (Fruits.values() in Stock['Fruit'] for item in Stock)

any('Mango' in Stock['Fruit'] for index,item in Stock.iterrows())

But I can't find anything that updates the rows of the DataFrame.

Answer 1:

Use string methods both to build the condition and to extract the required values:

# d is the Fruits dictionary, df is the Stock DataFrame
pat = r'({})'.format('|'.join(d.values()))
cond = df['Fruit'].str.contains('|'.join(d.values()))
df.loc[cond, 'Fruit'] = df['Fruit'].str.extract(pat, expand=False)

         Fruit  Price
0  Sweet Mango      1
1        Apple      2
2       Banana      0
3       Banana      5

Edit: As @user3483203 suggested, you can extract the pattern first and then fill the missing values with the original ones.

df['Fruit'] = df['Fruit'].str.extract(pat, expand=False).fillna(df['Fruit'])
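For reference, here is a self-contained sketch of the approach above. It assumes the Fruits dictionary from the question (named d here) and rebuilds the Stock frame as df. Wrapping each value in re.escape is an addition that is not in the original answer; it guards against dictionary values that happen to contain regex metacharacters.

```python
import re
import pandas as pd

# Sample data mirroring the question (d stands in for the Fruits dictionary)
d = {'BN': 'Banana', 'LM': 'Lemon', 'AP': 'Apple'}
df = pd.DataFrame({
    'Fruit': ['Sweet Mango', 'Green Apple', 'Few blue Banana', 'Black Banana'],
    'Price': [1, 2, 0, 5],
})

# One capture group with every dictionary value as an alternative
pat = '({})'.format('|'.join(re.escape(v) for v in d.values()))

# expand=False returns a Series; rows without a match become NaN
extracted = df['Fruit'].str.extract(pat, expand=False)

# Keep the original text wherever nothing matched
df['Fruit'] = extracted.fillna(df['Fruit'])
print(df)
```

Because 'Mango' is not in this dictionary, the first row keeps its original text, while the other rows collapse to the matching fruit name.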

Answer 2:

IIUC, you can use apply() with a custom function:

import pandas as pd

df = pd.DataFrame([['Sweet Mango', 1],['Green Apple', 2],['Few blue Banana', 0],['Black Banana', 5]],
  columns=['Fruit','Price'])

fruits = {'BN':'Banana', 'LM': 'Lemon', 'AP':'Apple', 'MG': 'Mango'}

def find_category(x):
    # Return the first dictionary value found in x, or x itself if none matches
    return next((v for v in fruits.values() if v in x), x)

df['Fruit'] = df['Fruit'].apply(find_category)

Yields:

    Fruit  Price
0   Mango      1
1   Apple      2
2  Banana      0
3  Banana      5
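One caveat with a plain substring test is that it also matches inside longer words, so 'Lemon' would be found in 'Lemonade'. The sketch below is a hypothetical variant (the name find_category_whole_word is not from the answer) that matches whole words only and returns the original text when nothing matches. Whether whole-word matching is what you want depends on your data, so treat it as an option rather than a fix.

```python
import re

fruits = {'BN': 'Banana', 'LM': 'Lemon', 'AP': 'Apple', 'MG': 'Mango'}

# Precompile one pattern: any dictionary value, delimited by word boundaries
_pattern = re.compile(
    r'\b(?:{})\b'.format('|'.join(map(re.escape, fruits.values())))
)

def find_category_whole_word(text):
    # Return the leftmost whole-word fruit name in text, or text itself if none is found
    match = _pattern.search(text)
    return match.group(0) if match else text

print(find_category_whole_word('Few blue Banana'))
print(find_category_whole_word('Pink Lemonade'))
```

It can be applied the same way as the function above, for example df['Fruit'].apply(find_category_whole_word).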

Answer 3:

Building on an existing answer, we create a new class that subclasses defaultdict and overrides its __missing__ method so that the missing key is passed to default_factory:

from collections import defaultdict
class keydefaultdict(defaultdict):
    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        else:
            ret = self[key] = self.default_factory(key)
            return ret

We create an initial dictionary that maps the two values in the 'Fruit' column that we want to replace.

fruit_dict = {'Few blue Banana': 'Banana', 'Black Banana': 'Banana'}

Then we create a new instance of our class with a default_factory of lambda x: x. In other words, if a key is not found during lookup, the key itself is stored and returned as the value.

fruit_col_map = keydefaultdict(lambda x: x)
fruit_col_map.update(fruit_dict)
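To make the lookup behaviour concrete, here is a small standalone sketch that uses only the standard library. It repeats the class so that it runs on its own; the variable names known and unknown are illustrative only. Note that a plain defaultdict would not work here, because it calls default_factory with no arguments.

```python
from collections import defaultdict

class keydefaultdict(defaultdict):
    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        # Call the factory with the missing key, then cache the result
        ret = self[key] = self.default_factory(key)
        return ret

fruit_col_map = keydefaultdict(lambda x: x)
fruit_col_map.update({'Few blue Banana': 'Banana', 'Black Banana': 'Banana'})

known = fruit_col_map['Black Banana']    # explicit mapping
unknown = fruit_col_map['Sweet Mango']   # falls through to the factory
print(known, '|', unknown)

# The missing key was stored on first lookup, so it is now a regular entry
print('Sweet Mango' in fruit_col_map)
```

Keys that are present map to their replacement, and any other key maps to itself, which is what lets the original text survive for rows that should not change.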

Finally, update the column:

df['Fruit'] = df['Fruit'].map(fruit_col_map)
df

Output:

         Fruit  Price
0  Sweet Mango      1
1  Green Apple      2
2       Banana      0
3       Banana      5

Compared with the accepted answer, this is more than 6 times faster:

df = pd.DataFrame({
    'Fruit': ['Sweet Mango', 'Green Apple', 'Few blue Banana', 'Black Banana']*1000,
    'Price': [1, 2, 0, 5]*1000
})
%timeit df['Fruit'].map(fruit_col_map)

Results:

1.03 ms ± 48.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Accepted answer:

pat = r'({})'.format('|'.join(fruit_dict.values()))
%timeit df['Fruit'].str.extract(pat).fillna(df['Fruit'])

Results:

6.85 ms ± 223 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Source: https://stackoverflow.com/questions/52636092/find-a-value-of-a-dictionary-in-dataframe-column-and-modify-it

Tags

python

pandas

dictionary

dataframe

any