Question
I'm working with DataFrames and dictionaries and have run into a problem. I have a dictionary, "Fruits":
{'BN': 'Banana', 'LM': 'Lemon', 'AP': 'Apple', ...}
and a DataFrame, "Stock":
             Fruit  Price
0      Sweet Mango      1
1      Green Apple      2
2  Few blue Banana      0
3     Black Banana      5
I want to replace the values in Stock['Fruit'] with values from Fruits.values(): if a value from Fruits appears in a row of Stock['Fruit'], the whole entry should be replaced by that value, like this:
Few blue Banana ---> Banana
Black Banana ---> Banana
After that, the DataFrame Stock should look like this:
         Fruit  Price
0  Sweet Mango      1
1  Green Apple      2
2       Banana      0
3       Banana      5
I found various snippets for replacing values, or for checking whether values from the dictionary appear in the DataFrame:
Stock['Fruit'] = Stock.Fruit.map(Fruits)
if (Fruits.values() in Stock['Fruit'] for item in Stock)
any('Mango' in Stock['Fruit'] for index,item in Stock.iterrows())
But I can't find anything that updates the rows of the DataFrame.
Answer 1:
Use string methods to build the condition and extract the required values (here d is the dictionary and df is the DataFrame):
pat = r'({})'.format('|'.join(d.values()))
cond = df['Fruit'].str.contains('|'.join(d.values()))
df.loc[cond, 'Fruit'] = df['Fruit'].str.extract((pat), expand = False)
         Fruit  Price
0  Sweet Mango      1
1        Apple      2
2       Banana      0
3       Banana      5
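Edit: As @user3483203 suggested, you can instead extract the pattern for every row and then fill the missing values with the originals. Passing expand=False makes str.extract return a Series, so fillna aligns row by row:
df['Fruit'] = df['Fruit'].str.extract(pat, expand=False).fillna(df['Fruit'])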
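A self-contained sketch of the extract-and-fill approach, assuming the dictionary and frame from the question (the lowercase names `fruits` and `stock` are chosen here for illustration). It also escapes each dictionary value with re.escape, which is an addition not in the original answer and only matters if a value contains regex metacharacters:

```python
import re

import pandas as pd

# Sample data modelled on the question; the names are illustrative.
fruits = {'BN': 'Banana', 'LM': 'Lemon', 'AP': 'Apple'}
stock = pd.DataFrame({
    'Fruit': ['Sweet Mango', 'Green Apple', 'Few blue Banana', 'Black Banana'],
    'Price': [1, 2, 0, 5],
})

# One capture group that matches any of the dictionary values.
pat = '({})'.format('|'.join(re.escape(v) for v in fruits.values()))

# expand=False yields a Series; rows with no match become NaN and
# are filled back in from the original column.
extracted = stock['Fruit'].str.extract(pat, expand=False)
stock['Fruit'] = extracted.fillna(stock['Fruit'])
```

'Sweet Mango' is left unchanged here because 'Mango' is not among the dictionary values in this sample.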
Answer 2:
IIUC, you can use apply() with a custom function:
import pandas as pd

df = pd.DataFrame([['Sweet Mango', 1], ['Green Apple', 2], ['Few blue Banana', 0], ['Black Banana', 5]],
                  columns=['Fruit', 'Price'])
fruits = {'BN': 'Banana', 'LM': 'Lemon', 'AP': 'Apple', 'MG': 'Mango'}

def find_category(x):
    return [k for k in fruits.values() if k in x][0]

df['Fruit'] = df['Fruit'].apply(find_category)
Yields:
    Fruit  Price
0   Mango      1
1   Apple      2
2  Banana      0
3  Banana      5
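Note that the list comprehension in find_category raises IndexError for a row that contains none of the dictionary values. A possible variant, sketched here under the assumption that such rows should simply keep their original text, uses next() with a default (the function name `find_category_or_keep` is made up for this example):

```python
import pandas as pd

# Dictionary as in the question, i.e. without a 'Mango' entry.
fruits = {'BN': 'Banana', 'LM': 'Lemon', 'AP': 'Apple'}
df = pd.DataFrame({
    'Fruit': ['Sweet Mango', 'Green Apple', 'Few blue Banana', 'Black Banana'],
    'Price': [1, 2, 0, 5],
})

def find_category_or_keep(text):
    # First dictionary value found as a substring, else the text itself.
    return next((name for name in fruits.values() if name in text), text)

df['Fruit'] = df['Fruit'].apply(find_category_or_keep)
```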
Answer 3:
Using the results of the answer here, we create a new class that subclasses defaultdict and overrides its __missing__ method so that the key is passed to the default_factory:
from collections import defaultdict

class keydefaultdict(defaultdict):
    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        else:
            ret = self[key] = self.default_factory(key)
            return ret
We create an initial dictionary that maps the two values in the 'Fruit' column that we want to replace.
fruit_dict = {'Few blue Banana': 'Banana', 'Black Banana': 'Banana'}
Then we create a new instance of our class with a default_factory of lambda x: x. I.e., if we don't find the key when we search for it, put the key in as the value.
fruit_col_map = keydefaultdict(lambda x: x)
fruit_col_map.update(**fruit_dict)
Finally, update the column:
df['Fruit'] = df['Fruit'].map(fruit_col_map)
df
Output:
         Fruit  Price
0  Sweet Mango      1
1  Green Apple      2
2       Banana      0
3       Banana      5
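Series.map honours a __missing__ method on a dict subclass, which is what makes the mapping above fall back to the original value instead of NaN. A smaller sketch of the same idea, assuming only that unmatched keys should map to themselves (the class name `IdentityDict` is invented here and is not part of the original answer):

```python
import pandas as pd

class IdentityDict(dict):
    # Called by dict lookup when the key is absent; return the key itself.
    def __missing__(self, key):
        return key

mapping = IdentityDict({'Few blue Banana': 'Banana', 'Black Banana': 'Banana'})
fruit = pd.Series(['Sweet Mango', 'Green Apple', 'Few blue Banana', 'Black Banana'])

# Keys not present in the mapping pass through unchanged.
result = fruit.map(mapping)
```

Unlike keydefaultdict, this variant does not store the missing key in the dictionary on lookup.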
Compared to the accepted answer, this is more than 6 times faster:
df = pd.DataFrame({
    'Fruit': ['Sweet Mango', 'Green Apple', 'Few blue Banana', 'Black Banana']*1000,
    'Price': [1, 2, 0, 5]*1000
})
%timeit df['Fruit'].map(fruit_col_map)
Results:
1.03 ms ± 48.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Accepted answer:
pat = r'({})'.format('|'.join(fruit_dict.values()))
%timeit df['Fruit'].str.extract(pat).fillna(df['Fruit'])
Results:
6.85 ms ± 223 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Source: https://stackoverflow.com/questions/52636092/find-a-value-of-a-dictionary-in-dataframe-column-and-modify-it