问题
have a dataframe ABC of value
id | price | type
0 easdca | Rs.1,599.00 was trasn by you | unknown
1 vbbngy | txn of INR 191.00 using | unknown
2 awerfa | Rs.190.78 credits was used by you | unknown
3 zxcmo5 | DLR.2000 credits was used by you | unknown
and other XYZ of value
price | type
0 190.78 | food
1 191.00 | movie
2 2,000 | football
3 1,599.00 | basketball
how to map XYZ with ABC ,so that type in ABC get updated with type in xyz using values(numericals) in price of XYZ .
output i need
id | price | type
0 easdca | Rs.1,599.00 was trasn by you | basketball
1 vbbngy | txn of INR 191.00 using | movie
2 awerfa | Rs.190.78 credits was used by you | food
3 zxcmo5 | DLR.2,000 credits was used by you| football
used this
d = dict(zip(XYZ['PRICE'],XYZ['TYPE']))
pat = (r'({})'.format('|'.join(d.keys())))
ABC['TYPE']=ABC['PRICE'].str.extract(pat,expand=False).map(d)
But values likes 190.78 and 191.00 are getting mismatched. for example while working with huge data 190.78 should be matched with food values like 190.77 gets mismatched with food where it has other value assigned to it. And 198.78 also gets mismatched with some other ones where it should match with food
回答1:
df
id price type
0 easdca Rs.1,599.00 was trasn by you unknown
1 vbbngy txn of INR 191.00 using unknown
2 awerfa Rs.190.78 credits was used by you unknown
3 zxcmo5 DLR.2000 credits was used by you unknown
df2
price type
0 190.78 food
1 191.00 movie
2 2,000 football
3 1,599.00 basketball
using re
df['price_'] = df['price'].apply(lambda x: re.findall(r'(?<=[\.\s])[\d\.]+',x.replace(',',''))[0])
df2.columns = ['price_','type']
df2['price_'] = df2['price_'].str.repalce(',','')
Changing the types to float
df2['price_'] = df2['price_'].astype(float)
df['price_'] = df['price_'] .astype(float)
Using pd.merge
df = df.merge(df2, on='price_')
df.drop('type_x', axis=1)
Output
id price price_ type_y
0 easdca Rs.1,599.00 was trasn by you 1599.00 basketball
1 vbbngy txn of INR 191.00 using 191.00 movie
2 awerfa Rs.190.78 credits was used by you 190.78 food
3 zxcmo5 DLR.2000 credits was used by you 2000 football
回答2:
You can do the following:
'''
First we make a artificial key column to be able to merge
We basically just substract the floating numbers from the string
And convert it to type float
'''
df1['price_key'] = df1['price'].str.replace(',', '').str.extract('(\d+\.\d+)').astype(float)
# After that we do a merge on price and price_key and drop the columns which we dont need
df_final = pd.merge(df1, df2, left_on='price_key', right_on='price', suffixes=['', '_2'])
df_final = df_final.drop(['type', 'price_key', 'price_2'], axis='columns')
Output
id price type_2
0 easdca Rs.1,599.00 was trasn by you basketball
1 vbbngy txn of INR 191.00 using movie
2 awerfa Rs.190.78 credits was used by you food
3 zxcmo5 DLR.2000.78 credits was used by you football
I assumed you made a typo in your xyz
table, the third price should be 2000.78
and not 2000
.
来源:https://stackoverflow.com/questions/54902811/how-to-match-and-merge-two-dataframes-having-completely-different-values-except