How to match and merge two dataframes having completely different values except numericals in columns of dataframe?

问题

have a dataframe ABC of value

      id         |     price                          |   type
0     easdca     | Rs.1,599.00 was trasn by you       | unknown
1     vbbngy     | txn of INR 191.00 using            | unknown
2     awerfa     | Rs.190.78 credits was used by you  | unknown
3     zxcmo5     | DLR.2000 credits was used by you   | unknown

and other XYZ of value

         price          |   type
0      190.78           | food
1      191.00           | movie
2      2,000            | football
3      1,599.00         | basketball

how to map XYZ with ABC ,so that type in ABC get updated with type in xyz using values(numericals) in price of XYZ .

output i need

       id         |     price                          |   type
0     easdca     | Rs.1,599.00 was trasn by you        | basketball
1     vbbngy     | txn of INR 191.00 using             | movie
2     awerfa     | Rs.190.78 credits was used by you   | food
3     zxcmo5     | DLR.2,000 credits was used by you| football

used this

d = dict(zip(XYZ['PRICE'],XYZ['TYPE']))

pat = (r'({})'.format('|'.join(d.keys())))

ABC['TYPE']=ABC['PRICE'].str.extract(pat,expand=False).map(d)

But values likes 190.78 and 191.00 are getting mismatched. for example while working with huge data 190.78 should be matched with food values like 190.77 gets mismatched with food where it has other value assigned to it. And 198.78 also gets mismatched with some other ones where it should match with food

回答1:

        id                price                                type
0       easdca        Rs.1,599.00 was trasn by you          unknown
1       vbbngy        txn of INR 191.00 using               unknown
2       awerfa        Rs.190.78 credits was used by you     unknown
3       zxcmo5        DLR.2000 credits was used by you      unknown

df2

           price                   type
0        190.78                    food
1        191.00                   movie
2        2,000                 football
3        1,599.00            basketball

using re

df['price_'] = df['price'].apply(lambda x: re.findall(r'(?<=[\.\s])[\d\.]+',x.replace(',',''))[0])
df2.columns = ['price_','type']
df2['price_'] = df2['price_'].str.repalce(',','')

Changing the types to float

df2['price_']  = df2['price_'].astype(float)
df['price_']  = df['price_'] .astype(float)

Using pd.merge

df = df.merge(df2, on='price_')
df.drop('type_x', axis=1)

Output

                id                                 price   price_       type_y
0      easdca        Rs.1,599.00 was trasn by you         1599.00   basketball
1      vbbngy        txn of INR 191.00 using               191.00        movie
2      awerfa        Rs.190.78 credits was used by you     190.78         food
3      zxcmo5        DLR.2000 credits was used by you        2000     football

回答2:

You can do the following:

'''
First we make a artificial key column to be able to merge
We basically just substract the floating numbers from the string
And convert it to type float
'''

df1['price_key'] = df1['price'].str.replace(',', '').str.extract('(\d+\.\d+)').astype(float)

# After that we do a merge on price and price_key and drop the columns which we dont need
df_final = pd.merge(df1, df2, left_on='price_key', right_on='price', suffixes=['', '_2'])
df_final = df_final.drop(['type', 'price_key', 'price_2'], axis='columns')

Output

    id      price                               type_2
0   easdca  Rs.1,599.00 was trasn by you        basketball
1   vbbngy  txn of INR 191.00 using             movie
2   awerfa  Rs.190.78 credits was used by you   food
3   zxcmo5  DLR.2000.78 credits was used by you football

I assumed you made a typo in your xyz table, the third price should be 2000.78 and not 2000.

来源：https://stackoverflow.com/questions/54902811/how-to-match-and-merge-two-dataframes-having-completely-different-values-except

标签

python

python-3.x

pandas

dataframe

epoch