Matching keys from two different dataframes

 ̄綄美尐妖づ submitted on 2019-12-24 00:43:30

Question


I have two dataframes,

df1,
    Name    Stage   Description                                 key
0   Sri      1      Sri is one of the good singer in this two   one
1   NaN      2      Thanks for reading                          two has
2   Ram      1      Ram is two of the good cricket player       three
3   ganesh   1      one driver                                  four
4   NaN      2      good buddies                                NaN


 df2,
    values
    member of four
    one of three friends
    sri is a cricketer
    Rahul has two brothers

I want to replace each value in df1["key"] with the matching row of df2["values"], if the words of the key are present in that row.

I tried, df1["key"]=df2[df2["values"].str.contains("|".join(df2["values"].tolist()),na=False)]

But I am getting the output in the same order as df2.

I want,

output_df,
    Name    Stage   Description                                 key
0   Sri      1      Sri is one of the good singer in this two   one of three friends
1   NaN      2      Thanks for reading                          Rahul has two brothers
2   Ram      1      Ram is two of the good cricket player       one of three friends
3   ganesh   1      one driver                                  member of four
4   NaN      2      good buddies                                NaN

Answer 1:


I'll build arrays of sets, use <= for subset testing, and rely on numpy broadcasting to compare every key against every value.

import numpy as np
import pandas as pd

# Split a string into its set of whitespace-separated words
setify = lambda x: set(x.split())
v = df2['values'].values.astype(str)
k = df1['key'].values.astype(str)
i = df1.index

# These are the sets
a = np.array([setify(x) for x in k.tolist()])
b = np.array([setify(x) for x in v.tolist()])

# This is the broadcasting
matches = (a[:, None] <= b)

# Check that each key has at least one match
any_ = matches.any(1)
# Check that the key wasn't null in the first place
nul_ = df1['key'].notnull().values
mask = any_ & nul_

# Use argmax to find where the first set match is; there
# may be more than one match.  Because I chose to use `assign`,
# I index the series with `mask` so that only the matching
# rows are filled and the rest stay NaN.
df1.assign(key1=pd.Series(v[matches.argmax(1)], i)[mask])

     Name  Stage                                Description      key                    key1
0     Sri      1  Sri is one of the good singer in this two      one    one of three friends
1     NaN      2                         Thanks for reading  two has  Rahul has two brothers
2     Ram      1      Ram is two of the good cricket player    three    one of three friends
3  ganesh      1                                 one driver     four          member of four
4     NaN      2                               good buddies      NaN                     NaN
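
For readers who want to run the idea end to end, here is a minimal self-contained sketch. It is not the answer's broadcasting code: it uses a plain loop with the same <= subset test, and it overwrites key (as the question asked) instead of adding key1. The sample frames are rebuilt from the question, and the helper name first_match is hypothetical, introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df1 = pd.DataFrame({
    "Name": ["Sri", np.nan, "Ram", "ganesh", np.nan],
    "Stage": [1, 2, 1, 1, 2],
    "Description": [
        "Sri is one of the good singer in this two",
        "Thanks for reading",
        "Ram is two of the good cricket player",
        "one driver",
        "good buddies",
    ],
    "key": ["one", "two has", "three", "four", np.nan],
})
df2 = pd.DataFrame({
    "values": [
        "member of four",
        "one of three friends",
        "sri is a cricketer",
        "Rahul has two brothers",
    ]
})


def first_match(key, candidates):
    """Hypothetical helper: return the first candidate string that
    contains every word of `key`, or NaN if the key is null or unmatched."""
    if pd.isna(key):
        return np.nan
    words = set(str(key).split())
    for text in candidates:
        # Same subset test as in the answer: all key words appear in text.
        if words <= set(text.split()):
            return text
    return np.nan


candidates = df2["values"].astype(str).tolist()
# Overwrite `key` on a copy of df1; df1 itself is left unchanged.
result = df1.assign(key=[first_match(k, candidates) for k in df1["key"]])
print(result)
```

The loop does the same number of comparisons as the broadcast version (every key against every value) but stops at the first hit for each key, which may be easier to follow and to adapt, for example to make the word comparison case-insensitive.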


Source: https://stackoverflow.com/questions/46724163/matching-keys-from-two-different-dataframes
