Pandas series case-insensitive matching and partial matching between values

Submitted by ◇◆丶佛笑我妖孽 on 2020-01-07 04:55:16

Question


I have the following operation, which adds a status column showing whether each string in a column of one DataFrame is present in a specified column of another DataFrame. It looks like this:

df_one['Status'] = np.where(df_one.A.isin(df_two.A), 'Matched','Unmatched')

This won't match if the string case is different. Is it possible to perform this operation while being case insensitive?

Also, is it possible to return 'Matched' when a value in df_one.A ends with a full string from df_two.A? For example: df_one.A abcdefghijkl -> df_two.A ijkl = 'Matched'


Answer 1:


You can do the first test by converting both columns to lowercase (or uppercase; either works) inside the expression. Because neither column is assigned back to its DataFrame, the case conversion is only temporary:

df_one['Status'] = np.where(df_one.A.str.lower().isin(df_two.A.str.lower()),
                            'Matched', 'Unmatched')
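
As a minimal, self-contained sketch of the case-insensitive check above: the sample values in df_one and df_two are made up for illustration, and the intermediate name mask is not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name A comes from the question.
df_one = pd.DataFrame({"A": ["Apple", "BANANA", "cherry", "abcdefghijkl"]})
df_two = pd.DataFrame({"A": ["apple", "banana", "IJKL"]})

# Lowercase both sides only inside the comparison; the stored values keep their case.
mask = df_one["A"].str.lower().isin(df_two["A"].str.lower())
df_one["Status"] = np.where(mask, "Matched", "Unmatched")

print(df_one)
```

Here "Apple" and "BANANA" are marked 'Matched' despite the case difference, while "abcdefghijkl" stays 'Unmatched' because isin only tests for whole-value equality.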

You can perform your second test by checking whether each string in df_one.A ends with any of the strings in df_two.A, like so (assuming you still want a case-insensitive match):

df_one['Endswith_Status'] = np.where(df_one.A.str.lower().apply(
                                      lambda x: any(x.endswith(i) for i in df_two.A.str.lower())),
                                      'Matched', 'Unmatched')
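
The ends-with check above lowercases df_two.A again for every row of df_one. A possible variation, sketched here under the assumption that neither column contains missing values, is to lowercase the suffixes once and pass them as a tuple to Python's built-in str.endswith, which accepts a tuple of candidates. The sample data and the names suffixes and mask are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name A comes from the question.
df_one = pd.DataFrame({"A": ["abcdefghijkl", "XYZ", "HelloWORLD"]})
df_two = pd.DataFrame({"A": ["IJKL", "world", "nomatch"]})

# Lowercase the candidate suffixes once, outside the per-row function.
suffixes = tuple(df_two["A"].str.lower())

# str.endswith returns True if the string ends with any element of the tuple.
mask = df_one["A"].str.lower().apply(lambda s: s.endswith(suffixes))
df_one["Endswith_Status"] = np.where(mask, "Matched", "Unmatched")

print(df_one)
```

This gives the same kind of result as the any(...) version, with "abcdefghijkl" and "HelloWORLD" marked 'Matched'. If the columns can contain NaN, the values would need to be filled or filtered first, since the lambda expects strings.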


Source: https://stackoverflow.com/questions/44979927/pandas-series-case-insensitive-matching-and-partial-matching-between-values
