Question
Hi, I have two DataFrames like the ones below:
DF1
Alpha | Numeric | Special
and   | 1       | @
or    | 2       | $
      | 3       | &
      | 4       |
      | 5       |
and
DF2, with a single column:
Content
boy or girl
school @ morn
I want to check whether any column of DF1 contains any of the keywords that appear in the Content column of DF2. The names of the matching columns should go into a new DataFrame:
output_DF
output_column
Alpha
Special
Can someone help me with this?
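The core check can be sketched in plain Python before pandas is involved. This is a minimal sketch, assuming the DF1 keywords can be held as sets of strings (the dictionary `keywords` below is a hypothetical stand-in for DF1, with the numbers written as strings) and that Content is split on whitespace:

```python
# Hypothetical stand-in for DF1: one set of keywords per column.
keywords = {
    "Alpha": {"and", "or"},
    "Numeric": {"1", "2", "3", "4", "5"},
    "Special": {"@", "$", "&"},
}
# Stand-in for the Content column of DF2.
contents = ["boy or girl", "school @ morn"]

# Collect every whitespace-separated token that appears in Content.
words = {token for text in contents for token in text.split()}

# A column matches if it shares at least one keyword with those tokens.
matching = [col for col, kws in keywords.items() if not kws.isdisjoint(words)]
print(matching)  # ['Alpha', 'Special']
```

Both answers implement this same idea on top of DataFrames.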
Answer 1:
I have an approach, although it is not very elegant.
import pandas as pd
df1 = pd.DataFrame([[['and', 'or'], ['1', '2', '3', '4', '5'], ['@', '$', '&']]], columns=['Alpha', 'Numeric', 'Special'])
print(df1)
Alpha Numeric Special
0 [and, or] [1, 2, 3, 4, 5] [@, $, &]
df2 = pd.DataFrame([[['boy', 'or','girl']],[['school', '@','morn']]],columns=['Content'])
print(df2)
Content
0 [boy, or, girl]
1 [school, @, morn]
First, flatten the lists in df2 into a single list:
df2list=[x for row in df2['Content'].tolist() for x in row]
print(df2list)
['boy', 'or', 'girl', 'school', '@', 'morn']
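On pandas 0.25 or later, the same flattening can likely be done with Series.explode instead of a nested comprehension. A minimal sketch, rebuilding the list-valued df2 from this answer so the snippet runs on its own:

```python
import pandas as pd

# Same list-valued df2 as in the answer, built from a dict of columns.
df2 = pd.DataFrame({'Content': [['boy', 'or', 'girl'], ['school', '@', 'morn']]})

# Series.explode puts each list element on its own row,
# so flattening becomes a single method call.
df2list = df2['Content'].explode().tolist()
print(df2list)  # ['boy', 'or', 'girl', 'school', '@', 'morn']
```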
Then, for each column of df1, flatten its values, intersect them with df2list, and keep the column name if the intersection is non-empty:
containlistname = []
for i in range(0, df1.shape[1]):
    columnsname = df1.columns[i]
    # flatten the list-valued cells of this column
    df1list = [x for row in df1[columnsname].tolist() for x in row]
    # keep the column if it shares at least one value with df2list
    intersection = list(set(df1list).intersection(set(df2list)))
    if len(intersection) > 0:
        containlistname.append(columnsname)
output_DF = pd.DataFrame(containlistname, columns=['output_column'])
Finally, print the result:
print(output_DF)
output_column
0 Alpha
1 Special
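A more compact variant of the loop above, sketched under the same assumption that every cell holds a list of strings. It swaps the list intersection for set.isdisjoint, which stops as soon as one shared value is found; the frames are rebuilt here only to make the snippet self-contained:

```python
import pandas as pd

# List-valued frames shaped like the ones in this answer.
df1 = pd.DataFrame({
    'Alpha': [['and', 'or']],
    'Numeric': [['1', '2', '3', '4', '5']],
    'Special': [['@', '$', '&']],
})
df2 = pd.DataFrame({'Content': [['boy', 'or', 'girl'], ['school', '@', 'morn']]})

# All words from df2, as a set for fast membership checks.
words = set(df2['Content'].explode())

# Keep a column if any of its (exploded) values also occurs in words.
containlistname = [c for c in df1.columns if not words.isdisjoint(df1[c].explode())]
output_DF = pd.DataFrame(containlistname, columns=['output_column'])
print(output_DF)
```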
Answer 2:
For each row of df2, you could split the Content text into words, apply Series.isin() to each column of df1, and return the names of the columns that have any matches:
import pandas as pd
d = {'Alpha' :['and', 'or'],'Numeric':[1, 2,3,4,5],'Special':['@', '$','&']}
df1 = pd.DataFrame({k: pd.Series(v) for k, v in d.items()})  # shorter columns are padded with NaN
df2 = pd.DataFrame({'Content' :['boy or girl','school @ morn']})
check = lambda r:[c for c in df1.columns if df1[c].dropna().isin(r).any()]
df3 = pd.DataFrame({'output_column' : df2["Content"].str.split(' ').apply(check)})
This results in:
output_column
0 [Alpha]
1 [Special]
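This returns one list per row of df2, whereas the question asks for a single output_column. A sketch that collapses the result into that shape, assuming a match in any row should count and that str.split() with no argument (which splits on any run of whitespace, so repeated spaces do not yield empty tokens) is an acceptable tokenizer:

```python
import pandas as pd

d = {'Alpha': ['and', 'or'], 'Numeric': [1, 2, 3, 4, 5], 'Special': ['@', '$', '&']}
df1 = pd.DataFrame({k: pd.Series(v) for k, v in d.items()})
df2 = pd.DataFrame({'Content': ['boy or girl', 'school @ morn']})

# Every word from every Content row, pooled into one set.
tokens = set(df2['Content'].str.split().explode())

# One row per matching df1 column, in df1's column order.
output_DF = pd.DataFrame(
    {'output_column': [c for c in df1.columns if df1[c].dropna().isin(tokens).any()]}
)
print(output_DF)
```

The Numeric column holds integers, so it never matches the string tokens; convert it with astype(str) first if digits in Content should count.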
Source: https://stackoverflow.com/questions/45055007/searching-if-anyone-of-word-is-present-in-the-another-column-of-a-dataframe-or-i