Question
I have a pandas DataFrame in which one of the columns contains user information. Each record in this column is a list of dictionaries holding user information, like the following:
USER id
1 [{u'STATUS': u'INACTV', u'NAME': 'abc'},{u'STATUS': u'ACTV', u'NAME': 'xyz'}] 634618
2 [{u'STATUS': u'INACTV', u'NAME': 'abc'},{u'STATUS': u'ACTV', u'NAME': 'xyz'}] 642054
3 [{u'STATUS': u'ACTV', u'NAME': 'abc'},{u'STATUS': u'ACTV', u'NAME': 'xyz'}] 631426
I want to select only the rows where the STATUS is ACTV and the NAME is abc. How do I select rows when the data is nested like this? In the above df, only row 3 should be selected.
Answer 1:
You could loop through the USER column with apply and check whether any of the dictionaries satisfies the condition. This produces a boolean Series that can be used for subsetting:
df[df.USER.apply(lambda lst: any(d['NAME']=='abc' and d['STATUS']=='ACTV' for d in lst))]
# USER id
#3 [{'STATUS': 'ACTV', 'NAME': 'abc'}, {'STATUS':... 631426
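For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. The frame is rebuilt by hand from the table in the question (the 1..3 index is an assumption taken from the printed output), and has_user is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

# Rebuild the example frame from the question (index 1..3 assumed from the printout).
df = pd.DataFrame(
    {
        "USER": [
            [{"STATUS": "INACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
            [{"STATUS": "INACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
            [{"STATUS": "ACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
        ],
        "id": [634618, 642054, 631426],
    },
    index=[1, 2, 3],
)


def has_user(users, name, status):
    """Return True if a single dict in `users` matches both name and status."""
    # dict.get avoids a KeyError when a dictionary lacks one of the keys.
    return any(d.get("NAME") == name and d.get("STATUS") == status for d in users)


# One boolean per row: True when any dictionary in that row's list matches.
mask = df["USER"].apply(lambda users: has_user(users, "abc", "ACTV"))
selected = df[mask]
print(selected)
```

Because both comparisons are made on the same dictionary d, a row only matches when one user is both ACTV and named abc, which is what the question asks for.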
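Answer 2:
We can unpack your df.USER column into a pd.Panel and find the rows that way. Lots of overhead. Not worth it! But cool... maybe. I'll try again later. Note that pd.Panel was deprecated in pandas 0.20 and removed in 0.25, and Series.iteritems was removed in 2.0, so the code below only runs on older versions.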
pn = pd.Panel({k: pd.DataFrame(v) for k, v in df.USER.iteritems()})
cond1 = pn.loc[:, :, 'STATUS'] == 'ACTV'
cond2 = pn.loc[:, :, 'NAME'] == 'abc'
df.loc[pn.loc[(cond1 & cond2).any(), :, :].items]
USER id
2 [{'STATUS': 'ACTV', 'NAME': 'abc'}, {'STATUS':... 631426
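On pandas versions that no longer ship pd.Panel, a similar "unpack, then compare" idea can be sketched with Series.explode (available since pandas 0.25). This is a swapped-in technique, not the answer's original method, and it assumes every list is non-empty and the frame's index is unique. The frame is rebuilt by hand from the question:

```python
import pandas as pd

# Rebuild the example frame from the question (index 1..3 assumed from the printout).
df = pd.DataFrame(
    {
        "USER": [
            [{"STATUS": "INACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
            [{"STATUS": "INACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
            [{"STATUS": "ACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
        ],
        "id": [634618, 642054, 631426],
    },
    index=[1, 2, 3],
)

# One row per dictionary; the original row label is repeated in the index.
exploded = df["USER"].explode()

# Expand the dictionaries into STATUS / NAME columns, keeping the repeated index.
flat = pd.DataFrame(exploded.tolist(), index=exploded.index)

# Both conditions must hold for the same dictionary (the same row of `flat`).
hit = (flat["STATUS"] == "ACTV") & (flat["NAME"] == "abc")

# Collapse back to one boolean per original row, aligned to df's index order.
mask = hit.groupby(level=0).any().reindex(df.index)
selected = df[mask]
print(selected)
```

This still carries more overhead than the apply approach, but the intermediate flat frame can be handy if several different conditions need to be tested against the same nested data.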
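Answer 3:
You can filter your dataframe via string comparisons, provided the USER column holds the string representation of each list rather than actual list objects. Be aware that the two conditions are checked independently, so a row also matches when ACTV and abc appear in different dictionaries. The patterns must also match the stored text exactly; in the data shown, the NAME values carry no u prefix:
df[df['USER'].str.contains("'STATUS': u'ACTV'") & df['USER'].str.contains("'NAME': 'abc'")]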
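To see how substring matching behaves on the example data, here is a small sketch under a few assumptions: Python 3 (so no u prefixes in the text), the lists are converted to strings with map(str), and each dictionary was built with STATUS before NAME so its printed form is predictable. The names loose and strict are illustrative only:

```python
import pandas as pd

# Rebuild the example frame from the question (index 1..3 assumed from the printout).
df = pd.DataFrame(
    {
        "USER": [
            [{"STATUS": "INACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
            [{"STATUS": "INACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
            [{"STATUS": "ACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
        ],
        "id": [634618, 642054, 631426],
    },
    index=[1, 2, 3],
)

# Text form of each list, e.g. "[{'STATUS': 'INACTV', 'NAME': 'abc'}, ...]".
text = df["USER"].map(str)

# Two independent substring checks: each can be satisfied by a different dictionary.
loose = text.str.contains("'STATUS': 'ACTV'", regex=False) & text.str.contains(
    "'NAME': 'abc'", regex=False
)

# A single substring covering one whole dictionary ties both values together,
# but it depends on the exact key order and spacing of the printed dict.
strict = text.str.contains("{'STATUS': 'ACTV', 'NAME': 'abc'}", regex=False)

print(df[loose]["id"].tolist())   # rows matched by the independent checks
print(df[strict]["id"].tolist())  # rows matched by the whole-dictionary check
```

The independent checks select every row here, because each row contains some ACTV user and some user named abc. The whole-dictionary check narrows this to the intended row, yet it is brittle enough that comparing the actual dictionaries is usually the safer route.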
Source: https://stackoverflow.com/questions/43504945/pandas-select-rows-based-on-multiple-object-values-inside-a-column