问题
This is a followup question for this one: how to select/add a column to pandas dataframe based on a function of other columns?
have a data frame and I want to select the rows that match some criteria. The criteria is a function of values of other columns and some additional values.
Here is a toy example:
>> df = pd.DataFrame({'A': [1,2,3,4,5,6,7,8,9],
'B': [randint(1,9) for x in xrange(9)],
'C': [4,10,3,5,4,5,3,7,1]})
>>
A B C
0 1 6 4
1 2 8 10
2 3 8 3
3 4 4 5
4 5 2 4
5 6 1 5
6 7 1 3
7 8 2 7
8 9 8 1
I want select all rows for which some non trivial function returns true, e.g. f(a,c,L), where L is a list of lists and f returns True iff a and c are not part of the same sublist. That is, if L = [[1,2,3],[4,2,10],[8,7,5,6,9]] I want to get:
A B C
0 1 6 4
3 4 4 5
4 5 2 4
6 7 1 3
8 9 8 1
Thanks!
回答1:
Here is a VERY VERY hacky and non-elegant solution. As another disclaimer, since your question doesn't state what you want to do if a number in the column is in none of the sub lists this code doesn't handle that in any real way besides any default functionality within isin().
import pandas as pd
df = pd.DataFrame({'A': [1,2,3,4,5,6,7,8,9],
'B': [6,8,8,4,2,1,1,2,8],
'C': [4,10,3,5,4,5,3,7,1]})
L = [[1,2,3],[4,2,10],[8,7,5,6,9]]
df['passed1'] = df['A'].isin(L[0])
df['passed2'] = df['C'].isin(L[0])
df['1&2'] = (df['passed1'] ^ df['passed2'])
df['passed4'] = df['A'].isin(L[1])
df['passed5'] = df['C'].isin(L[1])
df['4&5'] = (df['passed4'] ^ df['passed5'])
df['passed7'] = df['A'].isin(L[2])
df['passed8'] = df['C'].isin(L[2])
df['7&8'] = (df['passed7'] ^ df['passed8'])
df['PASSED'] = df['1&2'] & df['4&5'] ^ df['7&8']
del df['passed1'], df['passed2'], df['1&2'], df['passed4'], df['passed5'], df['4&5'], df['passed7'], df['passed8'], df['7&8']
df = df[df['PASSED'] == True]
del df['PASSED']
With an output that looks like:
A B C
0 1 6 4
3 4 4 5
4 5 2 4
6 7 1 3
8 9 8 1
I implemented this rather quickly hence the utter and complete ugliness of this code, but I believe you can refactor it any way you would like (e.g. iterate over the original set of lists with for sub_list in L, improve variable names, come up with a better solution, etc).
Hope this helps. Oh, and did I mention this was hacky and not very good code? Because it is.
来源:https://stackoverflow.com/questions/27912389/how-to-select-add-a-column-to-pandas-dataframe-based-on-a-non-trivial-function-o