How to select/add a column to a pandas DataFrame based on a non-trivial function of other columns

眉间皱痕 posted on 2019-12-12 02:05:02

Question


This is a follow-up to this question: how to select/add a column to pandas dataframe based on a function of other columns?

I have a data frame and I want to select the rows that match some criterion. The criterion is a function of the values in other columns and some additional values.

Here is a toy example:

>>> import pandas as pd
>>> from random import randint
>>> df = pd.DataFrame({'A': [1,2,3,4,5,6,7,8,9],
...                    'B': [randint(1,9) for x in range(9)],
...                    'C': [4,10,3,5,4,5,3,7,1]})
>>> df
   A  B   C
0  1  6   4
1  2  8  10
2  3  8   3
3  4  4   5
4  5  2   4
5  6  1   5
6  7  1   3
7  8  2   7
8  9  8   1

I want to select all rows for which some non-trivial function returns True, e.g. f(a,c,L), where L is a list of lists and f returns True iff a and c are not part of the same sublist. That is, if L = [[1,2,3],[4,2,10],[8,7,5,6,9]] I want to get:

   A  B   C
0  1  6   4
3  4  4   5
4  5  2   4
6  7  1   3
8  9  8   1
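
To pin the criterion down, here is a minimal sketch of one reading of f in plain Python. The body of f and the name kept are assumptions made for illustration; the question itself only says that f returns True iff a and c do not share a sublist.

```python
def f(a, c, L):
    """Return True iff no sublist of L contains both a and c."""
    return not any(a in sub and c in sub for sub in L)

L = [[1, 2, 3], [4, 2, 10], [8, 7, 5, 6, 9]]
A = [1, 2, 3, 4, 5, 6, 7, 8, 9]
C = [4, 10, 3, 5, 4, 5, 3, 7, 1]

# Row positions whose (A, C) pair passes the test.
kept = [i for i, (a, c) in enumerate(zip(A, C)) if f(a, c, L)]
print(kept)  # [0, 3, 4, 6, 8]
```

With an f like this, a row-wise filter along the lines of df[df.apply(lambda row: f(row['A'], row['C'], L), axis=1)] should pick out the same rows, although it calls Python once per row and so may be slow on large frames.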

Thanks!


Answer 1:


Here is a very hacky, inelegant solution. One more disclaimer: your question doesn't say what should happen when a number in a column appears in none of the sublists, so this code doesn't handle that case beyond the default behavior of isin().

import pandas as pd

df = pd.DataFrame({'A': [1,2,3,4,5,6,7,8,9],
               'B': [6,8,8,4,2,1,1,2,8],
               'C': [4,10,3,5,4,5,3,7,1]})

L = [[1,2,3],[4,2,10],[8,7,5,6,9]]

# For each sublist, flag the rows where A and C both belong to it.
df['both0'] = df['A'].isin(L[0]) & df['C'].isin(L[0])
df['both1'] = df['A'].isin(L[1]) & df['C'].isin(L[1])
df['both2'] = df['A'].isin(L[2]) & df['C'].isin(L[2])

# A row passes only if no single sublist contains both its A and C values.
df['PASSED'] = ~(df['both0'] | df['both1'] | df['both2'])

# Keep the passing rows, then drop the helper columns.
df = df[df['PASSED']]
df = df.drop(columns=['both0', 'both1', 'both2', 'PASSED'])

With an output that looks like:

    A   B   C
0   1   6   4
3   4   4   5
4   5   2   4
6   7   1   3
8   9   8   1

I put this together rather quickly, hence the ugliness of the code, but you can refactor it however you like (e.g. iterate over the original set of lists with for sub_list in L, improve the variable names, come up with a better solution, etc.).
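
The loop-based refactor suggested above could look roughly like this. It is only a sketch: the name same_sublist is made up here, and it marks a row as failing whenever any single sublist contains both its A value and its C value, which is one reading of the question's criterion.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'B': [6, 8, 8, 4, 2, 1, 1, 2, 8],
                   'C': [4, 10, 3, 5, 4, 5, 3, 7, 1]})
L = [[1, 2, 3], [4, 2, 10], [8, 7, 5, 6, 9]]

# same_sublist[i] becomes True once some sublist holds both A[i] and C[i].
same_sublist = pd.Series(False, index=df.index)
for sub_list in L:
    same_sublist |= df['A'].isin(sub_list) & df['C'].isin(sub_list)

# Keep only the rows whose A and C never share a sublist.
result = df[~same_sublist]
print(result)
```

This avoids the temporary columns entirely and works for any number of sublists. A value that appears in no sublist simply never matches, so such rows are kept.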

Hope this helps. Oh, and did I mention this was hacky and not very good code? Because it is.



Source: https://stackoverflow.com/questions/27912389/how-to-select-add-a-column-to-pandas-dataframe-based-on-a-non-trivial-function-o
