Python pandas apply function if a column value is not NULL

Submitted by 早过忘川 on 2019-12-03 06:32:51

Question


I have a dataframe (in Python 2.7, pandas 0.15.0):

df=
       A    B               C
0    NaN   11             NaN
1    two  NaN  ['foo', 'bar']
2  three   33             NaN
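To reproduce this frame in a current pandas version, the sketch below rebuilds it from a dict. It assumes the missing entries are np.nan and that column B ends up as float, because NaN forces that dtype. The question does not show how the original frame was built.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (assumed construction: np.nan for the gaps).
df = pd.DataFrame({
    'A': [np.nan, 'two', 'three'],
    'B': [11, np.nan, 33],            # becomes float64 because of the NaN
    'C': [np.nan, ['foo', 'bar'], np.nan],  # object column holding one list
})
print(df)
```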

I want to apply a simple function to rows that do not contain NULL values in a specific column. My function is as simple as possible:

def my_func(row):
    print(row)

And my apply code is the following:

df[['A', 'B']].apply(lambda x: my_func(x) if pd.notnull(x.iloc[0]) else x, axis=1)

It works perfectly. If I want to check column 'B' for NULL values, pd.notnull() works perfectly as well. But if I select column 'C', which contains list objects:

df[['A', 'C']].apply(lambda x: my_func(x) if pd.notnull(x.iloc[1]) else x, axis=1)

then I get the following error message: ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', u'occurred at index 1')

Does anybody know why pd.notnull() works only for integer and string columns but not for 'list columns'?

And is there a nicer way to check for NULL values in column 'C' instead of this:

df[['A', 'C']].apply(lambda x: my_func(x) if str(x.iloc[1]) != 'nan' else x, axis=1)
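The string comparison against 'nan' has a blind spot: str(None) is 'None', so a None cell would be treated as present. Below is a sketch of a scalar-aware alternative. The helper name is_missing_scalar is hypothetical, and the sketch assumes a recent pandas, since pd.isna and pandas.api.types.is_scalar are not available in 0.15.0.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_scalar

def is_missing_scalar(value):
    # Hypothetical helper: True only for scalar missing markers such as
    # None, NaN or NaT. Lists are never scalars, so pd.isna is not called
    # on them and no ambiguous truth-value error can occur.
    return is_scalar(value) and bool(pd.isna(value))

# The question's str-based test treats None as present.
print(str(None) != 'nan')

print(is_missing_scalar(np.nan), is_missing_scalar(None),
      is_missing_scalar(['foo', 'bar']))
```

Inside the question's call it would replace the string test, e.g. lambda x: my_func(x) if not is_missing_scalar(x.iloc[1]) else x.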

Thank you!


Answer 1:


The problem is that pd.notnull(['foo', 'bar']) operates elementwise and returns array([ True, True], dtype=bool). Your if condition tries to convert that array to a single boolean, and that is when you get the exception.
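The elementwise behaviour and the resulting error can be reproduced outside apply. Here is a minimal sketch using the list from row 1 of column C:

```python
import pandas as pd

cell = ['foo', 'bar']  # the list stored in column C, row 1

# On a list-like, pd.notnull works elementwise: one boolean per list item.
mask = pd.notnull(cell)
print(mask)

# An if statement needs one truth value; NumPy refuses to choose one for a
# multi-element array and raises ValueError instead.
message = None
try:
    bool(mask)
except ValueError as exc:
    message = str(exc)
print(message)
```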

To fix it, you could simply wrap the notnull call with np.all:

import numpy as np
df[['A', 'C']].apply(lambda x: my_func(x) if np.all(pd.notnull(x.iloc[1])) else x, axis=1)

Now you'll see that np.all(pd.notnull(['foo', 'bar'])) is indeed True.
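Putting the np.all fix together on the question's frame gives the sketch below. Here my_func is a hypothetical stand-in that records the row labels it receives instead of printing them. The sketch also shows one caveat: a list that itself contains a missing value fails the check.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [np.nan, 'two', 'three'],
    'B': [11, np.nan, 33],
    'C': [np.nan, ['foo', 'bar'], np.nan],
})

processed = []  # row labels that passed the null check

def my_func(row):
    # Hypothetical stand-in for the question's my_func.
    processed.append(row.name)
    return row

# np.all reduces the elementwise result to one bool:
# NaN -> np.all(False) -> False, ['foo', 'bar'] -> np.all([True, True]) -> True
df[['A', 'C']].apply(
    lambda x: my_func(x) if np.all(pd.notnull(x.iloc[1])) else x, axis=1
)
print(processed)

# Caveat: a list containing None is treated as null by this check.
print(np.all(pd.notnull(['foo', None])))
```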




Answer 2:


Another way is to use row.notnull().all(), which does not need NumPy. Note that it checks every column of the row, not a single column. For example:

df.apply(lambda row: func1(row) if row.notnull().all() else func2(row), axis=1)

Here is a complete example on a frame similar to yours:

>>> d = {'A': [None, 2, 3, 4], 'B': [11, None, 33, 4], 'C': [None, ['a','b'], None, 4]}
>>> df = pd.DataFrame(d)
>>> df
     A     B       C
0  NaN  11.0    None
1  2.0   NaN  [a, b]
2  3.0  33.0    None
3  4.0   4.0       4
>>> def func1(r):
...     return 'No'
...
>>> def func2(r):
...     return 'Yes'
...
>>> df.apply(lambda row: func1(row) if row.notnull().all() else func2(row), axis=1)
0    Yes
1    Yes
2    Yes
3     No
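Because row.notnull().all() looks at every column of the row, row 3 is the only one that passes above. If only column C matters, a common alternative is to build a boolean mask with Series.notnull and apply the function to the matching rows only. This is a different technique from the answer's. In the sketch, describe is a hypothetical row function:

```python
import pandas as pd

d = {'A': [None, 2, 3, 4], 'B': [11, None, 33, 4], 'C': [None, ['a', 'b'], None, 4]}
df = pd.DataFrame(d)

# Series.notnull tests each cell of column C; a list cell counts as present.
mask = df['C'].notnull()
print(mask.tolist())  # [False, True, False, True]

def describe(row):
    # Hypothetical row function.
    return 'C is set'

# Apply only to the rows whose C value is not null.
result = df.loc[mask].apply(describe, axis=1)
print(result)
```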





Answer 3:


I had a column that contained lists and NaNs, so the following worked for me:

df.C.map(lambda x: my_func(x) if isinstance(x, list) else x)
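Here is a self-contained version of this map approach. It assumes a hypothetical my_func that joins the list items. isinstance(x, list) is used in place of a type(x) == list comparison; it also accepts list subclasses.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C': [np.nan, ['foo', 'bar'], np.nan]})

def my_func(x):
    # Hypothetical stand-in: join the list items into one string.
    return ', '.join(x)

# Series.map calls the lambda once per cell; only list cells reach my_func,
# and NaN cells are returned unchanged.
out = df['C'].map(lambda x: my_func(x) if isinstance(x, list) else x)
print(out.tolist())
```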



Answer 4:


Try...

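df['a'] = df['a'].apply(lambda x: x.replace(',', r'\,') if pd.notnull(x) else x)

This example adds a backslash escape before each comma when the value is not null. pd.notnull skips both None and NaN; a plain x != None check would let NaN through and then fail, since a float has no .replace method.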
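Here is a runnable sketch of this escaping step on a small made-up column that contains None and NaN. It uses pd.notnull(x) rather than x != None, because NaN compares unequal to None and a float has no .replace method. It writes the replacement as the raw string r'\,' to avoid Python's invalid-escape warning for '\,'.

```python
import numpy as np
import pandas as pd

# Made-up sample column with a comma, None, NaN and a plain string.
df = pd.DataFrame({'a': ['x,y', None, np.nan, 'plain']})

# Escape commas only in non-null cells; None and NaN pass through unchanged.
df['a'] = df['a'].apply(lambda x: x.replace(',', r'\,') if pd.notnull(x) else x)
print(df['a'].tolist())
```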



Source: https://stackoverflow.com/questions/26614465/python-pandas-apply-function-if-a-column-value-is-not-null
