Exception Handling in Pandas .apply() function

Question

If I have a DataFrame:

from pandas import DataFrame

myDF = DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

This gives the following DataFrame (I'm new to Stack Overflow and don't have enough reputation to post an image of it):

   | A  | B  |
0  | 11 | 11 |
1  | 22 | 2A |
2  | 33 | 33 |

If I want to convert column B to int values and drop the values that can't be converted, I have to do:

def convertToInt(cell):
    try:
        return int(cell)
    except (ValueError, TypeError):
        # the value cannot be parsed as an integer
        return None

myDF['B'] = myDF['B'].apply(convertToInt)

If I only do:

myDF['B'].apply(int)

the error obviously is:

C:\WinPython-32bit-2.7.5.3\python-2.7.5\lib\site-packages\pandas\lib.pyd in pandas.lib.map_infer (pandas\lib.c:42840)()

ValueError: invalid literal for int() with base 10: '2A'

Is there a way to add exception handling to myDF['B'].apply()?

Thank you in advance!

Answer 1:

It is much better/faster to do:

In [1]: myDF = DataFrame(data=[[11,11],[22,'2A'],[33,33]], columns = ['A','B'])

In [2]: myDF.convert_objects(convert_numeric=True)
Out[2]: 
    A   B
0  11  11
1  22 NaN
2  33  33

[3 rows x 2 columns]

In [3]: myDF.convert_objects(convert_numeric=True).dtypes
Out[3]: 
A      int64
B    float64
dtype: object

This is a vectorized way of doing exactly this. The convert_numeric=True flag says to mark as NaN anything that cannot be converted to a number, which is why column B ends up as float64.

You can of course do this to a single column if you'd like.
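
Note that convert_objects was later deprecated and is, as far as I know, no longer available from pandas 1.0 onward. A minimal sketch of the same coercion on a single column, assuming a pandas version that provides pd.to_numeric with errors='coerce' (the names converted and cleaned are only illustrative):

```python
import pandas as pd

# Same data as in the question
myDF = pd.DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

# errors='coerce' turns anything that cannot be parsed as a number into NaN
converted = pd.to_numeric(myDF['B'], errors='coerce')

# Optionally drop the rows that failed to convert, then cast B back to int
cleaned = myDF.assign(B=converted).dropna(subset=['B']).astype({'B': int})
```

The dropna step is only needed if the unconvertible rows should really be removed rather than kept as NaN.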

Answer 2:

I had the same question, but for a more general case where it was hard to tell if the function would generate an exception (i.e. you couldn't explicitly check this condition with something as straightforward as isdigit).

After thinking about it for a while, I came up with the solution of embedding the try/except syntax in a separate function. I'm posting a toy example in case it helps anyone.

import pandas as pd
import numpy as np

# np.array coerces every entry to a string, so column 0 holds 'a' and '1'
df = pd.DataFrame(np.array([['a', 'a'], [1, 2]]))

def augment(value):
    try:
        return int(value) + 1
    except Exception:
        return 'error:' + str(value)

df[0].apply(augment)
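
If the same pattern is needed for several functions, the try/except can be factored into a small wrapper. This is only a sketch of that idea: catch is a hypothetical helper name, not part of pandas, and the choice of which exceptions to swallow is an assumption that should be adapted to the function being wrapped.

```python
import pandas as pd

def catch(func, default=None, exceptions=(Exception,)):
    """Return a version of func that yields `default` when one of `exceptions` is raised."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except exceptions:
            return default
    return wrapper

s = pd.Series(['a', '1', '2'])

# Wrap the built-in int so that unparsable values become None instead of raising
safe_int = catch(int, default=None, exceptions=(ValueError, TypeError))
result = s.apply(safe_int)
```

Listing the expected exception types explicitly keeps unrelated bugs from being silently hidden.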

Answer 3:

A way to achieve that with lambda:

myDF['B'].apply(lambda x: int(x) if str(x).isdigit() else None)

For your input:

>>> myDF
    A   B
0  11  11
1  22  2A
2  33  33

[3 rows x 2 columns]

>>> myDF['B'].apply(lambda x: int(x) if str(x).isdigit() else None)
0    11
1   NaN
2    33
Name: B, dtype: float64
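
One caveat, sketched under the assumption that negative or whitespace-padded values may appear in the column: str.isdigit() returns False for strings such as '-5' or ' 7 ', so the lambda above turns them into None even though int() can parse them. A try/except helper (to_int_or_none is a hypothetical name) does not have that limitation:

```python
import pandas as pd

s = pd.Series([11, '2A', '-5', ' 7 '], dtype=object)

# isdigit-based check: rejects '-5' and ' 7 ' although int() accepts both
by_isdigit = s.apply(lambda x: int(x) if str(x).isdigit() else None)

def to_int_or_none(value):
    # Let int() decide what is convertible instead of pre-checking the string
    try:
        return int(value)
    except (ValueError, TypeError):
        return None

by_try = s.apply(to_int_or_none)
```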

Source: https://stackoverflow.com/questions/22847304/exception-handling-in-pandas-apply-function

Tags

python

exception-handling

pandas