Question
I am trying to write a pandas DataFrame (or I can use a NumPy array) to a MySQL database using MySQLdb. MySQLdb doesn't seem to understand 'nan', and my database throws an error saying nan is not in the field list. I need to find a way to convert the 'nan' into a NoneType.
Any ideas?
Answer 1:
@bogatron has it right: you can use where. It's worth noting that you can do this natively in pandas:
df1 = df.where(pd.notnull(df), None)
Note: this changes the dtype of all columns to object.
Example:
In [1]: df = pd.DataFrame([1, np.nan])
In [2]: df
Out[2]:
0
0 1
1 NaN
In [3]: df1 = df.where((pd.notnull(df)), None)
In [4]: df1
Out[4]:
0
0 1
1 None
Note: what you cannot do is recast the DataFrame's dtype to allow all data types, using astype, and then use the DataFrame fillna method with None. Replacing with the string 'None' does run, but it stores a string rather than None:
df1 = df.astype(object).replace(np.nan, 'None')
Unfortunately, at the time of this answer, neither fillna nor replace worked with None; see this (closed) issue.
As an aside, it's worth noting that for most use cases you don't need to replace NaN with None, see this question about the difference between NaN and None in pandas.
However, in this specific case it seems you do (at least at the time of this answer).
Answer 2:
df = df.replace({np.nan: None})
Credit goes to the author of a comment on a GitHub issue.
Answer 3:
You can replace nan with None in your numpy array:
>>> x = np.array([1, np.nan, 3])
>>> y = np.where(np.isnan(x), None, x)
>>> print(y)
[1.0 None 3.0]
>>> print(type(y[1]))
<class 'NoneType'>
Answer 4:
After stumbling around, this worked for me:
df = df.astype(object).where(pd.notnull(df),None)
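As a minimal sketch of how the result might then be handed to a database driver: the sample data, the table and column names, and the commented-out cursor below are assumptions for illustration, not part of the question. Casting to object first matters because, in more recent pandas releases, a float column may otherwise coerce None back to NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are made up for illustration.
df = pd.DataFrame({"id": [1, 2, 3], "price": [9.5, np.nan, 3.0]})

# Cast to object first so the columns can hold None, then keep non-null values.
clean = df.astype(object).where(pd.notnull(df), None)

# Plain Python tuples, one per row, usable as query parameters.
rows = list(clean.itertuples(index=False, name=None))

# With an open DB-API cursor (assumed, not created here) this could be used as:
# cursor.executemany("INSERT INTO prices (id, price) VALUES (%s, %s)", rows)
```

Checking `rows[1][1] is None` rather than reading the printed frame is a more reliable way to confirm the replacement actually happened.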
Answer 5:
Quite old, yet I stumbled upon the very same issue. Try doing this:
df['col_replaced'] = df['col_with_npnans'].apply(lambda x: None if np.isnan(x) else x)
Answer 6:
Just an addition to @Andy Hayden's answer:
Since DataFrame.mask is the opposite twin of DataFrame.where, they have exactly the same signature but opposite meanings:
DataFrame.where is useful for replacing values where the condition is False.
DataFrame.mask is used for replacing values where the condition is True.
So in this question, using df.mask(df.isna(), other=None, inplace=True) might be more intuitive.
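The symmetry between the two methods can be sketched as follows. This is an illustration under assumptions: the sample frame is made up, the frame is cast to object first so that None is not coerced back to NaN, and a new frame is returned instead of using inplace=True.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

# mask replaces values where the condition is True ...
via_mask = df.astype(object).mask(df.isna(), None)

# ... while where replaces values where the condition is False.
via_where = df.astype(object).where(df.notna(), None)

# Compare the underlying Python objects of both results.
same = via_mask["a"].tolist() == via_where["a"].tolist()
```

Here `same` should be True, since the two calls differ only in the polarity of the condition.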
Answer 7:
Another addition: be careful when replacing multiple values, because the column type can be converted back from object to float. If you want to be certain that your None values won't flip back to NaN, apply @andy-hayden's suggestion of using DataFrame.where.
Illustration of how replace can still go 'wrong':
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame({"a": [1, np.nan, np.inf]})
In [4]: df
Out[4]:
a
0 1.0
1 NaN
2 inf
In [5]: df.replace({np.nan: None})
Out[5]:
a
0 1
1 None
2 inf
In [6]: df.replace({np.nan: None, np.inf: None})
Out[6]:
a
0 1.0
1 NaN
2 NaN
In [7]: df.where((pd.notnull(df)), None).replace({np.inf: None})
Out[7]:
a
0 1.0
1 NaN
2 NaN
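One way to sidestep the flip back to float is to build a single boolean condition covering both NaN and infinity and apply it once to an object-dtype copy. The helper name nan_inf_to_none below is hypothetical, and this is a sketch under those assumptions rather than the approach used in the answer.

```python
import numpy as np
import pandas as pd

def nan_inf_to_none(frame):
    """Return an object-dtype copy with NaN and +/-inf replaced by None."""
    # True wherever the value is missing or infinite.
    bad = frame.isna() | frame.isin([np.inf, -np.inf])
    # Cast to object first so None is stored as-is, then replace in one pass.
    return frame.astype(object).mask(bad, None)

df = pd.DataFrame({"a": [1, np.nan, np.inf]})
result = nan_inf_to_none(df)
values = result["a"].tolist()
```

The column stays object dtype as long as it is not cast back to float afterwards; such a cast would turn None into NaN again.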
Source: https://stackoverflow.com/questions/14162723/replacing-pandas-or-numpy-nan-with-a-none-to-use-with-mysqldb