Question
I am working with a DataFrame and using NumPy to transform data, including setting blanks (or '') to NaN. But when I write the DataFrame to CSV, the output contains the string 'nan' as opposed to NULL.
I have looked around but can't find a workable solution. Here's the basic issue:
df
index x y z
0 1 NaN 2
1 NaN 3 4
CSV output:
index x y z
0 1 nan 2
1 nan 3 4
I have tried a few things to set 'nan' to NULL but the csv output results in a 'blank' rather than NULL:
dfDemographics = dfDemographics.replace('nan', np.nan)
dfDemographics.replace(r'\s+( +\.)|#', np.nan, regex=True).replace('', np.nan)
dfDemographics = dfDemographics.replace('nan', '') # of course, this wouldn't work, but tried it anyway.
Any help would be appreciated.
Answer 1:
Pandas to the rescue: use na_rep to set your own representation for NaNs.
df.to_csv('file.csv', na_rep='NULL')
file.csv
,index,x,y,z
0,0,1.0,NULL,2
1,1,NULL,3.0,4
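As a minimal sketch of the same idea, the snippet below rebuilds a small frame like the one in the question (the column values are assumed from the question's example) and returns the CSV as a string instead of writing a file, so the effect of na_rep is easy to inspect. Passing index=False is an extra choice here to avoid the unnamed leading index column.

```python
import numpy as np
import pandas as pd

# Sample frame assumed from the question's example.
df = pd.DataFrame({"x": [1, np.nan], "y": [np.nan, 3], "z": [2, 4]})

# With no path given, to_csv returns the CSV text as a string.
# na_rep sets the text written for every missing value.
null_csv = df.to_csv(index=False, na_rep="NULL")

print(null_csv)
```

Note that na_rep only applies to real missing values. A cell holding the literal text 'nan' is an ordinary string and is written unchanged.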
Answer 2:
Using df.replace may help -
df = df.replace(np.nan, '', regex=True)
df.to_csv("df.csv", index=False)
(This sets all the null values to '', i.e. an empty string.)
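For comparison, here is a rough sketch of what ends up in the CSV in two situations. It assumes a small frame modelled on the question's example, plus a second, hypothetical frame (text_df) whose cells hold the literal text 'nan', which is one possible explanation for the output in the question. Real NaN values are already written as empty fields by default, and fillna('') is a more direct way to spell the replacement above.

```python
import numpy as np
import pandas as pd

# Frame with real NaN values, assumed from the question's example.
df = pd.DataFrame({"x": [1, np.nan], "y": [np.nan, 3], "z": [2, 4]})

# Real NaN values are written as empty fields by default.
default_csv = df.to_csv(index=False)

# fillna("") replaces missing values with an empty string before writing.
filled_csv = df.fillna("").to_csv(index=False)

# Hypothetical frame whose cells contain the literal text "nan".
text_df = pd.DataFrame({"x": ["1", "nan"], "y": ["nan", "3"]})
text_csv = text_df.to_csv(index=False)

# Replacing the literal text "nan" blanks those fields out.
fixed_csv = text_df.replace("nan", "").to_csv(index=False)

print(default_csv)
print(text_csv)
print(fixed_csv)
```

If the output still shows 'nan' after this kind of replacement, it is worth checking whether the column was converted to strings earlier in the pipeline.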
Answer 3:
User @coldspeed illustrates how to replace NaN values with NULL when saving a pd.DataFrame. If, for data analysis, you are instead interested in replacing the null values in a pd.DataFrame with np.nan, the following code will do:
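import numpy as np, pandas as pd
# mydf is assumed to be an existing DataFrame
# replace null values (as detected by isnull) with np.nan
colNames = mydf.columns.tolist()
dfVals = mydf.values               # underlying NumPy array
matSyb = mydf.isnull().values      # boolean mask of missing cells
dfVals[matSyb] = np.nan            # np.NAN was removed in NumPy 2.0
mydf = pd.DataFrame(dfVals, columns=colNames)
#np.nansum(mydf.values, axis=0 )
#np.nansum(dfVals, axis=0 )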
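If the "NULL" markers are literal text, either in a CSV written with na_rep='NULL' or in cells already loaded as strings, a shorter route may be enough. The sketch below uses made-up sample data (csv_text and in_memory are illustrative names) and relies on the fact that pd.read_csv treats the text NULL as a missing value by default.

```python
import io

import numpy as np
import pandas as pd

# Illustrative CSV text, as produced with na_rep="NULL".
csv_text = "x,y,z\n1.0,NULL,2\nNULL,3.0,4\n"

# "NULL" is one of read_csv's default missing-value markers.
loaded = pd.read_csv(io.StringIO(csv_text))

# With the defaults switched off, the marker has to be listed explicitly.
explicit = pd.read_csv(
    io.StringIO(csv_text), keep_default_na=False, na_values=["NULL"]
)

# For a frame already in memory, replace the literal string directly.
in_memory = pd.DataFrame({"x": ["1", "NULL"], "y": ["NULL", "3"]})
cleaned = in_memory.replace("NULL", np.nan)

print(loaded.isna())
print(cleaned.isna())
```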
Source: https://stackoverflow.com/questions/50890989/pandas-changing-the-format-of-nan-values-when-saving-to-csv