Pandas Changing the format of NaN values when saving to CSV

 ̄綄美尐妖づ submitted on 2020-01-03 08:41:18

Question


I am working with a DataFrame and using numpy to transform the data, including setting blanks (or '') to NaN. But when I write the DataFrame to CSV, the output contains the string 'nan' as opposed to NULL.

I have looked around but can't find a workable solution. Here's the basic issue:

df
index x    y   z
0     1   NaN  2
1     NaN  3   4

CSV output:

index x    y   z
0     1   nan  2
1     nan  3   4

I have tried a few things to set 'nan' to NULL but the csv output results in a 'blank' rather than NULL:

dfDemographics = dfDemographics.replace('nan', np.nan)
dfDemographics.replace(r'\s+( +\.)|#', np.nan, regex=True).replace('', np.nan)
dfDemographics = dfDemographics.replace('nan', '')  # of course, this wouldn't work, but tried it anyway.

Any help would be appreciated.


Answer 1:


Pandas to the rescue: use the na_rep argument of to_csv to choose your own representation for NaNs.

df.to_csv('file.csv', na_rep='NULL')

file.csv

,index,x,y,z
0,0,1.0,NULL,2
1,1,NULL,3.0,4
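
To check the effect of na_rep without touching the disk, the CSV can be written to an in-memory buffer. This is a minimal sketch; the frame below is a hypothetical stand-in for the x/y/z columns in the question, not the asker's actual data.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical frame mirroring the x/y/z columns from the question.
df = pd.DataFrame({"x": [1, np.nan], "y": [np.nan, 3], "z": [2, 4]})

# Write to a string buffer instead of a file so the result can be inspected.
buf = io.StringIO()
df.to_csv(buf, na_rep="NULL")
csv_text = buf.getvalue()

print(csv_text)
```

Columns that contain NaN are stored as floats, which is why the remaining values in x and y are written as 1.0 and 3.0.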



Answer 2:


Using df.replace may help:

df = df.replace(np.nan, '', regex=True)
df.to_csv("df.csv", index=False)

(This replaces every null value with '', i.e. an empty string.)
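
For comparison, to_csv already writes real NaN values as empty fields, because na_rep defaults to ''. The sketch below, which assumes a hypothetical frame and a hypothetical helper named to_csv_text, suggests that the explicit replacement and the default produce the same text:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical frame mirroring the x/y/z columns from the question.
df = pd.DataFrame({"x": [1, np.nan], "y": [np.nan, 3], "z": [2, 4]})


def to_csv_text(frame, **kwargs):
    """Hypothetical helper: return the CSV text instead of writing a file."""
    buf = io.StringIO()
    frame.to_csv(buf, index=False, **kwargs)
    return buf.getvalue()


default_text = to_csv_text(df)            # na_rep defaults to ''
filled_text = to_csv_text(df.fillna(""))  # NaN replaced by '' beforehand

print(default_text)
print(default_text == filled_text)
```

If the output shows the text nan instead, the column probably holds the string 'nan' rather than a real NaN, which neither na_rep nor fillna will touch.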




Answer 3:


User @coldspeed shows how to write NaN values as NULL when saving a pd.DataFrame. If, for data analysis, you instead want to replace the null values in a pd.DataFrame with np.nan, the following code will do:

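import numpy as np
import pandas as pd

# mydf is assumed to be an existing DataFrame
# set every missing entry (None or NaN, as flagged by isnull) to np.nan
colNames = mydf.columns.tolist()
dfVals = mydf.values.copy()  # copy, so the assignment never hits a read-only view
matSyb = mydf.isnull().values
dfVals[matSyb] = np.nan

mydf = pd.DataFrame(dfVals, columns=colNames)
#np.nansum(mydf.values, axis=0 )
#np.nansum(dfVals, axis=0 )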
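
Note that isnull() only flags entries that are already missing, so the snippet above does not convert literal "NULL" strings. A minimal sketch of that case follows, assuming a hypothetical frame that holds the string "NULL" as a placeholder; it uses replace followed by pd.to_numeric, which is one possible approach rather than the answer's own method.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical frame holding the literal string "NULL" as a placeholder.
mydf = pd.DataFrame({"x": [1, "NULL"], "y": ["NULL", 3]})

# Swap the placeholder for NaN, then convert each column to a numeric dtype.
cleaned = mydf.replace("NULL", np.nan).apply(pd.to_numeric)

# When the data comes from a CSV file, read_csv treats "NULL" as missing
# by default, so no extra step is needed.
parsed = pd.read_csv(io.StringIO("x,y\n1,NULL\nNULL,3\n"))

print(cleaned)
print(np.nansum(cleaned.to_numpy(), axis=0))
```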


Source: https://stackoverflow.com/questions/50890989/pandas-changing-the-format-of-nan-values-when-saving-to-csv
