I am working with a df and using numpy to transform data - including setting blanks (or '') to NaN. But when I write the df to csv - the output contains the string 'nan' as
In my situation, the culprit was np.where: when the data types of the two return elements differ, your np.nan gets converted to the string "nan".
It's hard (for me) to see exactly what's going on under the hood, but I suspect the same is true of other NumPy functions that mix types.
A minimal example:
import numpy as np
import pandas as pd
seq = [1, 2, 3, 4, np.nan]  # note: np.NaN was removed in NumPy 2.0; use np.nan
same_type_seq = np.where("parrot" == "dead", 0, seq)       # both branches numeric
diff_type_seq = np.where("parrot" == "dead", "spam", seq)  # mixed string/float branches
pd.Series(seq).to_csv("vanilla_nan.csv", header=False) # as expected, last row is blank
pd.Series(same_type_seq).to_csv("samey_nan.csv", header=False) # also, blank
pd.Series(diff_type_seq).to_csv("nany_nan.csv", header=False) # nan instead of blank
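You can confirm the dtype promotion directly. In the mixed-type case, np.where has to find a common dtype for "spam" and the floats, so the whole result is cast to a fixed-width string array, and np.nan becomes the literal three-character string "nan":

```python
import numpy as np

seq = [1, 2, 3, 4, np.nan]

# Same-type branch: 0 and seq are both numeric, so the result stays float
# and the NaN survives as a real floating-point NaN.
same = np.where(False, 0, seq)
print(same.dtype)  # float64

# Mixed-type branch: everything is promoted to a fixed-width string dtype,
# so the NaN is stringified along with the numbers.
diff = np.where(False, "spam", seq)
print(diff.dtype)            # a '<U...' string dtype
print(repr(diff[-1]))        # 'nan' -- a string, not a NaN
```

Once the value is the string "nan", pandas no longer recognises it as missing data, which is why to_csv writes it out instead of leaving the cell blank.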
So how do you get round this? I'm not sure of a clean fix, but as a hacky workaround for small datasets you can replace NaN in your original sequence with a token string, then swap it back to np.nan afterwards:
repl = "missing"
hacky_seq = np.where("parrot" == "dead", "spam", [repl if np.isnan(x) else x for x in seq])
pd.Series(hacky_seq).replace({repl: np.nan}).to_csv("hacky_nan.csv", header=False)
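Alternatively, you can skip the token string and clean up after the fact: since the stringified NaNs all come out as the exact string "nan", you can replace that value with a real NaN just before writing. A minimal sketch, assuming none of your legitimate data is the literal string "nan":

```python
import numpy as np
import pandas as pd

seq = [1, 2, 3, 4, np.nan]
diff_type_seq = np.where("parrot" == "dead", "spam", seq)

# Swap the literal string "nan" back to a real NaN before writing;
# to_csv then emits a blank cell for it, as with the original data.
cleaned = pd.Series(diff_type_seq).replace("nan", np.nan)
cleaned.to_csv("cleaned_nan.csv", header=False)
```

This avoids the list comprehension, but it is riskier on data where "nan" could be a genuine value, so the token-string version above is safer when in doubt.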