Parsing “NA” entries as NaN values when reading in a pandas dataframe

为君一笑 提交于 2019-12-17 21:22:32

问题


i am new to pandas. I have loaded csv using pandas.read_csv. i have tried not to specify dtype but it was way too slow. since it is a very large file, i also specified data type. however, sometimes in numeric columns, it contains "NA". i have used na_values = ['NA'], will it affect my data frame? i still want to preserve these rows. my question is if i specify data type and add na_values = ['NA'], will NA be tossed away? if yes, how can i maintain similar process time without losing these na? thank you very much!


回答1:


From the pd.read_csv docs:

na_values : scalar, str, list-like, or dict, default None

Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ... ‘NA’, ...`.

Bold emphasis mine. These values are not tossed away, rather, they are converted to NaN. Pandas is smart enough to automatically recognise those values without you explicitly stating it.



来源:https://stackoverflow.com/questions/45971039/parsing-na-entries-as-nan-values-when-reading-in-a-pandas-dataframe

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!