What's the best way to replace missing values with NA when reading in a .csv?

Posted by 会有一股神秘感。 on 2019-12-18 11:01:06

Question


I have a .csv dataset with many missing values, and I'd like R to recognize them all the same way (the "correct" way) when I read the table in. I've been using:

import = read.csv("/Users/dataset.csv",
                  header = TRUE, na.strings = c(""))

This script fills all the empty cells with something, but it's not consistent. When I look at the data with head(import), some missing cells are filled with <NA> and some missing cells are filled with NA. I fear that R will treat these two ways of identifying missing values differently when I start analyzing the dataset, so I'd like the import to read in those missing values uniformly.

Finally, some of the missing values in my csv file are represented with a period only. I would also like those periods to be represented by the correct missing value notation when I import to R.


Answer 1:


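The <NA> vs NA just means that some of your columns are character (or factor) and some are numeric, that's all. R prints a missing value in a character or factor column as <NA> so it can't be confused with the literal string "NA"; in both cases it is the same missing value, and is.na() treats them identically. Absolutely nothing is wrong with that.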
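A minimal sketch of that point, using a small made-up data frame (the column names name and score are illustrative and not from the question): the two displays differ only in how they are printed, and is.na() reports both as missing.

```r
# Hypothetical data: one character column and one numeric column,
# each with a single missing value.
df <- data.frame(
  name  = c("a", NA, "c"),
  score = c(1.5, NA, 3),
  stringsAsFactors = FALSE
)

print(df)  # the character column shows <NA>, the numeric column shows NA

# is.na() does not distinguish between the two displays.
missing_name  <- is.na(df$name)
missing_score <- is.na(df$score)
```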

As Ben mentioned above, if some of your missing values in the csv are represented by a single period, ., then you can specify a vector of values that should be treated as NAs via:

na.strings=c("",".","NA")

as an argument to read.csv.
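As a rough, self-contained illustration, the sketch below reads a made-up CSV from an inline string through the text argument of read.csv instead of a file. The contents and column names (id, group, value) are assumptions for demonstration only. Empty cells, single periods, and the literal NA all come back as missing.

```r
# Hypothetical CSV contents with three kinds of missing cell:
# an empty field, a single period, and the literal string NA.
csv_text <- "id,group,value
1,a,10
2,,.
3,.,NA
4,d,40"

dat <- read.csv(text = csv_text, header = TRUE,
                na.strings = c("", ".", "NA"),
                stringsAsFactors = FALSE)

# Count the missing values in each column.
na_counts <- colSums(is.na(dat))
print(na_counts)
```

Because the periods are converted to NA before column types are guessed, the value column should still be read as numeric instead of falling back to character.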




Answer 2:


You can also use the more flexible readr package, whose equivalent function and argument are read_csv() and na.

library(readr)
read_csv("file.csv", na = c(".", ".."))


Source: https://stackoverflow.com/questions/13822801/whats-the-best-way-to-replace-missing-values-with-na-when-reading-in-a-csv
