Question
Hello everyone, I am analysing the UCI Adult census data. The data has a question mark (?) for every missing value.
I want to replace all the question marks with NA.
I tried:
library(XML)
census<-read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",header=F,na.strings="?")
names(census) <- c("Age", "Workclass", "Fnlwght", "Education", "EducationNum", "MaritalStatus", "Occupation",
                   "Relationship", "Race", "Gender", "CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry", "Salary")
table(census$Workclass)
               ?      Federal-gov        Local-gov     Never-worked          Private     Self-emp-inc
            1836              960             2093                7            22696             1116
Self-emp-not-inc        State-gov      Without-pay
            2541             1298               14
x <- ifelse(census$Workclass == "?", NA, census$Workclass)
table(x)
x
    1     2     3     4     5     6     7     8     9
 1836   960  2093     7 22696  1116  2541  1298    14
But it did not work: the question marks were not replaced with NA.
Please help.
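Answer 1:
Look at gsub (here x stands for the column you want to clean):
census$x <- gsub("?", NA, census$x, fixed = TRUE)
Edit: I forgot to add fixed = TRUE.
As Richard pointed out, this will catch all occurrences of a ?, so any value that contains a question mark anywhere becomes NA, not only values that consist of a question mark alone.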
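A minimal sketch of what that gsub call does, using a made-up vector (`workclass`, `cleaned_any` and `cleaned_exact` are hypothetical names, not part of the census data). It relies on the documented behaviour that a replacement of NA turns every matching element into NA, and it adds an exact-match alternative for the case where only whole-value question marks should be replaced:

```r
# Toy vector; the values mimic the leading space found in the raw file.
workclass <- c(" Private", " ?", " State-gov", " Why?")

# fixed = TRUE treats "?" as a literal character, not a regex quantifier.
# Every element that contains "?" anywhere becomes NA, including " Why?".
cleaned_any <- gsub("?", NA, workclass, fixed = TRUE)

# Alternative: replace only elements that are exactly "?" after trimming.
cleaned_exact <- workclass
cleaned_exact[trimws(cleaned_exact) == "?"] <- NA

print(cleaned_any)
print(cleaned_exact)
```

The exact-match variant is the safer choice when a column may legitimately contain text with a question mark in it.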
Answer 2:
Here's an easy way to replace " ?" with NA in all columns.
# find elements
idx <- census == " ?"
# replace elements with NA
is.na(census) <- idx
How does it work?
The command idx <- census == " ?" creates a logical matrix with the same number of rows and columns as the data frame census. This matrix idx contains TRUE where census contains " ?" and FALSE at all other positions.
The matrix idx is used as an index. The command is.na(census) <- idx replaces the values in census at the positions marked in idx with NA.
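The two steps can be tried on a small made-up data frame first. This is only a sketch: `toy` and its values are invented for illustration, with the same leading space in front of the question mark:

```r
# Toy data frame standing in for census; values are made up.
toy <- data.frame(
  Age        = c(39, 50, 38),
  Workclass  = c(" State-gov", " ?", " Private"),
  Occupation = c(" ?", " Sales", " ?"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE wherever a cell is exactly " ?".
idx <- toy == " ?"

# Set the flagged cells to NA in every column at once.
is.na(toy) <- idx

# Count the missing values per column.
print(colSums(is.na(toy)))
```

Columns without any " ?" entries, such as Age here, are left unchanged.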
Note that the replacement function is.na<- is used here. It is not identical to the is.na function, which only tests for missing values.
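Another option, not taken from either answer, is to handle the leading space when the file is read. The sketch below assumes the fields are separated by a comma followed by a space, which is what the " ?" used in this answer suggests. The inline `raw_text` is a made-up three-row sample, not the real file. With strip.white = TRUE, read.csv strips the surrounding white space from unquoted character fields before comparing them with na.strings, so "?" matches:

```r
# Made-up sample in the same "comma plus space" layout (an assumption).
raw_text <- "39, State-gov, Bachelors\n50, ?, HS-grad\n38, Private, ?"

toy <- read.csv(
  text = raw_text,
  header = FALSE,
  na.strings = "?",        # value to treat as missing
  strip.white = TRUE,      # strip leading/trailing spaces before the NA test
  stringsAsFactors = FALSE
)

# Count the missing values per column.
print(colSums(is.na(toy)))
```

The same two arguments could be added to the original read.csv call on the UCI URL, so that no replacement step is needed afterwards.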
Source: https://stackoverflow.com/questions/28061122/how-do-i-remove-question-mark-from-a-data-set-in-r