Omit NA and data imputation before doing PCA analysis using R

北城余情 submitted on 2019-12-04 15:57:53

For na.action to have an effect, you need to explicitly supply a formula argument:

princomp(formula = ~., data = mydf, cor = TRUE, na.action=na.exclude)

# Call:
# princomp(formula = ~., data = mydf, na.action = na.exclude, cor = TRUE)
# 
# Standard deviations:
#    Comp.1    Comp.2    Comp.3 
# 1.3748310 0.8887105 0.5657149 

The formula is needed because it triggers dispatch of princomp.formula, the only princomp method that does anything useful with na.action.

methods('princomp')
[1] princomp.default* princomp.formula*

names(formals(stats:::princomp.formula))
[1] "formula"   "data"      "subset"    "na.action" "..."  

names(formals(stats:::princomp.default))
[1] "x"      "cor"    "scores" "covmat" "subset" "..."   
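The difference between the two methods can be checked on a small example. The following is only a sketch: the data frame `toy` and its values are made up for illustration and are not the asker's `mydf`. It assumes, as in the R versions I am aware of, that the default method passes the missing values straight through to the covariance computation and fails.

```r
# Hypothetical toy data with two incomplete rows (not the asker's mydf)
set.seed(1)
toy <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10))
toy$A[1] <- NA
toy$B[2] <- NA

# Default method: na.action is not a formal argument, so it is swallowed
# by `...` and the NAs are never removed. The call is wrapped in tryCatch
# so the script keeps running if it fails.
res_default <- tryCatch(
  suppressWarnings(princomp(toy, cor = TRUE, na.action = na.exclude)),
  error = function(e) e
)
inherits(res_default, "error")

# Formula method: model.frame() applies na.action before the PCA is fitted
fit <- princomp(~ ., data = toy, cor = TRUE, na.action = na.exclude)
fit$n.obs          # number of complete rows actually used
nrow(fit$scores)   # na.exclude pads the scores back to the original row count
```

With `na.exclude` the scores keep one row per original observation, with NA in the rows that were dropped; with `na.omit` those rows are simply absent.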

The problem is that you used the character string "NA", which is not a real NA.

This demonstrates what I mean:

is.na("NA")   # FALSE: a character string, not a missing value
is.na(NA)     # TRUE: a real missing value

I'd fix it where the data is created, but here's a way to fix it after the fact. Because you used the character "NA", the whole column becomes class character, so you also have to convert it back with as.numeric:

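# Turn the string "NA" into a real NA, then convert the column to numeric
FUN <- function(x) as.numeric(ifelse(x == "NA", NA, x))
mydf2 <- data.frame(apply(mydf, 2, FUN))
# Keep only the complete rows. Note that this subsets the original mydf,
# so ndnew still holds character columns; use mydf2[complete.cases(mydf2), ]
# if you want the numeric version.
ndnew <- mydf[complete.cases(mydf2), ]
ndnew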

which yields:

                    A                 B                 C
3    11.3349957691175  6.97143301427903 -2.13578124048775
4    5.69035783905702 -2.44999550936244 -4.40642099309301
5  -0.865878644072023  6.03782080227184  9.83402859382248
6    6.58329959845638  5.67811450593805  12.4477770011262
7   0.759928613563254  16.6445809805028  9.45835418422973
8    11.3798459951171  1.36989010500538 0.784492783538675
9   0.671542080233918   5.9024564388189  16.2389092991422
10   3.64295741533713  9.78754135462621  -2.4293697924212
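For comparison, here is a minimal sketch of the same repair done column by column with lapply, which keeps the result a data frame of numeric columns throughout. The data frame `bad` and its values are hypothetical, chosen only to show the effect.

```r
# Hypothetical data where missing values were entered as the string "NA"
bad <- data.frame(
  A = c("1.5", "NA", "3.2"),
  B = c("NA", "2.0", "4.1"),
  stringsAsFactors = FALSE
)

# Replace the string "NA" with a real NA, then convert each column to numeric
fixed <- as.data.frame(lapply(bad, function(col) {
  col[col %in% "NA"] <- NA
  as.numeric(col)
}))

sapply(fixed, is.numeric)   # every column should now be numeric

# Rows without any missing value
clean <- fixed[complete.cases(fixed), ]
clean
```

If the data comes from a file, the `na.strings` argument of `read.table` / `read.csv` is the usual place to declare which strings count as missing, so that this repair is not needed at all.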

EDIT:==========================================================

"This works, but the default na.action does not work"

I don't know much about princomp, but this works (I'm not sure why the function's na.action doesn't):

out <- princomp(na.omit(mydf), cor = TRUE)
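As a quick sanity check, dropping incomplete rows by hand should give the same component standard deviations as the formula interface with na.action = na.omit, since both fit the PCA on the same complete rows. This is a sketch on made-up data; `toy2` is a hypothetical name and does not come from the question.

```r
# Hypothetical toy data with two incomplete rows
set.seed(42)
toy2 <- data.frame(A = rnorm(12), B = rnorm(12), C = rnorm(12))
toy2$B[3] <- NA
toy2$C[7] <- NA

# Drop incomplete rows first, then call the default method
by_hand <- princomp(na.omit(toy2), cor = TRUE)

# Let the formula method drop them via na.action
by_formula <- princomp(~ ., data = toy2, cor = TRUE, na.action = na.omit)

# Both fits use the same complete rows
same_sdev <- isTRUE(all.equal(unname(by_hand$sdev), unname(by_formula$sdev)))
same_sdev
by_hand$n.obs
```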

"Is there any method that can impute the data? In my real data almost every column has missing values, so omitting NAs would leave me with ~0 rows or columns."

This really is a separate question from your first one; you should start a new thread after doing a little research on the topic yourself.
