randomForest Error: NA not permitted in predictors (but no NAs in data)

你离开我真会死。 submitted on 2020-01-16 19:16:08

Question


I am trying to run the 'genie3' algorithm (ref: http://homepages.inf.ed.ac.uk/vhuynht/software.html) in R, which uses the 'randomForest' method.

I am running into the following error:

> weight.matrix<-get.weight.matrix(tmpLog2FC, input.idx=1:4551)
Starting RF computations with 1000 trees/target gene,
and 67 candidate input genes/tree node
Computing gene 1/11805
Error in randomForest.default(x, y, mtry = mtry, ntree = nb.trees, importance = TRUE,  : 
NA not permitted in predictors 

So I checked whether any NAs are present in my data, and there are none:

> NAs<-sapply(tmpLog2FC, function(x) sum(is.na(x)))
> length(which(NAs!=0))
[1] 0
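
A per-column `is.na()` count only sees values that are already `NA` in the object being inspected. As a hedged sketch of a broader check (the `toy` data frame below is hypothetical and stands in for `tmpLog2FC`; whether it resembles the real data is an assumption), one could also look at column classes and at what happens when text columns are coerced to numbers:

```r
# Hypothetical toy data standing in for tmpLog2FC (not the asker's data).
toy <- data.frame(
  geneA = c(1.2, -0.5, 0.3),
  geneB = c("0.7", "", "1.1"),   # read in as text; "" is an empty field
  stringsAsFactors = FALSE
)

# The check from the question: no NA is reported, because "" is not NA.
n.na <- sum(is.na(toy))

# Column classes: anything that is not numeric/integer deserves a look.
col.classes <- sapply(toy, class)
non.numeric <- names(col.classes)[!col.classes %in% c("numeric", "integer")]

# Coercing the text column to numeric turns the empty field into NA.
coerced <- suppressWarnings(as.numeric(toy$geneB))
n.na.after.coercion <- sum(is.na(coerced))

print(n.na)
print(non.numeric)
print(n.na.after.coercion)
```

If a check like this reports non-numeric columns, the values the model receives may differ from the values that were inspected.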

I then tried editing the 'get.weight.matrix()' function itself to omit NAs (just in case) by changing this line:

rf <- randomForest(x, y, mtry=mtry, ntree=nb.trees, importance=TRUE, ...)

To:

rf <- randomForest(x, y, mtry=mtry, ntree=nb.trees, importance=TRUE, na.action=na.omit)

I then sourced the code and double-checked that it had picked up the change by calling the function name on its own (which displays the actual script):

    }
    target.gene.name <- gene.names[target.gene.idx]
    # remove target gene from input genes
    these.input.gene.names <- setdiff(input.gene.names, target.gene.name)
    x <- expr.matrix[,these.input.gene.names]
    y <- expr.matrix[,target.gene.name]
    rf <- randomForest(x, y, mtry=mtry, ntree=nb.trees, importance=TRUE, na.action=na.omit)

However, when I re-run it, I get the same error:

Error in randomForest.default(x, y, mtry = mtry, ntree = nb.trees, importance = TRUE,  : 
NA not permitted in predictors 

Has anyone encountered anything similar to this? Any ideas on what I can do?

Thanks in advance.

*EDIT: As suggested, I re-ran with debug:

> weight.matrix<-get.weight.matrix(tmpLog2FC, input.idx=1:4551)
Starting RF computations with 1000 trees/target gene,
and 67 candidate input genes/tree node
Computing gene 1/11805
Error in randomForest.default(x, y, mtry = mtry, ntree = nb.trees, importance = TRUE,  : 
NA not permitted in predictors
Called from: randomForest(x, y, mtry = mtry, ntree = nb.trees, importance = TRUE, 
na.action = na.omit)
Browse[1]> 
> 

The debug output shows that the line I suspected is the one throwing the error, and it displays it in the edited form with 'na.action=na.omit'. I am even more confused. How can a dataset that has no NAs, run with code that omits NAs, produce this error?


Answer 1:


You can use the following command to list the rows in which at least one predictor has no value:

data[!complete.cases(data),]

Check those rows carefully. In my case, rows with no values, ",,,,,,,,," (the predictor columns in my file were comma-separated), showed up as NA when the RF ran.

You can then delete those rows.
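
As a minimal sketch of that approach, assuming a small hypothetical data frame named `data` (the column names `x1`, `x2`, and `y` are made up for illustration and do not come from the original file):

```r
# Hypothetical data frame with one empty row (not the answerer's file).
data <- data.frame(
  x1 = c(0.5, NA, 1.5, 2.0),
  x2 = c(1.0, NA, 0.2, 0.9),
  y  = c(3.1, NA, 2.7, 4.0)
)

# Rows in which at least one value is missing.
incomplete <- data[!complete.cases(data), ]
print(incomplete)

# Keep only the complete rows before fitting the model.
data.clean <- data[complete.cases(data), ]

# The same idea when predictors and response are passed separately,
# as in the randomForest(x, y, ...) call from the question.
x <- as.matrix(data[, c("x1", "x2")])
y <- data$y
keep <- complete.cases(x, y)
x.clean <- x[keep, , drop = FALSE]
y.clean <- y[keep]
```

Filtering `x` and `y` with the same `keep` vector keeps the predictor rows and the response values aligned.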

Thanks



Source: https://stackoverflow.com/questions/23959810/randomforest-error-na-not-permitted-in-predictors-but-no-nas-in-data
