C5.0 decision tree - c50 code called exit with value 1

再見小時候 2020-12-06 17:38

I am getting the following error:

c50 code called exit with value 1

I am doing this on the Titanic data available from Kaggle.

6 Answers
  • 2020-12-06 18:11

    For anyone interested, the data can be found here: http://www.kaggle.com/c/titanic-gettingStarted/data. I think you need to be registered in order to download it.

    Regarding your problem, first off, I think you meant to write

    new_model <- C5.0(train[,-2],train$Survived)
    

    Next, notice the structure of the Cabin and Embarked columns. Both are factors that have an empty string as a level name (check with levels(train$Embarked)). This is the point where C50 falls over. If you modify your data such that

    levels(train$Cabin)[1] = "missing"
    levels(train$Embarked)[1] = "missing"
    

    your algorithm will now run without an error.
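
    To find out which columns are affected before renaming anything, a check along these lines may help. It is a minimal sketch in base R: blank_level_columns is a hypothetical helper name, and the toy data frame only mimics the shape of the Titanic columns.

    ```r
    # Hypothetical helper: names of the factor columns that have "" as a level.
    blank_level_columns <- function(df) {
      has_blank <- vapply(
        df,
        function(col) is.factor(col) && any(levels(col) == ""),
        logical(1)
      )
      names(df)[has_blank]
    }

    # Toy data standing in for the Kaggle file (not the real data).
    toy <- data.frame(
      Embarked = factor(c("S", "", "C")),
      Sex      = factor(c("male", "female", "male")),
      Age      = c(22, 38, 26)
    )

    # Lists the columns that would need their blank level renamed.
    blank_level_columns(toy)
    ```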

  • 2020-12-06 18:14

    I got this idea after reading this post. Here is what finally worked:

    library(C50)

    # Give test a Survived column so it can be row-bound to train
    test$Survived <- NA
    combinedData <- rbind(train, test)
    combinedData$Survived <- factor(combinedData$Survived)

    # Fix the empty level names (assumes "" is the first level of each factor)
    levels(combinedData$Cabin)[1] <- "missing"
    levels(combinedData$Embarked)[1] <- "missing"

    # Split back: rows 1-891 are train, rows 892-1309 are test
    new_train <- combinedData[1:891, ]
    new_test <- combinedData[892:1309, ]

    # Column 2 is Survived, so drop it from the predictors
    new_model <- C5.0(new_train[, -2], new_train$Survived)
    new_model_predict <- predict(new_model, new_test)

    submitC50 <- data.frame(PassengerId = new_test$PassengerId, Survived = new_model_predict)
    write.csv(submitC50, file = "c50dtree.csv", row.names = FALSE)
    

    The intuition is that building the factors on the combined data gives the train and test sets the same factor levels.
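
    The effect on factor levels can be seen on a toy example. This is a sketch with made-up data, not the Titanic files: splitting a combined data frame back into two parts keeps the full level set in both, even when one part never contains a given value.

    ```r
    # Made-up stand-ins for the train and test sets.
    # stringsAsFactors = TRUE mimics the default of R versions before 4.0.
    train_toy <- data.frame(Embarked = c("S", "C", "S"), stringsAsFactors = TRUE)
    test_toy  <- data.frame(Embarked = c("Q", "S"), stringsAsFactors = TRUE)

    # Built separately, the two factors disagree on their levels.
    identical(levels(train_toy$Embarked), levels(test_toy$Embarked))

    # rbind() merges the level sets; row subsets keep all of the levels.
    combined    <- rbind(train_toy, test_toy)
    split_train <- combined[1:3, , drop = FALSE]
    split_test  <- combined[4:5, , drop = FALSE]

    # After the round trip, both parts share one level set.
    identical(levels(split_train$Embarked), levels(split_test$Embarked))
    ```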

  • 2020-12-06 18:27

    I had the same error, but I was using a numeric dataset without missing values.

    After a long time, I discovered that my dataset had a predictor column called "outcome". C5.0Control uses this name as its default label, and that was the cause of the error :'(

    My solution was to rename the column. Another way would be to create a C5.0Control object with a different value for the label argument and pass that object to C5.0 through its control argument.
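
    The renaming route could be sketched in base R as follows. The helper name rename_reserved and the replacement name outcome_x are made up for illustration; the only fact taken from this answer is that a predictor called "outcome" causes the clash.

    ```r
    # Hypothetical helper: rename a predictor whose name clashes with "outcome".
    rename_reserved <- function(df, reserved = "outcome", replacement = "outcome_x") {
      clash <- names(df) == reserved
      names(df)[clash] <- replacement
      df
    }

    # Toy numeric data with the problematic column name.
    toy <- data.frame(outcome = c(1.2, 3.4), x1 = c(5, 6))

    # The clashing column is renamed, the others are left alone.
    names(rename_reserved(toy))
    ```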

  • 2020-12-06 18:28

    I also struggled for some hours with the same problem (return code "1"), both when building a model and when predicting. Following the hint in Marco's answer, I wrote a small function that renames every factor level equal to "" in a data frame or vector; see the code below. Since R does not pass arguments by reference, the function cannot change the original data frame, so you have to use its return value:

    # Rename the "" level of a factor to "?"; other objects are returned unchanged.
    removeBlankLevelsInVector <- function(vector) {
      lv <- levels(vector)
      if (!is.null(lv)) {
        # Index by value, so this is also safe when there are no levels at all
        levels(vector)[lv == ""] <- "?"
      }
      vector
    }

    # Apply the fix to every column; non-factor columns pass through untouched.
    removeBlankLevelsInDataFrame <- function(dataframe) {
      dataframe[] <- lapply(dataframe, removeBlankLevelsInVector)
      dataframe
    }
    

    The functions can be called like this:

    trainX = removeBlankLevelsInDataFrame(trainX)
    trainY = removeBlankLevelsInVector(trainY)
    model = C50::C5.0.default(trainX,trainY)
    

    However, C50 seems to have a similar problem with character columns that contain empty strings, so you will probably have to extend this to handle character columns as well, if you have any.
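
    One possible extension for character columns is sketched below in base R. The function name replaceBlankStrings and its placeholder argument are hypothetical, chosen to match the style of the functions above, and the toy data is made up.

    ```r
    # Hypothetical helper: replace "" in character columns with a placeholder.
    replaceBlankStrings <- function(dataframe, placeholder = "?") {
      for (name in names(dataframe)) {
        column <- dataframe[[name]]
        if (is.character(column)) {
          # Leave NA entries alone; only empty strings are replaced.
          column[!is.na(column) & column == ""] <- placeholder
          dataframe[[name]] <- column
        }
      }
      dataframe
    }

    # Toy data with one empty cell in a character column.
    toy <- data.frame(
      Cabin = c("C85", "", "E46"),
      Fare  = c(7.25, 71.28, 8.05),
      stringsAsFactors = FALSE
    )

    replaceBlankStrings(toy)$Cabin
    ```

    As with the functions above, the result has to be assigned back, for example trainX <- replaceBlankStrings(trainX).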

  • 2020-12-06 18:30

    Just in case: you can take a look at the error with

    summary(new_model)
    

    This error also occurs when the name of a variable contains special characters. For example, you will get it if a variable name contains the character "я" (from the Russian alphabet).
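
    A quick way to spot such variable names is sketched below in base R. The helper has_non_ascii is a hypothetical name, and treating every non-ASCII character as suspect is an assumption that goes beyond the single Cyrillic case reported here.

    ```r
    # Hypothetical helper: TRUE for names containing any non-ASCII character.
    has_non_ascii <- function(x) {
      vapply(
        enc2utf8(x),
        function(s) any(utf8ToInt(s) > 127L),
        logical(1),
        USE.NAMES = FALSE
      )
    }

    # "\u044f" is the Cyrillic letter from the example, written as an escape.
    toy <- data.frame(1:2, 3:4)
    names(toy) <- c("age", "\u044f_score")

    # Names that may need to be changed before calling C5.0.
    names(toy)[has_non_ascii(names(toy))]
    ```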

  • 2020-12-06 18:35

    I also got the same error, but in my case it was caused by illegal characters in the factor levels of one of the columns.

    I used the make.names function to correct the factor levels:

    levels(FooData$BarColumn) <- make.names(levels(FooData$BarColumn))
    

    Then the problem was resolved.
