Something is wrong; all the ROC metric values are missing:


Try changing the class variable values from "0" and "1" to, e.g., "A" and "B", and then try again; caret's classProbs = TRUE option requires factor levels that are valid R variable names, and "0" and "1" are not.
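For instance, a minimal recoding sketch (my assumptions: LoanStatus is the 0/1 outcome factor and "1" marks a default; the labels below also line up with the positive = "Default" call used further down):

# Sketch: recode the 0/1 outcome to valid R factor-level names
credit$LoanStatus <- factor(credit$LoanStatus,
                            levels = c("0", "1"),
                            labels = c("NonDefault", "Default"))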

Looking at the output of summary(credit), I can see that there are NA values in at least two variables:

The variable MonthsEmployed has 5 NA values:

MonthsEmployed 
Min.   :-23.00  
1st Qu.: 26.00  
Median : 68.00 
Mean   : 97.44  
3rd Qu.:139.00  
Max.   :755.00  
NA's   :5  

and the variable InstallmentBalance has 328 NA values.

InstallmentBalance
Min.   :     0  
1st Qu.:  3338       
Median : 14453       
Mean   : 24900       
3rd Qu.: 32238      
Max.   :739371    
NA's   :328     

Try removing the rows with missing values (or temporarily removing these two variables) and run the function again to see if this solves your problem.
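For example, a rough sketch (credit_complete and credit_reduced are just illustrative names):

# Option 1: keep only the rows with no missing values
credit_complete <- credit[complete.cases(credit), ]

# Option 2: temporarily drop the two affected variables
credit_reduced <- credit[, !(names(credit) %in% c("MonthsEmployed", "InstallmentBalance"))]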

Also, you need to add metric = "ROC" to the train() call and classProbs = TRUE to trainControl() when you use twoClassSummary:

ctrl <- trainControl(method = "repeatedcv", 
                     repeats = 3, 
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

So, your call should be

multinomSummaryFit <- train(LoanStatus~., 
                            data = credit, 
                            method = "multinom", 
                            family = binomial, 
                            metric = "ROC",
                            trControl = ctrl)

Another important issue about your dataset: you need to carefully inspect the variables' values and make sure each value makes sense. For example, the MonthsEmployed variable has negative values. Logically, an employee has a positive number of months employed. Are these negative values wrong, or do they mean something else (for example, a value of -23 meaning the person has not been employed for 23 months)?
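A quick way to look at those values (just a sketch):

# how many MonthsEmployed values are negative, and which are they?
sum(credit$MonthsEmployed < 0, na.rm = TRUE)
subset(credit, MonthsEmployed < 0, select = MonthsEmployed)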

To answer your question regarding confusionMatrix:

Let's say your trained model is called multinomSummaryFit. To evaluate it on the test dataset, call predict() on the test data without LoanStatus (using the same variables you trained the model on), and then compare the model's predictions to the actual values in LoanStatus. For example,

# let's say your test data frame is called test
mymodel_pred <- predict(multinomSummaryFit, test[, names(test) != "LoanStatus"])

then use confusionMatrix:

confusionMatrix(data = mymodel_pred, 
                reference = test$LoanStatus, 
                positive = "Default")

If the test dataset does not have the LoanStatus column, then you just use:

mymodel_pred <- predict(multinomSummaryFit, test)

but in this case, you have no way to evaluate your model on the test dataset if you do not know the actual response.

Remember, if you removed any variables from the training dataset, you need to remove them from the test dataset as well before you call predict.
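For example, a small sketch (drop_vars is hypothetical; substitute whatever you actually removed during training):

drop_vars <- c("MonthsEmployed", "InstallmentBalance")   # whatever you dropped
test_reduced <- test[, !(names(test) %in% drop_vars)]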

Splitting the data into train and test sets using stratified sampling:

trainingRows <- createDataPartition(credit$LoanStatus, p = .70, list= FALSE)
train <- credit[trainingRows, ]
test <- credit[-trainingRows, ]
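If you want the split to be reproducible, set a seed before partitioning (createDataPartition() comes from the caret package); a minimal sketch:

library(caret)   # provides createDataPartition()
set.seed(123)    # any fixed seed makes the partition reproducible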

I ran into a similar issue on some data where I used the option summaryFunction = twoClassSummary to get some output performance metrics, and some of the data features had sd() equal to 1.

I solved the problem by leaving out twoClassSummary and computing the performance metrics I needed (e.g. ROC, confusion matrix) in a subsequent step.
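For instance, a hedged sketch of computing the ROC/AUC afterwards with the pROC package (my assumptions: a fitted caret model multinomSummaryFit, a test data frame test, and a positive class labelled "Default"):

library(pROC)

# class probabilities for the held-out rows
probs <- predict(multinomSummaryFit,
                 test[, names(test) != "LoanStatus"],
                 type = "prob")

# ROC curve and AUC, treating "Default" as the positive class
roc_obj <- roc(response = test$LoanStatus, predictor = probs[, "Default"])
auc(roc_obj)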
