Question:
I have a data set consisting of a dichotomous dependent variable (Y) and 12 independent variables (X1 to X12) stored in a CSV file. Here are the first 5 rows of the data:
Y,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,X11,X12
0,9,3.86,111,126,14,13,1,7,7,0,M,46-50
1,7074,3.88,232,4654,143,349,2,27,18,6,M,25-30
1,5120,27.45,97,2924,298,324,3,56,21,0,M,31-35
1,18656,79.32,408,1648,303,8730,286,294,62,28,M,25-30
0,3869,21.23,260,2164,550,320,3,42,203,3,F,18-24
I constructed a logistic regression model from the data using the following code:
mydata  <- read.csv("data.csv")
mylogit <- glm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 + X11 + X12,
               data = mydata, family = "binomial")
mysteps <- step(mylogit,
                scope = Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 + X11 + X12)
I can obtain the predicted probability for each observation using:
theProbs <- fitted(mysteps)
Now I would like to create a classification table, using the first 20 rows of the data table (mydata), from which I can determine the percentage of the predicted probabilities that agree with the observed data. Note that for the dependent variable (Y), a predicted probability below 0.5 corresponds to 0 and a predicted probability above 0.5 corresponds to 1.
I have spent many hours trying to construct the classification table without success. I would appreciate it very much if someone could suggest code to solve this problem.
Answer 1:
The question is a bit old, but in case someone is looking through the archives, this may help. This is easily done with xtabs:
classDF <- data.frame(response  = mydata$Y,
                      predicted = round(fitted(mysteps), 0))   # 0/1 class at a 0.5 cutoff
xtabs(~ predicted + response, data = classDF)
which will produce a table like this:

         response
predicted   0   1
        0 339 126
        1 130 394
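
To address the original request about the first 20 rows, the same approach can be restricted to a subset of the data. A minimal sketch, assuming mysteps was fitted on mydata so that fitted(mysteps) lines up row by row with mydata$Y (the name first20 is just illustrative):

# Classification table and agreement percentage for rows 1-20 only
first20 <- data.frame(response  = mydata$Y[1:20],
                      predicted = round(fitted(mysteps)[1:20]))   # 0/1 at a 0.5 cutoff
xtabs(~ predicted + response, data = first20)                     # classification table
100 * mean(first20$response == first20$predicted)                 # % agreeing with Y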
Answer 2:
I think 'round' can do the job here.
table(round(theProbs))
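
Note that on its own this only tabulates how many observations fall into each predicted class. To check agreement with the observed outcome, a cross-tabulation against Y along the lines of the following sketch (again assuming theProbs lines up row by row with mydata) should work:

# Observed vs. predicted classes at a 0.5 cutoff
table(observed = mydata$Y, predicted = round(theProbs))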
Source: https://stackoverflow.com/questions/13661025/classification-table-for-logistic-regression-in-r