Question
I'm fitting a straightforward linear regression model to the following data frame:
> str(model_data_rev)
'data.frame': 128857 obs. of 12 variables:
$ ENTRY_4 : num 186 218 208 235 256 447 471 191 207 250 ...
$ ENTRY_8 : num 724 769 791 777 707 237 236 726 773 773 ...
$ ENTRY_12: num 2853 2989 3174 3027 3028 ...
$ ENTRY_16: num 2858 3028 3075 2992 3419 ...
$ ENTRY_20: num 7260 7188 7587 7560 7165 ...
$ EXIT_4 : num 70 82 105 114 118 204 202 99 73 95 ...
$ EXIT_8 : num 1501 1631 1594 1576 1536 ...
$ EXIT_12 : num 3862 3923 4158 3970 3895 ...
$ EXIT_16 : num 1559 1539 1737 1681 1795 ...
$ EXIT_20 : num 2145 2310 2217 2330 2291 ...
$ DAY : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tues"<..: 2 3 4 5 6 7 1 2 3 4 ...
$ MONTH : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 3 3 3 3 3 3 3 3 3 3 ...
I split the data into training and test sets using the caret package:
split<-createDataPartition(y = model_data_rev$EXIT_20, p = 0.7, list = FALSE)
d_training = model_data_rev[split,]
d_test = model_data_rev[-split,]
I train the model using the train function in the caret package:
ctrl<-trainControl(method = 'cv',number = 5)
lmCVFit<-train(EXIT_20 ~ ., data = d_training, method = 'lm', trControl = ctrl, metric='Rsquared')
summary(lmCVFit)
When I run summary(lmCVFit), I get the following error:
Error in summary.lm(object$finalModel, ...) :
length of 'dimnames' [1] not equal to array extent
In addition: Warning message:
In cbind(est, se, tval, 2 * pt(abs(tval), rdf, lower.tail = FALSE)) :
number of rows of result is not a multiple of vector length (arg 1)
I thought it might be related to my initial data frame above. Specifically, I thought it could have to do with the factor variables. So I dropped them (not shown), ran everything again, and got the same error.
I also ran the regression without CV using the lm() function in R and got the same error when I ran summary().
Has anyone seen this, and can anyone help? I can't find anything online that addresses this error in the context of regression.
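For reference, the two checks described above (dropping the factor columns, then fitting with plain lm() instead of caret) can be sketched roughly as follows. This is only an illustration: the data frame `toy` and its values are hypothetical stand-ins, since the real model_data_rev is not shown in full, and on this toy data summary() succeeds rather than reproducing the error.

```r
# Hypothetical stand-in data; the real model_data_rev is not available here.
set.seed(1)
n <- 200
toy <- data.frame(
  ENTRY_20 = rnorm(n, 7000, 300),
  EXIT_16  = rnorm(n, 1600, 100),
  DAY      = factor(sample(c("Mon", "Tues", "Wed"), n, replace = TRUE))
)
toy$EXIT_20 <- 0.2 * toy$ENTRY_20 + 0.5 * toy$EXIT_16 + rnorm(n, 0, 50)

# Drop every factor column, keeping only the numeric predictors and response.
is_factor <- vapply(toy, is.factor, logical(1))
toy_numeric <- toy[, !is_factor, drop = FALSE]

# Plain lm() fit, with no caret wrapper and no cross-validation.
fit <- lm(EXIT_20 ~ ., data = toy_numeric)

# Fitting itself works; summary() is the call that failed on the real data.
fit_summary <- summary(fit)
print(fit_summary$coefficients)
```

With the real data, the equivalent calls would use d_training in place of `toy`.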
Thanks in advance.
EDIT
I converted the ordinal variables to standard (unordered) factors. The structure now looks like this:
> str(model_data_rev)
'data.frame': 128857 obs. of 12 variables:
$ ENTRY_4 : num 186 218 208 235 256 447 471 191 207 250 ...
$ ENTRY_8 : num 724 769 791 777 707 237 236 726 773 773 ...
$ ENTRY_12: num 2853 2989 3174 3027 3028 ...
$ ENTRY_16: num 2858 3028 3075 2992 3419 ...
$ ENTRY_20: num 7260 7188 7587 7560 7165 ...
$ EXIT_4 : num 70 82 105 114 118 204 202 99 73 95 ...
$ EXIT_8 : num 1501 1631 1594 1576 1536 ...
$ EXIT_12 : num 3862 3923 4158 3970 3895 ...
$ EXIT_16 : num 1559 1539 1737 1681 1795 ...
$ EXIT_20 : num 2145 2310 2217 2330 2291 ...
$ DAY : Factor w/ 7 levels "Friday","Monday",..: 2 6 7 5 1 3 4 2 6 7 ...
$ MONTH : Factor w/ 12 levels "April","August",..: 8 8 8 8 8 8 8 8 8 8 ...
I still get the error when running summary() after fitting the model.
It is also important to emphasize that the model fitting itself works without an error. It is summary() that throws the error.
Thanks.
Source: https://stackoverflow.com/questions/37201142/length-of-dimnames-1-not-equal-to-array-extent-error-in-linear-regression