Logistic Regression on factor: Error in eval(family$initialize) : y values must be 0 <= y <= 1

主宰稳场 posted on 2019-12-26 07:43:32

Question


I am not able to fix the error below, which is raised by the following logistic regression:

training=(IBM$Serial<625)
data=IBM[!training,]
dim(data)
stock.direction <- data$Direction
training_model=glm(stock.direction~data$lag2,data=data,family=binomial)
###Error### ----  Error in eval(family$initialize) : y values must be 0 <= y <= 1

A few rows from the data I am using:

X   Date    Open    High    Low Close   Adj.Close   Volume  Return  lag1    lag2    lag3    Direction   Serial
1   28-11-2012  190.979996  192.039993  189.270004  191.979996  165.107727  3603600 0.004010855 0.004010855 -0.001198021    -0.006354834    Up  1
2   29-11-2012  192.75  192.899994  190.199997  191.529999  164.720734  4077900 0.00114865  0.00114865  -0.004020279    -0.009502386    Up  2
3   30-11-2012  191.75  192 189.5   190.070007  163.465073  4936400 0.003630178 0.003630178 -0.001894039    -0.005576956    Up  3
4   03-12-2012  190.759995  191.300003  188.360001  189.479996  162.957703  3349600 0.001213907 0.001213907 -0.002480478    -0.001636046    Up  4

Answer 1:


glm() is asking for y values between 0 and 1 because the categorical columns in your data, such as Direction, are of type character. A binomial glm() accepts a numeric response in [0, 1] or a factor, but not a character vector. Convert the column to a factor with as.factor(data$Direction), then fit the model directly on the column: glm(Direction ~ lag2, data=..., family=binomial). There is no need to declare stock.direction separately.
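To see where the message comes from, here is a minimal sketch on made-up data. The data frame `toy` and its values are hypothetical, not the asker's IBM data, and the sketch assumes the response really is stored as a character column. The error is caught with tryCatch() so the message can be inspected instead of stopping the script.

```r
# Made-up stand-in data: Direction is stored as character, as assumed in the answer.
toy <- data.frame(
  lag2 = c(-0.0040, -0.0025, -0.0019, -0.0012, 0.0008, 0.0031),
  Direction = c("Down", "Down", "Up", "Up", "Down", "Up"),
  stringsAsFactors = FALSE
)

# Fit with the character response and capture the error message instead of stopping.
msg <- tryCatch(
  {
    glm(Direction ~ lag2, data = toy, family = binomial)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If the printed message matches the one in the question, the response column's type is the likely culprit.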

You can check the class of a variable with class(variable). If it is character, convert it to a factor and store the result as a column in the same data frame. The model should then fit.
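The check and the conversion might look like the sketch below. The data frame `ibm` is a made-up stand-in with only the two columns used in the model; the values are invented for illustration.

```r
# Hypothetical stand-in for the asker's data: only the two columns used here.
ibm <- data.frame(
  lag2 = c(-0.0040, -0.0025, -0.0019, -0.0012, 0.0008, 0.0031),
  Direction = c("Down", "Down", "Up", "Up", "Down", "Up"),
  stringsAsFactors = FALSE
)

class(ibm$Direction)   # "character"

# Store the factor version in the same data frame.
ibm$Direction <- as.factor(ibm$Direction)
levels(ibm$Direction)  # "Down" "Up": glm() treats the first level as failure

# The response is now a factor, so the binomial family accepts it.
fit <- glm(Direction ~ lag2, data = ibm, family = binomial)
coef(fit)
```

Because as.factor() orders levels alphabetically, "Down" becomes the reference level and the fitted probabilities are for "Up".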




Answer 2:


Without knowing more about the data, you could do something like this: recode Direction as a numeric 0/1 column and fit the model on that column.

library(dplyr)
# Rebuild a small sample of the data; Direction is read in as character.
df <- read.table(header = TRUE, stringsAsFactors = FALSE,  text ="X   Date    Open    High    Low Close   Adj.Close   Volume  Return  lag1    lag2    lag3    Direction   Serial
1   28-11-2012  190.979996  192.039993  189.270004  191.979996  165.107727  3603600 0.004010855 0.004010855 -0.001198021    -0.006354834    Up  1
2   29-11-2012  192.75  192.899994  190.199997  191.529999  164.720734  4077900 0.00114865  0.00114865  -0.004020279    -0.009502386    Up  2
3   30-11-2012  191.75  192 189.5   190.070007  163.465073  4936400 0.003630178 0.003630178 -0.001894039    -0.005576956    Up  3
4   03-12-2012  190.759995  191.300003  188.360001  189.479996  162.957703  3349600 0.001213907 0.001213907 -0.002480478    -0.001636046    Up  4
1   28-11-2012  190.979996  192.039993  189.270004  191.979996  165.107727  3603600 0.004010855 0.004010855 -0.001198021    -0.006354834    Up  1
2   29-11-2012  192.75  192.899994  190.199997  191.529999  164.720734  4077900 0.00114865  0.00114865  -0.004020279    -0.009502386    Down  2
3   30-11-2012  191.75  192 189.5   190.070007  163.465073  4936400 0.003630178 0.003630178 -0.001894039    -0.005576956    Up  3
4   03-12-2012  190.759995  191.300003  188.360001  189.479996  162.957703  3349600 0.001213907 0.001213907 -0.002480478    -0.001636046    Down  4
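") %>%
  # Encode the response as 0/1 so glm() accepts it: Up = 1, anything else = 0.
  mutate(bin = ifelse(Direction == "Up", 1, 0))

# Fit the logistic regression on the numeric 0/1 response.
glm(bin ~ High, family = "binomial", data = df)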
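The 0/1 recoding here and the factor conversion from the first answer should lead to the same model, as long as "Up" is the level coded as success. A minimal sketch on made-up data (the data frame `toy`, its values, and the column names `bin` and `dir_f` are chosen for illustration only):

```r
# Made-up data for illustration only.
toy <- data.frame(
  lag2 = c(-0.0040, -0.0025, -0.0019, -0.0012, 0.0008, 0.0031),
  Direction = c("Down", "Down", "Up", "Up", "Down", "Up"),
  stringsAsFactors = FALSE
)

# Option A: numeric 0/1 response, with Up coded as 1.
toy$bin <- ifelse(toy$Direction == "Up", 1, 0)
fit_bin <- glm(bin ~ lag2, data = toy, family = binomial)

# Option B: factor response; the first level ("Down") is treated as failure.
toy$dir_f <- factor(toy$Direction, levels = c("Down", "Up"))
fit_fac <- glm(dir_f ~ lag2, data = toy, family = binomial)

# Both encodings model the probability of "Up", so the coefficients should agree.
print(all.equal(coef(fit_bin), coef(fit_fac)))
```

Setting the levels explicitly in Option B makes the choice of reference level visible rather than leaving it to alphabetical order.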


Source: https://stackoverflow.com/questions/47546658/logistic-regression-on-factor-error-in-evalfamilyinitialize-y-values-must
