glmnet | 易学教程

cv.glmnet fails for ridge, not lasso, for simulated data with coder error

Gist

The error:

```
Error in predmat[which, seq(nlami)] = preds : replacement has length zero
```

The context: the data are simulated with a binary y, but there are n coders of the true y. The data are stacked n times and a model is fitted, trying to recover the true y. The error occurs with the L2 (ridge) penalty but not with the L1 (lasso) penalty, and only when the response is the coders' y, not the true y. The error is not deterministic: it depends on the seed.

UPDATE: the error occurs in versions after 1.9-8; 1.9-8 does not fail.

Reproduction, base data:

```r
library(glmnet)
rm(list=ls())
set.seed(123)
num_obs=4000
n_coders=2
precision=.8
X <- matrix …
```
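When rows are stacked once per coder, ordinary random folds can place copies of the same observation in both the training part and the held-out part of a fold. A minimal sketch of folds grouped by original observation, passed through cv.glmnet's `foldid` argument, is below. The simulation is hypothetical (the question's own reproduction is cut off), and I have not checked whether grouped folds avoid the reported error, which the update ties to glmnet versions after 1.9-8. The glmnet call is wrapped in `requireNamespace()` so the base-R part runs without the package.

```r
# Sketch: grouped CV folds for data stacked once per coder.
# HYPOTHETICAL simulation; the question's own data-generating code is truncated.
set.seed(123)
num_obs <- 400; n_coders <- 2; precision <- 0.8
X <- matrix(rnorm(num_obs * 5), num_obs, 5)
true_y <- rbinom(num_obs, 1, plogis(X[, 1] - X[, 2]))

# Each coder reports the true label with probability `precision`
coder_y <- unlist(lapply(seq_len(n_coders), function(k)
  ifelse(runif(num_obs) < precision, true_y, 1 - true_y)))
X_stacked <- X[rep(seq_len(num_obs), times = n_coders), ]
obs_id <- rep(seq_len(num_obs), times = n_coders)

# One fold label per original observation, copied to all of its stacked rows
nfolds <- 10
fold_of_obs <- sample(rep(seq_len(nfolds), length.out = num_obs))
foldid <- fold_of_obs[obs_id]

if (requireNamespace("glmnet", quietly = TRUE)) {
  # alpha = 0 is the ridge (L2) penalty from the question
  cv_ridge <- glmnet::cv.glmnet(X_stacked, coder_y, family = "binomial",
                                alpha = 0, foldid = foldid)
}
```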

Ridge-regression model: glmnet

Fitting a linear-regression model using least squares on my training dataset works fine.

```r
library(Matrix)
library(tm)
library(glmnet)
library(e1071)
library(SparseM)
library(ggplot2)

trainingData <- read.csv("train.csv", stringsAsFactors=FALSE, sep=",", header = FALSE)
testingData <- read.csv("test.csv", sep=",", stringsAsFactors=FALSE, header = FALSE)

lm.fit = lm(as.factor(V42) ~ ., data = trainingData)
linearMPrediction = predict(lm.fit, newdata = testingData, se.fit = TRUE)
mean((linearMPrediction$fit - testingData[,20:41])^2)
linearMPrediction$residual.scale
```

However, when I try to fit a ridge …
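glmnet does not take a formula and data frame the way lm() does; it wants a numeric predictor matrix and a response vector, and `alpha = 0` selects the ridge penalty. A minimal sketch, assuming V42 is a numeric response (if it is really a class label, as the `as.factor()` call suggests, `family = "binomial"` or `"multinomial"` would be the matching choice). The simulated data frames are stand-ins for train.csv and test.csv.

```r
# Sketch: ridge regression with glmnet, starting from data frames.
# ASSUMPTIONS: V42 is a numeric response; simulated data replace train.csv/test.csv.
set.seed(1)
make_df <- function(n) {
  df <- as.data.frame(matrix(rnorm(n * 5), n, 5))  # columns V1..V5
  df$V42 <- 2 * df$V1 - df$V3 + rnorm(n)
  df
}
trainingData <- make_df(200)
testingData  <- make_df(50)

# Expand to a numeric matrix with model.matrix() and drop the "(Intercept)"
# column, because glmnet fits its own intercept.
x_train <- model.matrix(V42 ~ ., data = trainingData)[, -1]
x_test  <- model.matrix(V42 ~ ., data = testingData)[, -1]
y_train <- trainingData$V42

if (requireNamespace("glmnet", quietly = TRUE)) {
  # alpha = 0: ridge (L2) penalty; cv.glmnet chooses lambda by 10-fold CV
  cv_ridge <- glmnet::cv.glmnet(x_train, y_train, alpha = 0)
  pred <- predict(cv_ridge, newx = x_test, s = "lambda.min")
  test_mse <- mean((testingData$V42 - pred)^2)
}
```

Building both matrices from the same formula keeps their columns aligned; if the test set has no response column, `model.matrix(~ ., data = testingData)` gives the predictor matrix instead.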

Glmnet is different with intercept=TRUE compared to intercept=FALSE and with penalty.factor=0 for an intercept in x

I am new to glmnet and am experimenting with the penalty.factor option. The vignette says that it "Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model." The longer PDF document has code as well. So I expected that running a regression with intercept = TRUE and no constant column in x would give the same result as running it with intercept = FALSE, a constant column in x, and penalty.factor = 0 for that column. But the code below shows that it does not: the latter case has an intercept of 0, and the other two coefficients are 20% larger than in the former.

```r
library("glmnet")
set.seed(7)
# penalty for …
```
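Mathematically the two set-ups should agree: for any penalty on the slopes, the optimal unpenalised intercept is mean(y) minus the mean of x times the slopes, so an unpenalised constant column and a separate intercept give the same fit. The base-R sketch below checks this with closed-form ridge on made-up data; its `lambda` multiplies ||b||^2 in RSS + lambda * ||b||^2, which is not glmnet's scaling. If the algebra agrees, the gap presumably comes from how glmnet treats its inputs. Two candidates, offered with some hedging: ?glmnet documents that penalty factors are internally rescaled to sum to nvars, so `c(0, 1, 1)` does not apply the same effective lambda as the default; and, as far as I know, glmnet skips zero-variance columns, which would leave the constant column (and so the "intercept") at 0.

```r
# Base-R check: ridge with a separate unpenalised intercept equals ridge with a
# constant column whose penalty is zero. lambda here is on the RSS scale.
set.seed(7)
n <- 100; p <- 2
x <- matrix(rnorm(n * p), n, p)
y <- drop(3 + x %*% c(1, -2) + rnorm(n))
lambda <- 5

# (a) Intercept handled by centring x and y, then recovered afterwards
xm <- colMeans(x); ym <- mean(y)
xc <- sweep(x, 2, xm)
b_a <- solve(crossprod(xc) + lambda * diag(p), crossprod(xc, y - ym))
a_a <- ym - sum(xm * b_a)

# (b) Explicit constant column with penalty factor 0, slopes with factor 1
X1  <- cbind(1, x)
pen <- c(0, rep(1, p))
b_b <- solve(crossprod(X1) + lambda * diag(pen), crossprod(X1, y))

coef_a <- c(a_a, b_a)  # intercept, slope 1, slope 2
coef_b <- drop(b_b)    # constant-column coefficient, slope 1, slope 2
```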

R error in glmnet: NA/NaN/Inf in foreign function call

I am trying to create a model using glmnet (currently using cv.glmnet to find the lambda value), and I am getting the error `NA/NaN/Inf in foreign function call (arg 5)`. I believe this has something to do with the NA values in my data set, because when I remove all data points with NAs the command runs successfully. I was under the impression that glmnet can handle NA values. I'm not sure where the error is coming from:

```r
> res <- cv.glmnet(features.mat, as.factor(tmp[,"outcome"]), family="binomial")
```
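The message refers to the compiled (Fortran) routine that glmnet calls, and, as far as I know, that routine does not accept NA, NaN or Inf in x; the same goes for y, where `as.factor()` keeps missing outcomes as NA. A minimal sketch of locating the problem columns and two common ways to clean them before calling glmnet: dropping incomplete rows, or mean imputation. `features.mat` here is a tiny made-up stand-in, and `impute_mean()` is a hypothetical helper, not a glmnet function.

```r
# Sketch: find and handle non-finite values before calling (cv.)glmnet.
# features.mat is a tiny made-up stand-in for the asker's matrix.
features.mat <- cbind(a = c(1, NA, 3, 4),
                      b = c(2, 4, 6, 8),
                      c = c(5, 6, Inf, 8))

# Columns that contain NA, NaN or +/-Inf
finite   <- is.finite(features.mat)
bad_cols <- colnames(features.mat)[colSums(!finite) > 0]

# Option 1: keep only complete rows (subset y with the same `keep`)
keep <- apply(finite, 1, all)
x_complete <- features.mat[keep, , drop = FALSE]

# Option 2: replace non-finite entries with the mean of the finite ones
impute_mean <- function(x) {
  for (j in seq_len(ncol(x))) {
    ok <- is.finite(x[, j])
    x[!ok, j] <- mean(x[ok, j])  # NaN if a column has no finite values at all
  }
  x
}
x_imputed <- impute_mean(features.mat)
```

For prediction on new data, the imputation means should come from the training set rather than being recomputed on the new rows.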

Is cv.glmnet overfitting the data by using the full lambda sequence?

`cv.glmnet` is widely used in research papers and in industry. While building a function similar to cv.glmnet for glmnet.cr (a similar package that implements the lasso for continuation-ratio ordinal regression), I came across this problem in cv.glmnet. `cv.glmnet` first fits the model:

```r
glmnet.object = glmnet(x, y, weights = weights, offset = offset, lambda = lambda, ...)
```

After the glmnet object is created with the complete data, the next step goes as follows. The lambda sequence from the model fitted on the complete data is extracted:

```r
lambda = glmnet.object$lambda
```

Next, they make sure the number of folds is more than …
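The procedure quoted above fixes the lambda grid once, then fold by fold fits only on the training rows and scores the held-out rows on that same grid. A base-R sketch of that loop is below; closed-form ridge stands in for glmnet's coordinate-descent fitter (an assumption made so it runs without the package), the grid is chosen by hand rather than taken from a full-data fit, and `ridge_fit_predict()` and `cv_fixed_grid()` are hypothetical helpers. cv.glmnet's own internals differ in detail.

```r
# Sketch of K-fold CV on a FIXED lambda grid, as in the procedure above.
# Closed-form ridge stands in for glmnet's fitter (assumption, base R only).
ridge_fit_predict <- function(x_tr, y_tr, x_te, lambda) {
  xm <- colMeans(x_tr); ym <- mean(y_tr)
  xc <- sweep(x_tr, 2, xm)
  pred <- vapply(lambda, function(l) {
    b <- solve(crossprod(xc) + l * diag(ncol(x_tr)), crossprod(xc, y_tr - ym))
    drop(ym + sweep(x_te, 2, xm) %*% b)
  }, numeric(nrow(x_te)))
  matrix(pred, nrow = nrow(x_te))  # one column of predictions per lambda
}

cv_fixed_grid <- function(x, y, lambda, nfolds = 5) {
  foldid <- sample(rep(seq_len(nfolds), length.out = nrow(x)))
  err <- matrix(NA_real_, nfolds, length(lambda))
  for (k in seq_len(nfolds)) {
    test <- foldid == k
    # Fit on the training rows only; the grid itself never changes per fold
    pred <- ridge_fit_predict(x[!test, , drop = FALSE], y[!test],
                              x[test, , drop = FALSE], lambda)
    err[k, ] <- colMeans((y[test] - pred)^2)
  }
  cvm <- colMeans(err)
  list(lambda = lambda, cvm = cvm, lambda.min = lambda[which.min(cvm)])
}

set.seed(42)
x <- matrix(rnorm(100 * 3), 100, 3)
y <- drop(x %*% c(1, 0, -1) + rnorm(100))
grid <- 10^seq(2, -2, length.out = 20)  # hand-picked grid for this sketch
res <- cv_fixed_grid(x, y, grid)
```

In this structure the grid only decides where the error curve is evaluated; each per-fold fit still sees only its own training rows.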

Formula interface for glmnet

In the last few months I've worked on a number of projects where I've used the glmnet package to fit elastic net models. It's great, but the interface is rather bare-bones compared to most R modelling functions. In particular, rather than specifying a formula and data frame, you have to give a response vector and a predictor matrix. You also lose out on many quality-of-life features that the regular interface provides, e.g. sensible (?) treatment of factors and missing values, putting variables into …
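The core of a formula front end can be built from base R's own modelling machinery: `model.frame()` applies the formula and the missing-value policy, and `model.matrix()` expands factors into dummy columns. A minimal sketch follows; `formula_to_xy()` and `glmnet_formula()` are hypothetical helpers, not part of glmnet, and the sketch does not store the terms needed to predict on new data. For a fuller treatment, the CRAN package glmnetUtils provides a formula interface.

```r
# Minimal formula front end for glmnet. formula_to_xy() and glmnet_formula()
# are HYPOTHETICAL helpers written for this sketch, not glmnet functions.
formula_to_xy <- function(formula, data, na.action = na.omit) {
  # model.frame() applies the formula and the missing-value policy
  mf <- model.frame(formula, data = data, na.action = na.action)
  # model.matrix() expands factors into treatment-contrast dummy columns
  x <- model.matrix(attr(mf, "terms"), data = mf)
  x <- x[, colnames(x) != "(Intercept)", drop = FALSE]  # glmnet adds its own
  list(x = x, y = model.response(mf))
}

glmnet_formula <- function(formula, data, ...) {
  xy <- formula_to_xy(formula, data)
  glmnet::glmnet(xy$x, xy$y, ...)  # requires the glmnet package
}

df <- data.frame(y = c(1.2, 0.4, 2.1, 1.7, 0.9, 1.5),
                 g = factor(c("a", "b", "c", "a", "b", "c")),
                 z = c(1, 2, NA, 4, 5, 6))
xy <- formula_to_xy(y ~ g + z, df)  # row 3 dropped because z is NA
```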

Difference between glmnet() and cv.glmnet() in R?

I'm working on a project that would show the potential influence a group of events has on an outcome. I'm using the glmnet package, specifically its Poisson family. Here's my code:

```r
# de <- data imported from sql connection
x <- model.matrix(~., data = de[,2:7])
y <- (de[,1])
reg <- cv.glmnet(x, y, family = "poisson", alpha = 1)
reg1 <- glmnet(x, y, family = "poisson", alpha = 1)
Co <- coef(?reg or reg1?, s=???)   # unsure: which object, and which s?
summ <- summary(Co)
c <- data.frame(Name = rownames(Co)[summ$i], Lambda = summ$x)
c2 <- c[with(c, order(-Lambda)), ]
```

The beginning imports a large amount of data from my …
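On the `coef(?reg or reg1?, s = ???)` line: glmnet() returns the whole regularisation path without choosing a lambda, while cv.glmnet() also stores a full-data fit (`$glmnet.fit`) plus the cross-validated choices `lambda.min` and `lambda.1se`. One common pattern is therefore `coef(reg, s = "lambda.min")` (or `"lambda.1se"` for a sparser model), which should match taking the stored path at `reg$lambda.min`. A sketch with simulated data standing in for `de`; it also drops the `(Intercept)` column from the model matrix, which glmnet would otherwise carry as a constant predictor.

```r
# Sketch: coefficients at the cross-validated lambda for a Poisson lasso.
# `de` is simulated here (HYPOTHETICAL stand-in for the SQL import).
set.seed(1)
n <- 500
de <- data.frame(a = rnorm(n), b = rnorm(n),
                 g = factor(sample(c("u", "v", "w"), n, replace = TRUE)))
de <- cbind(count = rpois(n, exp(0.5 + 0.8 * de$a)), de)

# Predictor matrix without the intercept column (glmnet fits its own)
x <- model.matrix(~ ., data = de[, -1])[, -1]
y <- de$count

if (requireNamespace("glmnet", quietly = TRUE)) {
  reg <- glmnet::cv.glmnet(x, y, family = "poisson", alpha = 1)
  # coef() on the cv.glmnet object reads the stored full-data path
  # (reg$glmnet.fit) at the chosen lambda
  Co  <- as.matrix(coef(reg, s = "lambda.min"))  # or s = "lambda.1se"
  Co2 <- as.matrix(coef(reg$glmnet.fit, s = reg$lambda.min))
  nonzero <- Co[Co[, 1] != 0, , drop = FALSE]   # selected terms
}
```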

How does glmnet compute the maximal lambda value?

The glmnet package uses a range of LASSO tuning parameters lambda, scaled down from the maximal value lambda_max at which no predictors are selected. I want to find out how glmnet computes this lambda_max value. For example, on a trivial dataset:

```r
set.seed(1)
library("glmnet")
x <- matrix(rnorm(100*20), 100, 20)
y <- rnorm(100)
fitGLM <- glmnet(x, y)
max(fitGLM$lambda)
# 0.1975946
```

The package vignette (http://www.jstatsoft.org/v33/i01/paper) describes in section 2.5 that it computes this value as follows:

```r
sx <- as.matrix(scale(x))
sy <- as.vector(scale(y))
max(abs(colSums(sx*sy)))/100
# 0.1865232
```

Which …
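For the Gaussian lasso with objective (1/2n)·RSS + lambda·||b||_1, the KKT conditions at b = 0 give lambda_max = max_j |<x_j, y - mean(y)>| / n over standardised columns x_j (divided further by alpha for the elastic net). A sketch, assuming glmnet standardises each column with the 1/n (population) standard deviation, centres y, and reports lambda on the original scale of y. Under those assumptions, the two differences from the computation in the question are the n - 1 denominator that `scale()` uses and the scaling of y. `pop_sd()` and `lambda_max_gaussian()` are hypothetical helpers; comparing their result with `max(fitGLM$lambda)` is the check.

```r
# Sketch: lambda_max for glmnet's Gaussian family, ASSUMING glmnet standardises
# x columns with the 1/n (population) sd, centres y, and reports lambda on the
# original scale of y. Compare the result with max(glmnet(x, y)$lambda).
pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

lambda_max_gaussian <- function(x, y, alpha = 1) {
  n  <- nrow(x)
  sx <- scale(x, center = TRUE, scale = apply(x, 2, pop_sd))
  yc <- y - mean(y)
  # Smallest lambda at which every coefficient is zero (KKT condition at b = 0)
  max(abs(crossprod(sx, yc))) / (n * alpha)
}

set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
lm_val <- lambda_max_gaussian(x, y)
```

Because x is standardised inside the function, rescaling a column of x leaves the value unchanged, while rescaling y rescales lambda_max by the same factor.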