glmnet

cv.glmnet fails for ridge, not lasso, for simulated data with coder error

故事扮演 Posted on 2019-12-05 02:47:17
Gist. The error: Error in predmat[which, seq(nlami)] = preds : replacement has length zero. The context: the data are simulated with a binary y, but there are n coders of the true y. The data are stacked n times and a model is fitted to recover the true y. The error occurs with the L2 penalty but not the L1 penalty, and when Y is the coder Y but not when it is the true Y. The error is not deterministic; it depends on the seed. UPDATE: the error occurs in versions after 1.9-8; 1.9-8 does not fail. Reproduction base data:
library(glmnet)
rm(list=ls())
set.seed(123)
num_obs=4000
n_coders=2
precision=.8
X <- matrix

Ridge-regression model: glmnet

对着背影说爱祢 Posted on 2019-12-04 20:37:46
Fitting a linear-regression model using least squares on my training dataset works fine.
library(Matrix)
library(tm)
library(glmnet)
library(e1071)
library(SparseM)
library(ggplot2)
trainingData <- read.csv("train.csv", stringsAsFactors=FALSE,sep=",", header = FALSE)
testingData <- read.csv("test.csv",sep=",", stringsAsFactors=FALSE, header = FALSE)
lm.fit = lm(as.factor(V42)~ ., data = trainingData)
linearMPrediction = predict(lm.fit,newdata = testingData, se.fit = TRUE)
mean((linearMPrediction$fit - testingData[,20:41])^2)
linearMPrediction$residual.scale
However, when I try to fit a ridge

Glmnet is different with intercept=TRUE compared to intercept=FALSE and with penalty.factor=0 for an intercept in x

我的梦境 Posted on 2019-12-04 17:02:27
I am new to glmnet and playing with the penalty.factor option. The vignette says that it "Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model." And the longer PDF document has code. So I expected that running a regression with intercept = TRUE and no constant in x would be the same as with intercept = FALSE and a constant in x with penalty.factor = 0. But the code below shows that it is not: the latter case has an intercept of 0 and the other two coefficients are 20% larger than in the former.
library("glmnet")
set.seed(7)
# penalty for

R error in glmnet: NA/NaN/Inf in foreign function call

穿精又带淫゛_ Posted on 2019-12-04 03:09:05
Question: I am trying to create a model using glmnet (currently using cv to find the lambda value), and I am getting the error NA/NaN/Inf in foreign function call (arg 5). I believe this has something to do with the NA values in my data set, because when I remove all data points with NAs the command runs successfully. I was under the impression that glmnet can handle NA values. I'm not sure where the error is coming from:
> res <- cv.glmnet(features.mat, as.factor(tmp[,"outcome"]), family="binomial")
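
The asker's own observation (the call succeeds once rows with NAs are removed) suggests checking the predictor matrix for missing values before fitting. A minimal base-R sketch of that check follows; the simulated `features.mat` and `outcome` are hypothetical stand-ins for the asker's data, not the original data set.

```r
# Hypothetical stand-in data: a numeric predictor matrix with a few NAs
set.seed(10)
features.mat <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)
features.mat[sample(length(features.mat), 8)] <- NA
outcome <- factor(sample(c("no", "yes"), 50, replace = TRUE))

# Count missing values per column to see where the NAs are
na_per_column <- colSums(is.na(features.mat))

# Keep only rows with no missing predictor and no missing outcome
keep <- complete.cases(features.mat) & !is.na(outcome)
x_clean <- features.mat[keep, , drop = FALSE]
y_clean <- outcome[keep]

# x_clean and y_clean could then be passed on, for example:
# cv.glmnet(x_clean, y_clean, family = "binomial")
```

Dropping incomplete rows is only one option; imputing the missing values before fitting is another, depending on how much data would otherwise be lost.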

Is cv.glmnet overfitting the data by using the full lambda sequence?

拈花ヽ惹草 Posted on 2019-12-03 20:15:08
cv.glmnet is widely used in research papers and by companies. While building a function similar to cv.glmnet for glmnet.cr (a similar package that implements the lasso for continuation ratio ordinal regression), I came across this problem in cv.glmnet. cv.glmnet first fits the model:
glmnet.object = glmnet(x, y, weights = weights, offset = offset, lambda = lambda, ...)
After the glmnet object is created with the complete data, the next step goes as follows: the lambda sequence is extracted from the model fitted on the complete data:
lambda = glmnet.object$lambda
Now they make sure number of folds is more than
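
The procedure described above (fit on the complete data, extract the lambda sequence, then reuse it within each fold) can be sketched by hand. This is an illustrative simplification under assumed simulated data, not the actual source of cv.glmnet; the names `full_fit`, `foldid` and `cv_err` are chosen here for the example.

```r
library(glmnet)

# Assumed toy data for illustration
set.seed(1)
n <- 120
p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 0.5 * x[, 2] + rnorm(n)

# Step 1: fit on the complete data and take its lambda sequence
full_fit <- glmnet(x, y)
lambda <- full_fit$lambda

# Step 2: assign each observation to one of nfolds folds
nfolds <- 5
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Step 3: refit on each training split with the SAME lambda sequence
# and record held-out mean squared error for every lambda
cv_err <- matrix(NA_real_, nrow = nfolds, ncol = length(lambda))
for (k in seq_len(nfolds)) {
  held <- foldid == k
  fit_k <- glmnet(x[!held, ], y[!held], lambda = lambda)
  pred <- predict(fit_k, newx = x[held, , drop = FALSE], s = lambda)
  cv_err[k, ] <- colMeans((y[held] - pred)^2)
}

# Step 4: average over folds and pick the lambda with the lowest error
mean_err <- colMeans(cv_err)
best_lambda <- lambda[which.min(mean_err)]
```

Note that only the lambda grid comes from the full-data fit; the coefficients used for each held-out prediction are estimated without the held-out fold.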

Difference between glmnet() and cv.glmnet() in R?

和自甴很熟 Posted on 2019-12-03 16:56:26
Question: I'm working on a project that would show the potential influence a group of events have on an outcome. I'm using the glmnet package, specifically the Poisson family. Here's my code:
# de <- data imported from sql connection
x <- model.matrix(~.,data = de[,2:7])
y <- (de[,1])
reg <- cv.glmnet(x,y, family = "poisson", alpha = 1)
reg1 <- glmnet(x,y, family = "poisson", alpha = 1)
**Co <- coef(?reg or reg1?,s=???)**
summ <- summary(Co)
c <- data.frame(Name= rownames(Co)[summ$i], Lambda=

How does glmnet compute the maximal lambda value?

帅比萌擦擦* Posted on 2019-12-03 12:49:24
Question: The glmnet package uses a range of LASSO tuning parameters lambda scaled from the maximal lambda_max under which no predictors are selected. I want to find out how glmnet computes this lambda_max value. For example, in a trivial dataset:
set.seed(1)
library("glmnet")
x <- matrix(rnorm(100*20),100,20)
y <- rnorm(100)
fitGLM <- glmnet(x,y)
max(fitGLM$lambda) # 0.1975946
The package vignette (http://www.jstatsoft.org/v33/i01/paper) describes in section 2.5 that it computes this value as follows:

Formula interface for glmnet

*爱你&永不变心* Posted on 2019-12-03 11:38:15
Question: In the last few months I've worked on a number of projects where I've used the glmnet package to fit elastic net models. It's great, but the interface is rather bare-bones compared to most R modelling functions. In particular, rather than specifying a formula and data frame, you have to give a response vector and predictor matrix. You also lose out on many quality-of-life things that the regular interface provides, e.g. sensible (?) treatment of factors, missing values, putting variables into
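
To illustrate the gap the asker describes, here is a minimal base-R sketch of turning a formula and data frame into the response vector and predictor matrix that glmnet expects. The data frame `df` and its columns are hypothetical, and this is one possible manual approach rather than a glmnet feature.

```r
# Hypothetical data frame with one numeric and one factor predictor
df <- data.frame(
  y     = c(1.2, 0.7, 3.1, 2.4, 1.9, 2.8),
  age   = c(23, 35, 41, 29, 52, 47),
  group = factor(c("a", "b", "c", "a", "b", "c"))
)

# model.frame applies the formula (and the default NA handling) to the data
mf <- model.frame(y ~ age + group, data = df)

# model.matrix expands factors into dummy columns; the intercept column is
# dropped because glmnet fits its own intercept by default
x <- model.matrix(attr(mf, "terms"), data = mf)[, -1, drop = FALSE]
y <- model.response(mf)

# x and y could then be passed to glmnet(x, y)
```

With the default treatment contrasts, the factor `group` becomes the two columns `groupb` and `groupc`, with level `a` as the reference.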

Difference between glmnet() and cv.glmnet() in R?

浪子不回头ぞ Posted on 2019-12-03 06:05:27
I'm working on a project that would show the potential influence a group of events have on an outcome. I'm using the glmnet package, specifically the Poisson family. Here's my code:
# de <- data imported from sql connection
x <- model.matrix(~.,data = de[,2:7])
y <- (de[,1])
reg <- cv.glmnet(x,y, family = "poisson", alpha = 1)
reg1 <- glmnet(x,y, family = "poisson", alpha = 1)
**Co <- coef(?reg or reg1?,s=???)**
summ <- summary(Co)
c <- data.frame(Name= rownames(Co)[summ$i], Lambda= summ$x)
c2 <- c[with(c, order(-Lambda)), ]
The beginning imports a large amount of data from my
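
The placeholder `coef(?reg or reg1?,s=???)` in the question can be made concrete with a small sketch. The simulated Poisson data below are an assumption standing in for the asker's SQL import; `coef()` accepts either the cross-validated object with `s = "lambda.min"` or the plain glmnet path with a numeric `s`.

```r
library(glmnet)

# Assumed simulated Poisson data in place of the asker's data
set.seed(42)
x <- matrix(rnorm(200 * 6), 200, 6)
y <- rpois(200, lambda = exp(0.3 * x[, 1]))

# cv.glmnet chooses lambda by cross-validation; glmnet only fits the path
cvfit <- cv.glmnet(x, y, family = "poisson", alpha = 1)
fit <- glmnet(x, y, family = "poisson", alpha = 1)

# Coefficients at the cross-validated lambda, taken from either object
co_cv <- coef(cvfit, s = "lambda.min")
co_path <- coef(fit, s = cvfit$lambda.min)

# Keep only the nonzero coefficients as a named numeric vector
v <- as.matrix(co_cv)[, 1]
nonzero <- v[v != 0]
```

In this sketch the two extractions should agree, because the cross-validated object stores a full-data fit over the same lambda path; `s = "lambda.1se"` is the more conservative alternative to `"lambda.min"`.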

How does glmnet compute the maximal lambda value?

末鹿安然 Posted on 2019-12-03 03:54:18
The glmnet package uses a range of LASSO tuning parameters lambda scaled from the maximal lambda_max under which no predictors are selected. I want to find out how glmnet computes this lambda_max value. For example, in a trivial dataset:
set.seed(1)
library("glmnet")
x <- matrix(rnorm(100*20),100,20)
y <- rnorm(100)
fitGLM <- glmnet(x,y)
max(fitGLM$lambda) # 0.1975946
The package vignette ( http://www.jstatsoft.org/v33/i01/paper ) describes in section 2.5 that it computes this value as follows:
sx <- as.matrix(scale(x))
sy <- as.vector(scale(y))
max(abs(colSums(sx*sy)))/100 # 0.1865232
Which
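
One detail that may matter when comparing against the computation above: the glmnet documentation notes that its internal standardization uses the 1/n variance formula, whereas `scale()` uses 1/(n-1). The base-R sketch below computes the lasso candidate max|x'y|/n under both conventions so they can be compared with `max(fitGLM$lambda)`. Treating this as the source of the discrepancy is an assumption, not something stated in the question, and the helper `sd_n` is defined here only for illustration.

```r
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

# Standard deviation with the 1/n (population) formula
sd_n <- function(v) sqrt(mean((v - mean(v))^2))

# Center y; standardize the columns of x with the 1/n formula
y_ctr <- y - mean(y)
x_std_n <- scale(x, center = TRUE, scale = apply(x, 2, sd_n))

# Candidate lambda_max for the lasso (alpha = 1): max |<x_j, y>| / n
lambda_max_n <- max(abs(colSums(x_std_n * y_ctr))) / n

# Same quantity using scale()'s default 1/(n-1) standardization of x
x_std_n1 <- scale(x)
lambda_max_n1 <- max(abs(colSums(x_std_n1 * y_ctr))) / n
```

The two values differ by exactly a factor of sqrt(n/(n-1)), so the choice of variance formula alone changes the result at the second decimal place or beyond for n = 100.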