glmnet | 易学教程

How to solve 'protection stack overflow' issue in R Studio

I'm trying to build a model with the glmnet package, but I get the following error when I run this line:

    # library('glmnet')
    x = model.matrix(response ~ ., data = acgh_frame[, c(3:ncol(acgh_frame))])
    Error: protect(): protection stack overflow

I know this is caused by the large number of variables (26k+) in the data frame; with fewer variables the error does not appear. I know how to solve this in command-line R, but I need to stay in RStudio, so I want to fix it from there. How do I do this?

@Ansjovis86 You can specify the ppsize as a command-line argument to RStudio
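Independently of the ppsize setting, the formula expansion (which appears to be where the overflow arises) can be sidestepped by building the predictor matrix directly. The sketch below is only an illustration: `toy_frame` is a hypothetical stand-in for `acgh_frame`, and it assumes every predictor is already numeric. With factor columns, `data.matrix()` returns integer codes rather than dummy variables, so it would not be equivalent to `model.matrix()`.

```r
# Hypothetical stand-in for acgh_frame: a binary response plus many numeric predictors
set.seed(1)
n <- 20
p <- 2000
toy_frame <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
toy_frame$response <- rbinom(n, 1, 0.5)

# Select the predictors by name and convert them without going through a formula
predictor_cols <- setdiff(names(toy_frame), "response")
x <- data.matrix(toy_frame[, predictor_cols])
y <- toy_frame$response

dim(x)  # one row per observation, one column per predictor
```

Unlike `model.matrix()`, this produces no intercept column, which is fine for glmnet because it fits its own intercept by default.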

executing glmnet in parallel in R

Question: My training dataset has about 200,000 records and 500 features (sales data from a retail organization). Most of the features are 0/1 and are stored as a sparse matrix. The goal is to predict the probability of purchase for about 200 products, so I need to use the same 500 features for each of the 200 products. Since glmnet is a natural choice for model creation, I thought about running glmnet in parallel for the 200 products. (Since all the 200 models
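One possible way to fit one model per product in parallel is sketched below, using `parallel::mclapply` over the columns of a response matrix. This is not taken from the original thread: the data are simulated, the sizes are scaled down, and the names `Y`, `n_products` and `n_cores` are assumptions made for illustration.

```r
library(glmnet)
library(Matrix)
library(parallel)

set.seed(42)
n <- 500          # records (scaled down from ~200,000)
p <- 20           # features (scaled down from 500)
n_products <- 4   # products (scaled down from ~200)

# Sparse 0/1 feature matrix shared by every model
x <- rsparsematrix(n, p, density = 0.2, rand.x = function(k) rep(1, k))

# One binary purchase indicator per product, one column each (simulated)
Y <- matrix(rbinom(n * n_products, 1, 0.3), nrow = n, ncol = n_products)

# mclapply forks worker processes; forking is unavailable on Windows, so fall back to 1 core
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

fits <- mclapply(
  seq_len(n_products),
  function(j) glmnet(x, Y[, j], family = "binomial"),
  mc.cores = n_cores
)
```

Purchase probabilities for a given product could then be obtained with, for example, `predict(fits[[1]], newx = x, s = 0.01, type = "response")`, where the value of `s` is purely illustrative.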

R glmnet : “(list) object cannot be coerced to type 'double' ”

I'm trying to use the glmnet package on a dataset. I'm using cv.glmnet() to get a lambda value for glmnet(). Here's the dataset and error message:

    > head(t2)
      X1 X2        X3 X4 X5         X6    X7 X8 X9 X10 X11 X12
    1  1  1 0.7661266 45  2 0.80298213  9120 13  0   6   0   2
    2  2  0 0.9571510 40  0 0.12187620  2600  4  0   0   0   1
    3  3  0 0.6581801 38  1 0.08511338  3042  2  1   0   0   0
    4  4  0 0.2338098 30  0 0.03604968  3300  5  0   0   0   0
    5  5  0 0.9072394 49  1 0.02492570 63588  7  0   1   0   0
    6  6  0 0.2131787 74  0 0.37560697  3500  3  0   1   0   1
    > str(t2)
    'data.frame': 150000 obs. of 12 variables:
     $ X1 : int 1 2 3 4 5 6 7 8 9 10 ...
     $ X2 : int 1 0 0 0 0 0 0 0 0
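A common cause of this kind of coercion error is passing a data.frame (which is a list internally) where glmnet expects a numeric matrix. The excerpt does not show the failing call, so the sketch below rests on assumptions: `t2_toy` is simulated data with a similar shape, and `X2` is assumed to be the binary response.

```r
library(glmnet)

set.seed(1)
n <- 200
# Simulated stand-in for t2; column roles are assumptions
t2_toy <- data.frame(
  X2 = rbinom(n, 1, 0.3),          # assumed binary response
  X3 = runif(n),
  X4 = sample(20:80, n, replace = TRUE),
  X5 = rpois(n, 1)
)

# glmnet and cv.glmnet expect a numeric matrix of predictors, not a data.frame
x <- as.matrix(t2_toy[, c("X3", "X4", "X5")])
y <- t2_toy$X2

# Choose lambda by cross-validation, then fit at that value
cv_fit <- cv.glmnet(x, y, family = "binomial")
fit <- glmnet(x, y, family = "binomial", lambda = cv_fit$lambda.min)
```

If some columns were factors, `model.matrix()` would be the more appropriate conversion, since `as.matrix()` would then produce a character matrix.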

Why is it inadvisable to get statistical summary information for regression coefficients from glmnet model?

Question: I have a regression model with a binary outcome. I fitted the model with glmnet and got the selected variables and their coefficients. Since glmnet doesn't calculate variable importance, I would like to feed the exact output (selected variables and their coefficients) to glm to get that information (standard errors, etc.). I searched the R documentation, and it seems I can use the "method" option in glm to specify a user-defined function, but I could not get this to work. Could someone help me with this?

Answer 1: "It is a very

Extracting coefficient variable names from glmnet into a data.frame

I would like to extract the model coefficients generated by glmnet and create a SQL query from them. The function coef(cv.glmnet.fit) yields a 'dgCMatrix' object. When I convert it to a matrix using as.matrix, the variable names are lost and only the coefficient values are left. I know the coefficients can be printed to the screen, but is it possible to write the names to a data frame? Can anybody help me extract these names?

UPDATE: The first two comments on my answer are both right. I have kept the answer below the line just for posterity. The following answer is short, it works and
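As a rough illustration of one way to get names and values side by side, the sketch below reads the row names of a one-column sparse matrix and builds a data frame from them. The helper `coef_to_df` and the stand-in matrix `cf` are hypothetical; in practice the result of `coef(cv.glmnet.fit)` would be passed in instead. The SQL string at the end is only an assumed target format.

```r
library(Matrix)

# Hypothetical helper: one-column sparse coefficient matrix -> data frame of nonzero terms
coef_to_df <- function(cf) {
  v <- cf[, 1]  # named numeric vector; names come from the row names
  df <- data.frame(name = names(v), coefficient = unname(v),
                   row.names = NULL, stringsAsFactors = FALSE)
  df[df$coefficient != 0, , drop = FALSE]
}

# Stand-in for coef(cv.glmnet.fit): a dgCMatrix with row names (values are made up)
cf <- Matrix(c(0.5, 0, -1.2, 0.3), ncol = 1, sparse = TRUE,
             dimnames = list(c("(Intercept)", "x1", "x2", "x3"), "s1"))

coef_df <- coef_to_df(cf)

# Assemble a linear-predictor expression for use in a SQL SELECT (illustrative only)
is_intercept <- coef_df$name == "(Intercept)"
sql_terms <- c(
  sprintf("%.6g", coef_df$coefficient[is_intercept]),
  sprintf("%.6g * %s", coef_df$coefficient[!is_intercept], coef_df$name[!is_intercept])
)
sql_expr <- paste(sql_terms, collapse = " + ")
```

Dropping the zero coefficients keeps the generated expression limited to the variables the penalty actually selected.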

Adding labels on curves in glmnet plot in R

I am using the glmnet package to produce the following graph from the mtcars dataset (regression of mpg on the other variables):

    library(glmnet)
    fit = glmnet(as.matrix(mtcars[-1]), mtcars[,1])
    plot(fit, xvar='lambda')

How can I add the variable names to each curve, either at the beginning of each curve or at its maximal y point (farthest from the x-axis)? I can add a legend as usual, but not labels on each curve or at its start. Thanks for your help.

As the labels are hard coded it is perhaps easier to write a quick function. This is just a quick shot, so can be changed to be more thorough. I would also note
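A quick function of the kind described might look like the sketch below. It is not the original answer's code: `label_curves` and its `at_x` argument are hypothetical, and the default position assumes the x-axis shows log(lambda). If the installed glmnet version draws the axis differently (for example as -log(lambda)), `at_x` would need to be adjusted accordingly.

```r
library(glmnet)

fit <- glmnet(as.matrix(mtcars[-1]), mtcars[, 1])

# Hypothetical helper: write each variable name next to its coefficient
# at the smallest lambda on the path, where the curves are most spread out
label_curves <- function(fit, at_x = log(min(fit$lambda)), ...) {
  last <- which.min(fit$lambda)
  y <- fit$beta[, last]            # named vector of coefficients at that lambda
  text(at_x, y, labels = names(y), ...)
  invisible(data.frame(label = names(y), y = unname(y)))
}

plot(fit, xvar = "lambda")
labs <- label_curves(fit, pos = 4, cex = 0.7, xpd = TRUE)
```

Passing `xpd = TRUE` lets the labels extend into the margin, and `pos` controls which side of the anchor point the text is drawn on.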

Ridge regression with `glmnet` gives different coefficients than what I compute by “textbook definition”?

Question: I am running ridge regression using the glmnet R package. I noticed that the coefficients I obtain from the glmnet::glmnet function differ from those I get by computing the coefficients by definition (using the same lambda value). Could somebody explain why? The data (both the response Y and the design matrix X) are scaled.

    library(MASS)
    library(glmnet)
    # Data dimensions
    p.tmp <- 100
    n.tmp <- 100
    # Data objects
    set.seed(1)
    X <- scale(mvrnorm(n.tmp, mu = rep(0, p.tmp), Sigma = diag(p
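For reference, the "textbook definition" mentioned in the question is usually the closed form beta = (X'X + lambda I)^(-1) X'y. The sketch below computes it in base R on simulated data; the data generation, the helper name `ridge_textbook` and the value of `lambda` are assumptions, since the original code is cut off.

```r
set.seed(1)
n <- 100
p <- 100

# Simulated, scaled design matrix and response (stand-ins for X and Y in the question)
X <- scale(matrix(rnorm(n * p), nrow = n, ncol = p))
beta_true <- rnorm(p)
Y <- scale(X %*% beta_true + rnorm(n))

lambda <- 1  # illustrative penalty value

# Closed-form ridge solution: solve (X'X + lambda * I) beta = X'y
ridge_textbook <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

beta_hat <- ridge_textbook(X, Y, lambda)
```

When comparing against glmnet, note that its documented objective divides the residual sum of squares by 2n, so a textbook penalty of `lambda` corresponds roughly to `lambda / n` in `glmnet(X, Y, alpha = 0, ...)`. Other differences, such as internal standardization and the convergence threshold, may still leave small discrepancies.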