regression

How can I force cv.glmnet not to drop one specific variable?

蓝咒 submitted on 2019-12-20 09:53:57
Question: I am running a regression with 67 observations and 32 variables. I am doing variable selection using the cv.glmnet function from the glmnet package. There is one variable I want to force into the model (it is dropped during the normal procedure). How can I specify this condition in cv.glmnet? Thank you! My code looks like the following:

glmntfit <- cv.glmnet(mydata[,-1], mydata[,1])
coef(glmntfit, s=glmntfit$lambda.1se)

The variable I want is mydata[,2]. Answer 1: This can be achieved by providing a…
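The answer is cut off here, but the standard glmnet mechanism for this is the penalty.factor argument. Below is a minimal sketch, assuming mydata[,2] becomes the first column of the predictor matrix once column 1 is dropped; a penalty factor of 0 exempts that coefficient from the lasso penalty, so the variable is always retained.

library(glmnet)
x <- as.matrix(mydata[, -1])
y <- mydata[, 1]
pf <- rep(1, ncol(x))  # default penalty weight for every predictor
pf[1] <- 0             # never penalize the forced variable
glmntfit <- cv.glmnet(x, y, penalty.factor = pf)
coef(glmntfit, s = glmntfit$lambda.1se)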

Newey-West standard errors with Mean Groups/Fama-MacBeth estimator

半城伤御伤魂 submitted on 2019-12-20 09:52:14
Question: I'm trying to get Newey-West standard errors to work with the output of pmg() (the Mean Groups/Fama-MacBeth estimator) from the plm package, following the example from here:

require(foreign)
require(plm)
require(lmtest)
test <- read.dta("http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.dta")
fpmg <- pmg(y~x, test, index=c("firmid", "year")) # Time index in second position, unlike the example

I can use coeftest directly just fine to get the Fama-MacBeth standard errors:…
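The excerpt stops before the Newey-West part. A hedged sketch of one common workaround: apply NeweyWest() from the sandwich package to the time series of period-by-period coefficient estimates, whose mean is the Fama-MacBeth estimate. This assumes fpmg$indcoef stores one column of coefficients per time period, which depends on how the panel index was specified.

library(sandwich)
library(lmtest)
slopes <- fpmg$indcoef["x", ]  # per-period slope estimates (assumed layout)
m <- lm(slopes ~ 1)            # intercept-only fit: the mean of the slopes
coeftest(m, vcov = NeweyWest(m, lag = 4, prewhite = FALSE))  # lag chosen for illustration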

When using Gnuplot, how can the equation of a line be printed in the line title?

最后都变了- submitted on 2019-12-20 09:44:38
Question: I have used Gnuplot to plot my data, along with a linear regression line. Currently, the 'title' of this line, whose equation is calculated by Gnuplot, is just "f(x)". However, I would like the title to be the equation of the regression line, e.g. "y = mx + c". I can do this manually by reading off 'm' and 'c' from the fitting output and re-plotting with the new title, but I would like this process to be automated, and was wondering whether this can be done and how to go about doing it. Answer 1:…

ggplot2: How to curve small Gaussian densities on a regression line?

为君一笑 submitted on 2019-12-20 09:37:35
Question: I want to graphically show the assumptions of linear (and later other types of) regression. How can I add small Gaussian densities (or any type of densities) to my plot, sitting on the regression line just like in this figure: Answer 1: You can compute the empirical densities of the residuals for sections along a fitted line. Then it is just a matter of drawing the lines at the positions of your choosing in each interval using geom_path. To add a theoretical distribution, generate some densities along the range…
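A minimal sketch of the idea on simulated data, using the theoretical residual distribution (via sigma(fit)) rather than empirical densities: each chosen x position gets a sideways Gaussian curve centred on the fitted line.

library(ggplot2)
set.seed(1)
df <- data.frame(x = runif(80, 0, 10))
df$y <- 2 + 0.8 * df$x + rnorm(80)
fit <- lm(y ~ x, df)
s <- sigma(fit)  # residual standard deviation
dens <- do.call(rbind, lapply(c(2, 5, 8), function(x0) {
  yhat <- predict(fit, data.frame(x = x0))
  yy <- seq(yhat - 3 * s, yhat + 3 * s, length.out = 100)
  data.frame(x = x0 + dnorm(yy, yhat, s),  # density drawn sideways along x
             y = yy, grp = x0)
}))
ggplot(df, aes(x, y)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_path(data = dens, aes(group = grp), colour = "red")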

Trend lines (regression, curve fitting) Java library

情到浓时终转凉″ submitted on 2019-12-20 08:52:55
Question: I'm trying to develop an application that would compute the same trend lines that Excel does, but for larger datasets. However, I'm not able to find any Java library that calculates such regressions. For the linear model I'm using Apache Commons Math, and for the others there was a great numerical library from Michael Thomas Flanagan, but since January it is no longer available: http://www.ee.ucl.ac.uk/~mflanaga/java/ Do you know of any other libraries or code repositories to calculate these regressions…

Normalize data before or after split of training and testing data?

百般思念 submitted on 2019-12-20 08:42:51
Question: I want to separate my data into train and test sets; should I apply normalization to the data before or after the split? Does it make any difference when building a predictive model? Thanks in advance. Answer 1: You first need to split the data into training and test sets (a validation set might also be required). Don't forget that the test data points represent real-world data. Feature normalization (or data standardization) of the explanatory (or predictor) variables is a technique used to center and…
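A minimal sketch of the recommended order in R, estimating the scaling parameters on the training rows only and reusing them for the test rows (iris stands in for real data), which avoids leaking test-set information into the preprocessing:

set.seed(42)
idx   <- sample(nrow(iris), 0.7 * nrow(iris))
train <- iris[idx, 1:4]
test  <- iris[-idx, 1:4]
mu  <- colMeans(train)      # parameters estimated from training data only
sds <- apply(train, 2, sd)
train_scaled <- scale(train, center = mu, scale = sds)
test_scaled  <- scale(test,  center = mu, scale = sds)  # same parameters reused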

Automate script to run linear regression in R

大憨熊 submitted on 2019-12-20 07:35:12
Question: I am looking to run linear regression on the data frame below.

test <- data.frame(abc=c(2.4,3.2,8.9,9.8,10.0,3.2,5.4),
                   city1_0=c(5.3,2.6,3,5.4,7.8,4.4,5.5),
                   city1_1=c(2.3,5.6,3,2.4,3.6,2.4,6.5),
                   city1_2=c(4.2,1.4,2.6,2,6,3.6,2.4),
                   city1_3=c(2.4,2.6,9.4,4.6,2.5,1.2,7.5),
                   city1_4=c(8.2,4.2,7.6,3.4,1.7,5.2,9.7),
                   city2_0=c(4.3,8.6,6,3.7,7.8,4.7,5.8),
                   city2_1=c(5.3,2.6,3,5.4,7.8,4.4,5.5))

The data frame "test" is a sample of the data; the original data frame contains 100 columns. I want to create a script…
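The question is cut off before the desired output is specified; a hedged sketch, assuming the goal is one simple regression of abc on each of the city columns in turn, collecting the results in a table:

predictors <- setdiff(names(test), "abc")
results <- do.call(rbind, lapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "abc"), data = test)  # abc ~ <column v>
  data.frame(variable = v,
             estimate = unname(coef(fit)[v]),
             r.squared = summary(fit)$r.squared)
}))
results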

Interpreting regression coefficients in R [closed]

时间秒杀一切 submitted on 2019-12-20 06:39:26
Question: I'm trying to fit an x*log(x) model to the data. The fitting is performed successfully, but I have difficulty interpreting the resulting coefficients. Here is a snapshot of my code:

x <- c(6, 11, 16, 21, 26, 31, 36, 41, 46, 51)
y <- c(5.485, 6.992, 7.447, 8.134, 8.524, 8.985, 9.271, 9.647, 10.561, 9.971)
fit <-…
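The fit line is truncated; a minimal sketch of one likely intent and the usual interpretation pitfall: in an R formula, y ~ x*log(x) expands to x + log(x) + x:log(x), so the literal single-term model has to be written with I().

fit <- lm(y ~ I(x * log(x)))  # fits y = a + b * x*log(x)
coef(fit)                     # a = intercept, b = coefficient on the x*log(x) term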

How can I force dropping intercept or equivalent in this linear model?

若如初见. submitted on 2019-12-20 06:21:58
Question: Consider the following table:

DB <- data.frame(
  Y =rnorm(6),
  X1=c(T, T, F, T, F, F),
  X2=c(T, F, T, F, T, T)
)

           Y    X1    X2
1  1.8376852  TRUE  TRUE
2 -2.1173739  TRUE FALSE
3  1.3054450 FALSE  TRUE
4 -0.3476706  TRUE FALSE
5  1.3219099 FALSE  TRUE
6  0.6781750 FALSE  TRUE

I'd like to explain my quantitative variable Y by two binary variables (TRUE or FALSE) without an intercept. The argument for this choice is that, in my study, we can't observe X1=FALSE and X2=FALSE at the same time, so it doesn't make sense…
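The question is cut off here; a minimal sketch of the usual way to drop the intercept in R's formula interface. Converting the logicals to 0/1 first keeps exactly one coefficient per variable (with raw logicals and no intercept, R dummy-codes the first one like a two-level factor):

DB$X1 <- as.numeric(DB$X1)
DB$X2 <- as.numeric(DB$X2)
fit <- lm(Y ~ X1 + X2 - 1, data = DB)  # '- 1' (or '+ 0') removes the intercept
summary(fit)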