linear-regression

How to obtain RMSE out of lm result?

I know there is a small difference between $sigma and the concept of root mean squared error. So I am wondering: what is the easiest way to obtain the RMSE from an lm fit in R?

    res <- lm(randomData$price ~ randomData$carat + randomData$cut + randomData$color +
              randomData$clarity + randomData$depth + randomData$table +
              randomData$x + randomData$y + randomData$z)

length(coefficients(res)) shows 24 coefficients, so I cannot evaluate my model by hand anymore. How can I compute the RMSE from the coefficients derived by lm?

Answer: Residual sum of squares:

    RSS <- c(crossprod(res$residuals))

Mean squared error:
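The answer is cut off after "Mean squared error:". A minimal sketch of the usual completion, assuming the fitted object res from the question (dividing by n rather than by n - p is what distinguishes the RMSE from $sigma):

    RSS  <- c(crossprod(res$residuals))   # residual sum of squares
    MSE  <- RSS / length(res$residuals)   # mean squared error (divide by n)
    RMSE <- sqrt(MSE)                     # root mean squared error

    # for comparison, the residual standard error divides by n - p instead
    summary(res)$sigma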

Python pandas linear regression groupby

I am trying to run a linear regression within each group of a pandas DataFrame. This is the DataFrame df:

    group  date        value
    A      01-02-2016  16
    A      01-03-2016  15
    A      01-04-2016  14
    A      01-05-2016  17
    A      01-06-2016  19
    A      01-07-2016  20
    B      01-02-2016  16
    B      01-03-2016  13
    B      01-04-2016  13
    C      01-02-2016  16
    C      01-03-2016  16

    # import standard packages
    import pandas as pd
    import numpy as np

    # import ML packages
    from sklearn.linear_model import LinearRegression

    # First, let's group the data by group
    df_group = df.groupby('group')

    # Then, we need to change the date to an integer
    df['date'] = pd.to_datetime(df['date'])
    df['date_delta
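The code is cut off at the date_delta line. A minimal sketch of one way to finish the idea, fitting value against a numeric date offset separately per group; the date_delta construction, the date format, and the per-group helper are my assumptions, not the original poster's code:

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    df = pd.DataFrame({
        'group': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
        'date':  ['01-02-2016', '01-03-2016', '01-04-2016', '01-05-2016', '01-06-2016',
                  '01-07-2016', '01-02-2016', '01-03-2016', '01-04-2016', '01-02-2016',
                  '01-03-2016'],
        'value': [16, 15, 14, 17, 19, 20, 16, 13, 13, 16, 16],
    })

    # convert dates to a numeric offset in days (assuming day-month-year format)
    df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
    df['date_delta'] = (df['date'] - df['date'].min()).dt.days

    def fit_group(g):
        # fit value ~ date_delta within one group, return slope and intercept
        model = LinearRegression().fit(g[['date_delta']].values, g['value'].values)
        return pd.Series({'slope': model.coef_[0], 'intercept': model.intercept_})

    coefs = df.groupby('group').apply(fit_group)
    print(coefs)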

predict.lm() in a loop. warning: prediction from a rank-deficient fit may be misleading

This R code throws the warning:

    # Fit regression model to each cluster
    y <- list();    length(y) <- k
    vars <- list(); length(vars) <- k
    f <- list();    length(f) <- k
    for (i in 1:k) {
      vars[[i]] <- names(corc[[i]][corc[[i]] != "1"])
      f[[i]] <- as.formula(paste("Death ~", paste(vars[[i]], collapse = "+")))
      y[[i]] <- lm(f[[i]], data = C1[[i]])                 # training set
      C1[[i]] <- cbind(C1[[i]], fitted(y[[i]]))
      C2[[i]] <- cbind(C2[[i]], predict(y[[i]], C2[[i]]))   # test set
    }

I have a training data set (C1) and a test data set (C2). Each one has 129 variables. I did a k-means cluster analysis on C1 and then split my data set
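For context (my addition, not part of the original question), a tiny reproducible sketch of what triggers this warning: when one predictor is an exact linear combination of the others, the fit is rank-deficient and predict() warns as soon as newdata is supplied.

    set.seed(1)
    d <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
    d$x3 <- d$x1 + d$x2                      # x3 is an exact linear combination of x1 and x2
    d$y  <- d$x1 - d$x2 + rnorm(10)

    fit <- lm(y ~ x1 + x2 + x3, data = d)    # the x3 coefficient comes back NA (aliased)
    predict(fit, newdata = d)                # warning: prediction from a rank-deficient fit ...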

Why do I get only one parameter from a statsmodels OLS fit

Here is what I am doing:

    $ python
    Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    >>> import statsmodels.api as sm
    >>> statsmodels.__version__
    '0.5.0'
    >>> import numpy
    >>> y = numpy.array([1,2,3,4,5,6,7,8,9])
    >>> X = numpy.array([1,1,2,2,3,3,4,4,5])
    >>> res_ols = sm.OLS(y, X).fit()
    >>> res_ols.params
    array([ 1.82352941])

I had expected an array with two elements: the intercept and the slope coefficient.

Answer (behzad.nouri): Try this:

    X = sm.add_constant(X)
    sm.OLS(y, X)

As in the documentation: an intercept is not included by default and
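A minimal sketch of the suggested fix run end to end; the print call and comments are mine, everything else follows the data in the question:

    import numpy as np
    import statsmodels.api as sm

    y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    X = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5])

    # add_constant prepends a column of ones, so the model also estimates an intercept
    X_const = sm.add_constant(X)
    res = sm.OLS(y, X_const).fit()
    print(res.params)   # now two elements: [intercept, slope]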

Why use gradient descent when we can solve linear regression analytically?

What is the benefit of using gradient descent in the linear regression setting? It looks like we can solve the problem (finding the θ0..θn that minimize the cost function) analytically, so why would we still use gradient descent for the same thing? Thanks.

Answer: When you use the normal equations to minimize the cost function analytically, you have to compute

    θ = (XᵀX)⁻¹ Xᵀ y

where X is your matrix of input observations and y your output vector. The problem with this operation is the time complexity of calculating the inverse of an n x n matrix, which is O(n^3), and as n increases it can take a very long time to
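For concreteness (my addition, not part of the original answer), the closed-form solution in R; the solve() call below is the step whose cost grows cubically with the size of the matrix being inverted:

    set.seed(42)
    n_obs <- 100
    X <- cbind(1, matrix(rnorm(n_obs * 3), ncol = 3))   # design matrix with an intercept column
    beta_true <- c(2, 0.5, -1, 3)
    y <- X %*% beta_true + rnorm(n_obs)

    # normal equations: theta = (X'X)^{-1} X'y
    theta_hat <- solve(t(X) %*% X, t(X) %*% y)
    theta_hat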

Incorrect abline line for a regression model with intercept in R

Question: (reproducible example given) In the following, I get an abline whose y-intercept looks like it is about 30, but the regression says the y-intercept should be 37.2851. Where am I wrong?

    mtcars$mpg   # 21.0 21.0 22.8 ... 21.4 (32 obs)
    mtcars$wt    # 2.620 2.875 2.320 ... 2.780 (32 obs)

    regression1 <- lm(mtcars$mpg ~ mtcars$wt)
    coef(regression1)   # mpg ~ 37.2851 - 5.3445 * wt

    plot(mtcars$mpg ~ mtcars$wt, pch = 19, col = 'gray50')      # pch: shape of points
    abline(h = mean(mtcars$mpg), lwd = 2, col = 'darkorange')   # The y-coordinate
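The example is cut off, but the usual explanation for this mismatch is worth sketching (my reading, not a quoted answer): the default plot's x-axis starts near min(wt) of about 1.5 rather than at 0, so the height at which the regression line meets the left edge of the plot (about 29) is not the model's intercept. Drawing the fitted line and widening the axes makes this visible:

    regression1 <- lm(mpg ~ wt, data = mtcars)
    coef(regression1)   # intercept 37.2851, slope -5.3445

    # default axes: the line enters the plot around mpg = 29, not 37.3
    plot(mpg ~ wt, data = mtcars, pch = 19, col = "gray50")
    abline(regression1, lwd = 2, col = "darkorange")

    # extend the axes to include wt = 0 and the intercept appears where expected
    plot(mpg ~ wt, data = mtcars, pch = 19, col = "gray50",
         xlim = c(0, max(mtcars$wt)), ylim = c(0, 40))
    abline(regression1, lwd = 2, col = "darkorange")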

Get p-value for group mean difference without refitting linear model with a new reference level

Question: When we have a linear model with a factor variable X (with levels A, B, and C),

    y ~ factor(X) + Var2 + Var3

the result shows the estimates XB and XC, which are the differences B - A and C - A (assuming the reference level is A). If we want the p-value of the difference between B and C (C - B), we have to designate B or C as the reference group and re-run the model. Can we get the p-values of the effects B - A, C - A, and C - B all at once?

Answer 1: You are looking for a linear hypothesis test by
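The answer is truncated. A sketch of one common way to get all three pairwise comparisons without refitting, using the multcomp package; the package choice and the simulated data are my assumptions, not necessarily what the original answer used:

    library(multcomp)

    set.seed(1)
    d <- data.frame(
      X    = factor(rep(c("A", "B", "C"), each = 20)),
      Var2 = rnorm(60),
      Var3 = rnorm(60)
    )
    d$y <- 1 + 0.5 * (d$X == "B") + 1.2 * (d$X == "C") + 0.3 * d$Var2 + rnorm(60)

    fit <- lm(y ~ X + Var2 + Var3, data = d)

    # tests B - A, C - A, and C - B in one call, without changing the reference level
    summary(glht(fit, linfct = mcp(X = "Tukey")))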

How can I force dropping intercept or equivalent in this linear model?

Consider the following data frame:

    DB <- data.frame(
      Y  = rnorm(6),
      X1 = c(T, T, F, T, F, F),
      X2 = c(T, F, T, F, T, T)
    )

               Y    X1    X2
    1  1.8376852  TRUE  TRUE
    2 -2.1173739  TRUE FALSE
    3  1.3054450 FALSE  TRUE
    4 -0.3476706  TRUE FALSE
    5  1.3219099 FALSE  TRUE
    6  0.6781750 FALSE  TRUE

I'd like to explain my quantitative variable Y by two binary variables (TRUE or FALSE) without an intercept. The reason is that, in my study, we can never observe X1 = FALSE and X2 = FALSE at the same time, so it doesn't make sense to have a mean other than 0 for that level.

With intercept:

    m1 <- lm(Y ~ X1 + X2, data = DB)
    summary(m1)
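The question is cut off after summary(m1). One standard way to get the model the poster describes is to code the logicals as 0/1 numerics and drop the intercept with "- 1"; this is my sketch, not a quoted answer, and whether a no-intercept model is appropriate is a separate modelling question:

    set.seed(1)
    DB <- data.frame(
      Y  = rnorm(6),
      X1 = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
      X2 = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
    )

    # with an intercept: coefficients (Intercept), X1TRUE, X2TRUE
    m1 <- lm(Y ~ X1 + X2, data = DB)

    # without an intercept: code the logicals as 0/1 and remove the constant with '- 1',
    # so the fitted value is 0 whenever X1 = FALSE and X2 = FALSE
    m2 <- lm(Y ~ as.numeric(X1) + as.numeric(X2) - 1, data = DB)
    summary(m2)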

Linear combination of regression coefficients in R [closed]

I need to run a multiple regression in R with the variables X1, X2 and X3, where there is a parameter θ = β2 + β3. So instead of β2, for the coefficient of X2 I need to use (θ - β3). How could I do this?

Answer: Note that

    Y = b1 * x1 + (t - b3) * x2 + b3 * x3

is equivalent to

    Y = b1 * x1 + t * x2 - b3 * x2 + b3 * x3
      = b1 * x1 + t * x2 + b3 * (x3 - x2)

so you can continue from there easily.

Source: https://stackoverflow.com/questions/53561699/linear-combination-of-regression-coefficients-in-r
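A short sketch of how that reparameterization looks in lm (my illustration of the answer's algebra, on simulated data): regress Y on X1, X2, and (X3 - X2); the coefficient on X2 is then θ = β2 + β3 and the coefficient on (X3 - X2) is β3.

    set.seed(1)
    n  <- 100
    X1 <- rnorm(n); X2 <- rnorm(n); X3 <- rnorm(n)
    Y  <- 1 + 2 * X1 + 0.5 * X2 + 1.5 * X3 + rnorm(n)

    fit_orig  <- lm(Y ~ X1 + X2 + X3)           # coefficients b1, b2, b3
    fit_repar <- lm(Y ~ X1 + X2 + I(X3 - X2))   # coefficient on X2 is theta = b2 + b3

    coef(fit_orig)
    coef(fit_repar)   # the X2 coefficient equals b2 + b3 from fit_orig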

How to plot a comparison of two fixed categorical values for linear regression of another continuous variable

Question: So I want to plot this:

    lmfit <- lm(y ~ a + b)

but b only takes the values zero and one. So I want to plot two separate regression lines, parallel to one another, to show the difference that b makes to the y-intercept. After plotting

    plot(b, y)

I want to then use abline(lmfit, col = "red", lwd = 2) twice: once with the value of b set to zero, and once with it set to one, i.e. once without the b term included and once where b contributes exactly 1 * b. To restate: b is categorical, 0 or 1; a
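The question is cut off. A sketch of one way to draw the two parallel lines the poster describes; the simulated data and the use of coef() to shift the intercept are my additions:

    set.seed(1)
    a <- rnorm(100)
    b <- rbinom(100, 1, 0.5)
    y <- 2 + 1.5 * a + 3 * b + rnorm(100)

    lmfit <- lm(y ~ a + b)
    cf <- coef(lmfit)   # (Intercept), a, b

    plot(a, y, pch = 19, col = ifelse(b == 1, "steelblue", "gray50"))
    # line for b = 0: fitted intercept and slope on a
    abline(a = cf["(Intercept)"], b = cf["a"], col = "red", lwd = 2)
    # line for b = 1: same slope, intercept shifted up by the b coefficient
    abline(a = cf["(Intercept)"] + cf["b"], b = cf["a"], col = "red", lwd = 2, lty = 2)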