regression

R: print equation of linear regression on the plot itself

与世无争的帅哥 submitted on 2019-11-27 16:28:02
Question: How do we print the equation of a regression line on a plot? I have two independent variables and would like an equation like this: y = m*x1 + b*x2 + c, where x1 = cost and x2 = targeting. I can plot the best-fit line, but how do I print the equation on the plot? Maybe I can't print both independent variables in one equation, but how do I do it for, say, y = m*x1 + c at least? Here is my code:

    fit = lm(Signups ~ cost + targeting)
    plot(cost, Signups, xlab = "cost", ylab = "Signups", main = "Signups")
    abline(lm(Signups ~ cost))

Answer 1: I
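The answer is cut off above. As a minimal sketch of one common approach (my illustration, not necessarily the original answer's code; the data frame dat is a stand-in for the asker's data), build the equation string from coef() and draw it on the plot with text():

    # Assumes a data frame 'dat' with columns Signups and cost
    fit <- lm(Signups ~ cost, data = dat)
    cf  <- round(coef(fit), 3)
    eq  <- paste0("y = ", cf[2], "*cost + ", cf[1])   # slope, then intercept
    plot(dat$cost, dat$Signups, xlab = "cost", ylab = "Signups", main = "Signups")
    abline(fit)
    text(x = min(dat$cost), y = max(dat$Signups), labels = eq, adj = 0)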

Aligning a data frame with missing values

荒凉一梦 submitted on 2019-11-27 15:49:10
I'm using a data frame with many NA values. While I'm able to create a linear model, I'm subsequently unable to line up the fitted values of the model with the original data, due to the missing values and the lack of an indicator column. Here's a reproducible example:

    library(MASS)
    dat <- Aids2
    # Add NAs
    dat[floor(runif(100, min = 1, max = nrow(dat))), 3] <- NA
    # Create a model
    model <- lm(death ~ diag + age, data = dat)
    # Different lengths
    length(fitted.values(model))  # 2745
    nrow(dat)                     # 2843

There are actually three solutions here: pad NA into the fitted values ourselves; use predict() to compute fitted
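The list of solutions is truncated above. One standard option (my sketch, an assumption about where the answer was heading) is R's built-in na.exclude mechanism, which pads fitted values and residuals back to the original row count:

    # Fit with na.action = na.exclude so fitted() and residuals() are padded
    # with NA wherever rows were dropped, matching the original data length
    library(MASS)
    dat <- Aids2
    dat[floor(runif(100, min = 1, max = nrow(dat))), 3] <- NA
    model <- lm(death ~ diag + age, data = dat, na.action = na.exclude)
    length(fitted(model)) == nrow(dat)  # TRUE
    dat$fitted <- fitted(model)         # aligns row-for-row with the data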

How do I determine the coefficients for a linear regression line in MATLAB? [closed]

大兔子大兔子 submitted on 2019-11-27 15:25:12
I'm going to write a program where the input is a data set of 2D points and the output is the regression coefficients of the line of best fit, found by minimizing the mean squared error (MSE). I have some sample points that I would like to process:

    X     Y
    1.00  1.00
    2.00  2.00
    3.00  1.30
    4.00  3.75
    5.00  2.25

How would I do this in MATLAB? Specifically, I need to get the following formula: y = A + Bx + e, where A is the intercept, B is the slope, and e is the residual error per point. Judging from the link you provided, and my understanding of your problem, you want to calculate the line of best fit for a set of data
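The answer is truncated above. A minimal MATLAB sketch of the standard approach (my illustration, not necessarily the original answer's code) solves the least-squares problem with the backslash operator:

    x = [1; 2; 3; 4; 5];
    y = [1.00; 2.00; 1.30; 3.75; 2.25];
    X = [ones(size(x)) x];   % design matrix: intercept column plus x
    coeffs = X \ y;          % least-squares solve; coeffs = [A; B]
    A = coeffs(1);           % intercept
    B = coeffs(2);           % slope
    e = y - X * coeffs;      % residual error per point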

How to calculate variance of least squares estimator using QR decomposition in R?

坚强是说给别人听的谎言 submitted on 2019-11-27 15:20:28
Question: I'm trying to learn QR decomposition, but I can't figure out how to get the variance of beta_hat without resorting to traditional matrix calculations. I'm practising with the iris data set, and here's what I have so far:

    y <- iris$Sepal.Length
    x <- iris$Sepal.Width
    X <- cbind(1, x)
    n <- nrow(X)
    p <- ncol(X)
    qr.X <- qr(X)
    b <- (t(qr.Q(qr.X)) %*% y)[1:p]
    R <- qr.R(qr.X)
    beta <- as.vector(backsolve(R, b))
    res <- as.vector(y - X %*% beta)

Thanks for your help! Answer 1: setup (copying in your code): y <- iris$Sepal.Length; x
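The answer is truncated above. Continuing the question's own variables, a minimal sketch (my assumption about where the answer goes): since Var(beta_hat) = sigma^2 * (X'X)^{-1} and X = QR implies (X'X)^{-1} = (R'R)^{-1}, chol2inv() recovers the inverse directly from the triangular factor R, with no explicit matrix inversion:

    sigma2 <- sum(res^2) / (n - p)     # residual variance estimate
    vcov_beta <- sigma2 * chol2inv(R)  # sigma^2 * (R'R)^{-1} = sigma^2 * (X'X)^{-1}
    sqrt(diag(vcov_beta))              # standard errors; compare summary(lm(y ~ x))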

How to interpret lm() coefficient estimates when using the bs() function for splines

半世苍凉 submitted on 2019-11-27 15:13:14
Question: I'm using a set of points that goes from (-5,5) down to (0,0) and back up to (5,5) in a symmetric V-shape. I'm fitting a model with lm() and the bs() function to fit the V-shaped spline:

    lm(formula = y ~ bs(x, degree = 1, knots = c(0)))

I get the V-shape when I predict outcomes with predict() and draw the prediction line. But when I look at the model estimates with coef(), I see estimates that I don't expect:

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  4.93821    0.16117  30.639 1.40e-09 ***
    bs(x,
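The answer is truncated above. The key point (my gloss, stated as an assumption): the coefficients of a degree-1 B-spline are not slopes. The intercept is the fitted value at the left boundary, and each basis coefficient is the offset from that intercept at the corresponding knot or right boundary. A minimal sketch makes this visible:

    library(splines)
    x <- seq(-5, 5, by = 0.5)
    y <- abs(x) + rnorm(length(x), sd = 0.1)   # noisy symmetric V
    fit <- lm(y ~ bs(x, degree = 1, knots = c(0)))
    coef(fit)   # intercept ~ 5 (the fit at x = -5); offsets ~ -5 (at the knot) and ~ 0
    predict(fit, newdata = data.frame(x = c(-5, 0, 5)))  # approximately 5, 0, 5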

Use stepAIC on a list of models

最后都变了- submitted on 2019-11-27 14:33:10
I want to do stepwise regression using AIC on a list of linear models. The idea is to build a list of linear models and then apply stepAIC to each list element. It fails. I tried to track the problem down, and I think I found it; however, I don't understand the cause. Run the code below to see the difference between the three cases:

    require(MASS)
    n <- 30
    x1 <- rnorm(n, mean = 0, sd = 1)  # create rv x1
    x2 <- rnorm(n, mean = 1, sd = 1)
    x3 <- rnorm(n, mean = 2, sd = 1)
    epsilon <- rnorm(n, mean = 0, sd = 1)  # random error variable
    dat <- as.data.frame(cbind(x1, x2, x3, epsilon))  # combine into a data frame
    dat$id <- c(rep(1, 10), rep(2, 10
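The excerpt is cut off before the failing cases. A common cause with this pattern (my assumption about the thread, not a quote from it) is that stepAIC() re-evaluates each model's stored call, so a model built inside lapply() carries a formula variable that is no longer in scope by the time stepAIC() runs. Embedding the literal formula in the call avoids this; a minimal sketch:

    library(MASS)
    set.seed(1)
    n <- 30
    dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n, 1), x3 = rnorm(n, 2))
    dat$y <- dat$x1 + rnorm(n)
    forms <- list(y ~ x1, y ~ x1 + x2, y ~ x1 + x2 + x3)
    models <- lapply(forms, function(f) {
      # do.call() puts the actual formula (not the symbol 'f') into the
      # stored call, so stepAIC()'s internal update() can re-evaluate it
      do.call("lm", list(formula = f, data = quote(dat)))
    })
    lapply(models, stepAIC, trace = FALSE)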

Logistic regression with robust clustered standard errors in R

偶尔善良 submitted on 2019-11-27 13:16:45
Question: A newbie question: does anyone know how to run a logistic regression with clustered standard errors in R? In Stata it's just logit Y X1 X2 X3, vce(cluster Z), but unfortunately I haven't figured out how to do the same analysis in R. Thanks in advance! Answer 1: You might want to look at the rms (regression modelling strategies) package. So, lrm is the logistic regression model, and if fit is the name of your output, you'd have something like this:

    fit = lrm(disease ~ age + study + rcs(bmi, 3), x = T, y = T,
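The rms answer is truncated above. An alternative route (my addition, not the truncated answer): fit the logit with glm() and compute cluster-robust standard errors with the sandwich and lmtest packages, mirroring Stata's vce(cluster Z). The data frame df and cluster variable Z here are stand-ins for the asker's data:

    library(sandwich)
    library(lmtest)
    fit <- glm(Y ~ X1 + X2 + X3, family = binomial, data = df)
    # report coefficients with a covariance matrix clustered on Z
    coeftest(fit, vcov = vcovCL(fit, cluster = ~ Z))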

Difference between cross_val_score and cross_val_predict

落爺英雄遲暮 submitted on 2019-11-27 13:09:37
Question: I want to evaluate a regression model built with scikit-learn using cross-validation, and I am getting confused about which of the two functions, cross_val_score and cross_val_predict, I should use. One option would be:

    cvs = DecisionTreeRegressor(max_depth = depth)
    scores = cross_val_score(cvs, predictors, target, cv=cvfolds, scoring='r2')
    print("R2-Score: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

Another one, to use the CV predictions with the standard r2_score:

    cvp =
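The second option is cut off above. A minimal self-contained sketch of the contrast (my illustration on synthetic data): cross_val_score returns one score per fold, while cross_val_predict returns one out-of-fold prediction per sample, which can then be pooled into a single r2_score. The two numbers answer different questions and need not agree:

    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.model_selection import cross_val_score, cross_val_predict
    from sklearn.metrics import r2_score

    X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
    model = DecisionTreeRegressor(max_depth=4, random_state=0)

    scores = cross_val_score(model, X, y, cv=5, scoring="r2")  # 5 per-fold scores
    preds = cross_val_predict(model, X, y, cv=5)               # 200 out-of-fold predictions

    print("mean per-fold R2: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
    print("pooled R2 over CV predictions: %0.2f" % r2_score(y, preds))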

Multivariate polynomial regression with numpy

二次信任 submitted on 2019-11-27 12:26:23
I have many samples (y_i, (a_i, b_i, c_i)) where y is presumed to vary as a polynomial in a, b, c up to a certain degree. For example, for a given set of data and degree 2, I might produce the model y = a^2 + 2ab - 3cb + c^2 + 0.5ac. This can be done using least squares and is a slight extension of numpy's polyfit routine. Is there a standard implementation somewhere in the Python ecosystem? sklearn provides a simple way to do this. Building off an example posted here:

    # X is the independent variable (bivariate in this case)
    X = array([[0.44, 0.68], [0.99, 0.23]])
    # vector is the dependent data vector
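The example is truncated above. A minimal runnable sketch of the sklearn route the answer begins to describe (my illustration on synthetic data, using the trivariate model from the question): PolynomialFeatures expands (a, b, c) into all degree-2 monomials, and an ordinary least-squares fit supplies the coefficients.

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.random((50, 3))                      # columns are a, b, c
    a, b, c = X[:, 0], X[:, 1], X[:, 2]
    y = a**2 + 2*a*b - 3*c*b + c**2 + 0.5*a*c    # the question's example model

    poly = PolynomialFeatures(degree=2, include_bias=False)  # a, b, c, a^2, ab, ...
    model = LinearRegression().fit(poly.fit_transform(X), y)
    print(poly.get_feature_names_out(["a", "b", "c"]))  # term for each coefficient
    print(np.round(model.coef_, 3))                     # recovers the monomial weights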

Fixed effect in Pandas or Statsmodels

纵饮孤独 submitted on 2019-11-27 12:12:09
Is there an existing function to estimate fixed effects (one-way or two-way) in Pandas or Statsmodels? There used to be a function in Statsmodels, but it seems to have been discontinued. And in Pandas there is something called plm, but I can't import it or run it using pd.plm(). As noted in the comments, PanelOLS has been removed from Pandas as of version 0.20.0. So you really have three options:

1. If you use Python 3, you can use linearmodels as specified in the more recent answer: https://stackoverflow.com/a/44836199/3435183
2. Just specify various dummies in your statsmodels specification, e.g. using pd
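The list is truncated above. A minimal sketch of the first two options (my illustration on synthetic panel data; variable names are stand-ins):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from linearmodels.panel import PanelOLS

    rng = np.random.default_rng(0)
    idx = pd.MultiIndex.from_product([range(20), range(5)], names=["entity", "time"])
    df = pd.DataFrame({"x": rng.normal(size=100)}, index=idx)
    df["y"] = 2 * df["x"] + rng.normal(size=100)

    # Option 1: linearmodels one-way (entity) fixed effects;
    # add "+ TimeEffects" to the formula for two-way fixed effects
    fe = PanelOLS.from_formula("y ~ x + EntityEffects", data=df).fit()
    print(fe.params)

    # Option 2: statsmodels with explicit entity dummies via C()
    ols = smf.ols("y ~ x + C(entity)", data=df.reset_index()).fit()
    print(ols.params["x"])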