linear-regression

What is the red solid line in the “residuals vs leverage” plot produced by `plot.lm()`?

╄→尐↘猪︶ㄣ submitted on 2019-12-07 21:18:24
Question:

```r
fit <- lm(dist ~ speed, cars)
plot(fit, which = 5)
```

What does the solid red line in the middle of the plot mean? I think it is not about Cook's distance.

Answer 1: It is the LOESS regression line (with `span = 2/3` and `degree = 2`), obtained by smoothing standardised residuals against leverage. Internally in `plot.lm()`, the variable `xx` holds leverage, while `rsp` holds the Pearson residuals (i.e., standardised residuals). The scatter plot as well as the red solid line is then drawn via:

```r
graphics::panel.smooth(xx, rsp)
```

Here is
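For readers who want to reproduce that curve outside R, here is a minimal Python sketch (statsmodels; the data and all variable names are illustrative) that recomputes the same ingredients, leverage and standardized residuals, and smooths one against the other:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.nonparametric.smoothers_lowess import lowess

# Illustrative stand-in for the cars regression
rng = np.random.default_rng(1)
x = rng.uniform(size=50)
y = 2 + 3 * x + rng.normal(size=50)

fit = sm.OLS(y, sm.add_constant(x)).fit()
influence = fit.get_influence()
leverage = influence.hat_matrix_diag               # x-axis of plot number 5
std_resid = influence.resid_studentized_internal   # standardized residuals

# The red line: a locally weighted regression of the standardized
# residuals on leverage, with the 2/3 span mentioned in the answer
smoothed = lowess(std_resid, leverage, frac=2/3)
```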

Use a function with a linear regression model

廉价感情. submitted on 2019-12-07 19:03:07
Question: I can run multiple linear regressions, and in each model estimate coefficients by removing one observation from the data.frame, like this:

```r
library(plyr)
as.data.frame(laply(1:nrow(mtcars), function(x) coef(lm(mpg ~ hp + wt, mtcars[-x,]))))
```

```
  (Intercept)          hp        wt
1    37.48509 -0.03207047 -3.918260
2    37.33931 -0.03219086 -3.877571
3    37.56512 -0.03216482 -3.939386
4    37.22292 -0.03171010 -3.880721
5    37.22437 -0.03185754 -3.876831
6    37.23686 -0.03340464 -3.781698
7    37.21965 -0.03030994 -3.927877
8    37
```
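For comparison, a minimal NumPy sketch of the same leave-one-out refitting (`loo_coefs` is an illustrative name, not a library function, and `X` is assumed to already contain an intercept column):

```python
import numpy as np

def loo_coefs(X, y):
    """Refit OLS n times, each time with one observation dropped."""
    n = len(y)
    coefs = np.empty((n, X.shape[1]))
    for i in range(n):
        keep = np.arange(n) != i  # boolean mask excluding row i
        coefs[i], *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return coefs

# Illustrative usage: intercept + two predictors, 32 observations
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(32), rng.normal(size=(32, 2))])
y = X @ np.array([37.0, -0.03, -3.9]) + rng.normal(size=32)
print(loo_coefs(X, y)[:3])  # one coefficient row per left-out observation
```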

Python Parallel Computing - Scoop

主宰稳场 submitted on 2019-12-07 12:32:09
Question: I am trying to get familiar with the Scoop library (documentation here: https://media.readthedocs.org/pdf/scoop/0.7/scoop.pdf) to learn how to perform statistical computations in parallel, using in particular the `futures.map` function. As a first step, I would like to run a simple linear regression and assess the difference in performance between serial and parallel computation, using 10,000,000 data points (4 features, 1 target variable) randomly generated from a normal distribution.
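A hedged sketch of what the parallel side of that comparison could look like with Scoop's `futures.map` (the chunking, seeds, and true coefficients are all illustrative; Scoop scripts are launched with `python -m scoop script.py`). Each worker generates one chunk of data and returns its contribution to the normal equations, so only small 4x4 cross-products travel between processes rather than the data itself:

```python
import numpy as np
from scoop import futures

def partial_normal_eqs(seed):
    """Build one data chunk and return its X'X and X'y contributions."""
    rng = np.random.RandomState(seed)
    X = rng.normal(size=(1_000_000, 4))          # 4 features
    beta_true = np.array([1.0, -2.0, 0.5, 3.0])  # illustrative coefficients
    y = X @ beta_true + rng.normal(size=len(X))
    return X.T @ X, X.T @ y

if __name__ == "__main__":
    # 10 chunks x 1,000,000 rows = the 10,000,000 points in the question
    parts = list(futures.map(partial_normal_eqs, range(10)))
    XtX = sum(p[0] for p in parts)
    Xty = sum(p[1] for p in parts)
    print(np.linalg.solve(XtX, Xty))  # OLS estimates from the reduced sums
```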

How to compute minimal but fast linear regressions on each column of a response matrix?

巧了我就是萌 submitted on 2019-12-07 08:53:38
Question: I want to compute ordinary least squares (OLS) estimates in R without using `lm`, for several reasons. First, `lm` also computes lots of things I don't need (such as the fitted values), and data size is an issue in my case. Second, I want to be able to implement OLS myself in R before doing it in another language (e.g., in C with the GSL). As you may know, the model is Y = Xb + E, with E ~ N(0, sigma^2). As detailed below, b is a vector with 2 parameters, the mean (b0) and
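Since the closed-form estimator is b = (X'X)^(-1) X'Y, a minimal NumPy sketch (illustrative; the poster ultimately wants R and C/GSL, but the algebra is identical) shows how little is actually needed, and solving against a whole response matrix Y fits every column at once:

```python
import numpy as np

def ols_coefs(X, Y):
    """Closed-form OLS: solve (X'X) b = X'Y.

    X: (n, p) design matrix, including a column of ones for the intercept.
    Y: (n, m) response matrix; each column gets its own fit.
    Returns a (p, m) array of coefficients, one column per response.
    """
    # solve() is cheaper and more stable than explicitly inverting X'X
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Illustrative usage: intercept + one predictor, three response columns
n = 100
X = np.column_stack([np.ones(n), np.random.normal(size=n)])
Y = np.random.normal(size=(n, 3))
print(ols_coefs(X, Y).shape)  # (2, 3)
```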

Multiple Linear Regression with specific constraint on each coefficients on Python

耗尽温柔 submitted on 2019-12-07 08:32:42
Question: I am currently running a multiple linear regression on a dataset. At first, I didn't realize I needed to put constraints on my weights; in fact, I need specific positive and negative weights. To be more precise, I am building a scoring system, which is why some of my variables should have a positive or a negative impact on the score. Yet, when running my model, the results do not match my expectations: some of my 'positive' variables get negative coefficients and vice versa.
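One way to impose per-coefficient sign constraints is a bounded least-squares solver such as SciPy's `lsq_linear`; here is a hedged sketch (the data and which signs go where are illustrative). Note that `lsq_linear` fits no intercept, so append a ones column with free bounds if one is needed:

```python
import numpy as np
from scipy.optimize import lsq_linear

# Illustrative data: first two coefficients must be >= 0, last two <= 0
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 4))
b = A @ np.array([1.5, 0.8, -2.0, -0.3]) + rng.normal(size=200)

lower = [0, 0, -np.inf, -np.inf]  # per-coefficient lower bounds
upper = [np.inf, np.inf, 0, 0]    # per-coefficient upper bounds
res = lsq_linear(A, b, bounds=(lower, upper))
print(res.x)  # constrained least-squares coefficients
```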

sklearn's PLSRegression: “ValueError: array must not contain infs or NaNs”

醉酒当歌 submitted on 2019-12-07 04:21:58
Question: When using `sklearn.cross_decomposition.PLSRegression`:

```python
import numpy as np
import sklearn.cross_decomposition

pls2 = sklearn.cross_decomposition.PLSRegression()
xx = np.random.random((5, 5))
yy = np.zeros((5, 5))
yy[0, :] = [0, 1, 0, 0, 0]
yy[1, :] = [0, 0, 0, 1, 0]
yy[2, :] = [0, 0, 0, 0, 1]
# yy[3, :] = [1, 0, 0, 0, 0]  # Uncommenting this line solves the issue
pls2.fit(xx, yy)
```

I get:

```
C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:44: RuntimeWarning: invalid value encountered in divide
  x_weights
```
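The commented-out line is the hint: without it, two rows of `yy` are entirely zero, which plausibly leaves the iterative PLS weight updates dividing by a zero norm, hence the warning and the NaNs. A hedged pre-flight check (an illustrative helper, not part of sklearn):

```python
import numpy as np

def check_pls_inputs(X, Y, tol=1e-12):
    """Heuristic check for structure that tends to break PLS fits."""
    for name, M in (("X", np.asarray(X)), ("Y", np.asarray(Y))):
        if not np.all(np.isfinite(M)):
            raise ValueError(f"{name} contains infs or NaNs")
        if np.any(np.abs(M).sum(axis=1) < tol):
            print(f"warning: {name} has an all-zero row")
        if M.std(axis=0).min() < tol:
            print(f"warning: {name} has a zero-variance column")
```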

Multiple Linear Regression in Power BI

南笙酒味 submitted on 2019-12-07 04:13:18
Question: Suppose I have a set of returns and I want to compute its beta values versus different market indices. Let's use the following data in a table named Returns for the sake of having a concrete example:

```
Date         Equity   Duration   Credit   Manager
--------------------------------------------------
01/31/2017   2.907%    0.226%    1.240%    1.78%
02/28/2017   2.513%    0.493%    1.120%    3.88%
03/31/2017   1.346%   -0.046%   -0.250%    0.13%
04/30/2017   1.612%    0.695%    0.620%    1.04%
05/31/2017   2.209%    0.653%    0.480%    1.40%
06/30/2017   0
```
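Power BI aside, the betas can be sanity-checked with a short pandas/NumPy sketch, shown here as a hedged illustration using the five complete rows above (converted from percentages to decimals) and regressing Manager on the three indices jointly:

```python
import numpy as np
import pandas as pd

returns = pd.DataFrame({
    "Equity":   [0.02907, 0.02513, 0.01346, 0.01612, 0.02209],
    "Duration": [0.00226, 0.00493, -0.00046, 0.00695, 0.00653],
    "Credit":   [0.01240, 0.01120, -0.00250, 0.00620, 0.00480],
    "Manager":  [0.0178, 0.0388, 0.0013, 0.0104, 0.0140],
})

# Design matrix: intercept plus the three index return columns
X = np.column_stack([np.ones(len(returns)),
                     returns[["Equity", "Duration", "Credit"]].to_numpy()])
y = returns["Manager"].to_numpy()
alpha, *betas = np.linalg.lstsq(X, y, rcond=None)[0]
print(alpha, betas)  # intercept and one beta per index
```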

Shaping data for linear regression with TFlearn

爱⌒轻易说出口 submitted on 2019-12-07 00:09:26
I'm trying to expand the tflearn example for linear regression by increasing the number of columns to 21.

```python
from trafficdata import X, Y
import tflearn

print(X.shape)  # (1054, 21)
print(Y.shape)  # (1054,)

# Linear Regression graph
input_ = tflearn.input_data(shape=[None, 21])
linear = tflearn.single_unit(input_)
regression = tflearn.regression(linear, optimizer='sgd', loss='mean_square',
                                metric='R2', learning_rate=0.01)
m = tflearn.DNN(regression)
m.fit(X, Y, n_epoch=1000, show_metric=True, snapshot_epoch=False)

print("\nRegression result:")
print("Y = " + str(m.get_weights(linear.W)) + "*X + " +
```
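One hedged reading of the problem: `tflearn.single_unit` models y = w*x + b for a single feature, so a 21-column input needs one weight per column instead. A sketch under that assumption, replacing it with a one-unit dense layer:

```python
import tflearn

input_ = tflearn.input_data(shape=[None, 21])
# One linear unit over 21 inputs: weight matrix of shape (21, 1) plus a bias
linear = tflearn.fully_connected(input_, 1, activation='linear')
regression = tflearn.regression(linear, optimizer='sgd',
                                loss='mean_square', metric='R2',
                                learning_rate=0.01)
m = tflearn.DNN(regression)
# After m.fit(X, Y, ...): m.get_weights(linear.W) -> (21, 1) weights,
# m.get_weights(linear.b) -> the bias term
```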

OLS with pandas: datetime index as predictor

拥有回忆 submitted on 2019-12-06 21:48:10
Question: I would like to use the pandas OLS function to fit a trendline to my data Series. Does anyone know how to use the datetime index from the pandas Series as the predictor in the OLS? For example, let's say I have a simple time series:

```
>>> ts
2001-12-31    19.828763
2002-12-31    20.112191
2003-12-31    19.509116
2004-12-31    19.913656
2005-12-31    19.701649
2006-12-31    20.022819
2007-12-31    20.103024
2008-12-31    20.132712
2009-12-31    19.850609
2010-12-31    19.290640
2011-12-31    19.936210
2012-12-31    19.664813
Freq: A
```
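The `pandas.ols` function has since been removed from pandas, so here is a hedged sketch with statsmodels instead; either way, the key step is converting the datetime index into a plain numeric predictor (the year here, though ordinal dates work too). The series itself is reconstructed illustratively:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Illustrative series shaped like the one above
idx = pd.date_range("2001-12-31", periods=12, freq="A")
ts = pd.Series(np.random.normal(20, 0.3, 12), index=idx)

# A datetime index is not directly usable as a regressor; make it numeric
x = sm.add_constant(np.asarray(ts.index.year, dtype=float))
trend = sm.OLS(ts.values, x).fit()
print(trend.params)  # [intercept, slope] of the fitted trendline
```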

How to get the P Value in a Variable from OLSResults in Python?

二次信任 submitted on 2019-12-06 20:31:11
Question: The OLSResults of

```python
df2 = pd.read_csv("MultipleRegression.csv")
X = df2[['Distance', 'CarrierNum', 'Day', 'DayOfBooking']]
Y = df2['Price']
X = add_constant(X)

fit = sm.OLS(Y, X).fit()
print(fit.summary())
```

shows the p-values of each attribute to only 3 decimal places. I need to extract the p-value for each attribute, like `Distance`, `CarrierNum`, etc., and print it in scientific notation. I can extract the coefficients using `fit.params[0]`, `fit.params[1]`, etc.; I need the same for all their p-values.
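Continuing from the question's snippet: statsmodels' `OLSResults` exposes p-values directly through the `pvalues` attribute, which, with a DataFrame design matrix, is a pandas Series keyed by column name, so each value can be pulled out and printed in scientific notation:

```python
# fit is the OLSResults object from the snippet above
print(fit.pvalues["Distance"])        # a single attribute's p-value
for name, p in fit.pvalues.items():   # all of them, in scientific notation
    print(f"{name}: {p:.3e}")
```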