linear-regression

Multiple Linear Regression with specific constraints on each coefficient in Python

删除回忆录丶 posted on 2019-12-05 16:29:52
I am currently running a multiple linear regression on a dataset. At first I didn't realize I needed to put constraints on my weights; in fact, I need specific weights to be positive and others negative. To be more precise, I am building a scoring system, which is why some of my variables should have a positive or negative impact on the score. Yet when I run my model, the results don't match my expectations: some of my 'positive' variables get negative coefficients and vice versa. As an example, suppose my model is:

    y = W0*x0 + W1*x1 + W2*x2

where x2 is a 'positive' variable,
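The excerpt cuts off before any answer, but one standard way to impose sign constraints on individual coefficients is bounded least squares. Below is a minimal sketch using scipy.optimize.lsq_linear on synthetic data; the sign pattern (W1 <= 0, W2 >= 0) is illustrative, not taken from the original question.

    import numpy as np
    from scipy.optimize import lsq_linear

    rng = np.random.default_rng(0)
    X = rng.random((100, 3))        # columns stand in for x0, x1, x2
    y = rng.random(100)             # synthetic response

    # One (lower, upper) bound per coefficient: leave W0 free,
    # force W1 <= 0 ('negative' variable) and W2 >= 0 ('positive' variable)
    lower = [-np.inf, -np.inf, 0.0]
    upper = [ np.inf,  0.0,    np.inf]

    res = lsq_linear(X, y, bounds=(lower, upper))
    print(res.x)                    # fitted weights respecting the signs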

How to compute minimal but fast linear regressions on each column of a response matrix?

回眸只為那壹抹淺笑 posted on 2019-12-05 11:56:58
I want to compute ordinary least squares (OLS) estimates in R without using "lm", for several reasons. First, "lm" also computes lots of things I don't need (such as the fitted values), and data size is an issue in my case. Second, I want to be able to implement OLS myself in R before doing it in another language (e.g. in C with the GSL). As you may know, the model is Y = Xb + E, with E ~ N(0, sigma^2). As detailed below, b is a vector with 2 parameters, the mean (b0) and another coefficient (b1). In the end, for each linear regression I do, I want the estimate for b1
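The question is about R, but the closed-form computation it is heading toward is the same everywhere. A NumPy sketch on synthetic data that solves every per-column regression in one shot via the normal equations:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 1000, 50
    x = rng.random(n)                       # shared predictor
    Y = rng.random((n, m))                  # each column is one response

    X = np.column_stack([np.ones(n), x])    # design matrix [1, x]

    # Normal equations B = (X'X)^{-1} X'Y, solved once for all m columns
    B = np.linalg.solve(X.T @ X, X.T @ Y)   # shape (2, m)
    b1 = B[1]                               # slope estimate per response column

For a well-conditioned two-column design this is fine; with more predictors a QR decomposition (which is what lm uses internally) is numerically safer.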

Multiple Linear Regression in Power BI

无人久伴 posted on 2019-12-05 11:15:28
Suppose I have a set of returns and I want to compute its beta values versus different market indices. For the sake of a concrete example, let's use the following data in a table named Returns:

    Date        Equity   Duration   Credit   Manager
    ----------  -------  ---------  -------  -------
    01/31/2017   2.907%   0.226%     1.240%   1.78%
    02/28/2017   2.513%   0.493%     1.120%   3.88%
    03/31/2017   1.346%  -0.046%    -0.250%   0.13%
    04/30/2017   1.612%   0.695%     0.620%   1.04%
    05/31/2017   2.209%   0.653%     0.480%   1.40%
    06/30/2017   0.796%  -0.162%     0.350%   0.63%
    07/31/2017   2.733%   0.167%     0.830%   2.06%
    08/31/2017   0.401%   1.083%    -0.670%   0.29%
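The excerpt ends before any DAX appears, but the quantity being asked for is a standard multiple-regression fit of the Manager column on the index columns. As a reference for checking whatever Power BI produces, here is that computation in NumPy using the table's own numbers:

    import numpy as np

    # The Returns table above, as decimals
    equity   = np.array([2.907, 2.513, 1.346, 1.612, 2.209, 0.796, 2.733, 0.401]) / 100
    duration = np.array([0.226, 0.493, -0.046, 0.695, 0.653, -0.162, 0.167, 1.083]) / 100
    credit   = np.array([1.240, 1.120, -0.250, 0.620, 0.480, 0.350, 0.830, -0.670]) / 100
    manager  = np.array([1.78, 3.88, 0.13, 1.04, 1.40, 0.63, 2.06, 0.29]) / 100

    X = np.column_stack([np.ones_like(equity), equity, duration, credit])
    betas, *_ = np.linalg.lstsq(X, manager, rcond=None)
    print(betas)    # intercept followed by the beta on each index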

sklearn's PLSRegression: “ValueError: array must not contain infs or NaNs”

笑着哭i posted on 2019-12-05 08:52:53
When using sklearn.cross_decomposition.PLSRegression:

    import numpy as np
    import sklearn.cross_decomposition

    pls2 = sklearn.cross_decomposition.PLSRegression()
    xx = np.random.random((5, 5))
    yy = np.zeros((5, 5))
    yy[0, :] = [0, 1, 0, 0, 0]
    yy[1, :] = [0, 0, 0, 1, 0]
    yy[2, :] = [0, 0, 0, 0, 1]
    # yy[3, :] = [1, 0, 0, 0, 0]  # Uncommenting this line solves the issue
    pls2.fit(xx, yy)

I get:

    C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:44: RuntimeWarning: invalid value encountered in divide
      x_weights = np.dot(X.T, y_score) / np.dot(y_score.T, y_score)
    C:\Anaconda\lib\site-packages\sklearn\cross
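No answer survives in the excerpt. A plausible reading of the warning is that the PLS weight update divides by a quantity that becomes zero when Y contains a constant (here all-zero) column. A hedged pre-flight check one could run against the yy built above, before fitting:

    import numpy as np

    def assert_nonconstant_columns(Y):
        """Raise early if any response column has zero variance,
        which can make the PLS weight update divide by zero."""
        bad = np.where(np.var(Y, axis=0) == 0)[0]
        if bad.size:
            raise ValueError(f"constant response columns: {bad.tolist()}")

    assert_nonconstant_columns(yy)   # fails for the yy above until row 3 is filled in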

Format of R's lm() Formula with a Transformation

醉酒当歌 posted on 2019-12-05 08:21:32
I can't quite figure out how to do the following in one line:

    data(attenu)
    x_temp = attenu$accel^(1/4)
    y_temp = log(attenu$dist)
    best_line = lm(y_temp ~ x_temp)

Since the above works, I thought I could do the following:

    data(attenu)
    best_line = lm( log(attenu$dist) ~ (attenu$accel^(1/4)) )

But this gives the error:

    Error in terms.formula(formula, data = data) : invalid power in formula

There's obviously something I'm missing when using transformed variables in R's formula format. Why doesn't this work? You're looking for the function I(), so that the ^ operator is treated as arithmetic rather than as formula syntax: lm(log(attenu$dist) ~ I(attenu$accel^(1/4))).
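For readers coming from Python, patsy formulas (used by statsmodels) have the same escape hatch: inside I(...), operators revert to plain Python arithmetic, whereas a bare ** in a formula means interaction expansion. A toy sketch with made-up stand-in data for R's attenu dataset:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Stand-in for attenu; the values here are invented
    attenu = pd.DataFrame({"dist":  [10.0, 20.0, 5.0, 40.0, 15.0],
                           "accel": [0.30, 0.12, 0.45, 0.05, 0.22]})

    fit = smf.ols("np.log(dist) ~ I(accel ** 0.25)", data=attenu).fit()
    print(fit.params)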

Spark ml and PMML export

寵の児 posted on 2019-12-05 04:43:19
I know that it's possible to export models as PMML with Spark MLlib, but what about Spark ML? Is it possible to convert a LinearRegressionModel from org.apache.spark.ml.regression to a LinearRegressionModel from org.apache.spark.mllib.regression in order to invoke the toPMML() method? You can convert Spark ML pipelines to PMML using the JPMML-SparkML library:

    StructType schema = dataFrame.schema();
    PipelineModel pipelineModel = pipeline.fit(dataFrame);
    org.dmg.pmml.PMML pmml = org.jpmml.sparkml.ConverterUtil.toPMML(schema, pipelineModel);
    JAXBUtil.marshalPMML(pmml, new StreamResult(System.out));

How to compute AIC for linear regression model in Python?

試著忘記壹切 posted on 2019-12-05 04:25:28
I want to compute AIC for linear models to compare their complexity. I did it as follows:

    regr = linear_model.LinearRegression()
    regr.fit(X, y)
    aic_intercept_slope = aic(y, regr.coef_[0] * X.as_matrix() + regr.intercept_, k=1)

    def aic(y, y_pred, k):
        resid = y - y_pred.ravel()
        sse = sum(resid ** 2)
        AIC = 2*k - 2*np.log(sse)
        return AIC

But I receive a "divide by zero encountered in log" error. sklearn's LinearRegression is good for prediction but pretty barebones, as you've discovered. (It's often said that sklearn stays away from all things statistical inference.) statsmodels.regression.linear
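The answer is cut off, but it is pointing at statsmodels, whose OLS results expose AIC directly. A sketch with toy data; note also that the question's formula is not AIC: for a least-squares fit, AIC = n*ln(SSE/n) + 2k up to an additive constant, not 2k - 2*ln(SSE).

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.random((50, 1))                   # toy predictor
    y = 3.0 * X[:, 0] + rng.normal(size=50)   # toy response

    fit = sm.OLS(y, sm.add_constant(X)).fit()
    print(fit.aic)                            # AIC from the Gaussian log-likelihood

    # Manual least-squares version: n*ln(SSE/n) + 2k. It differs from fit.aic
    # by the constant n*(ln(2*pi) + 1), which cancels when comparing models
    # fitted to the same data.
    n, k = len(y), 2                          # k: intercept + slope
    sse = np.sum(fit.resid ** 2)
    print(n * np.log(sse / n) + 2 * k)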

Why does my linear regression fit line look wrong?

三世轮回 posted on 2019-12-05 03:55:21
I have plotted a 2-D histogram in a way that lets me add lines, points, etc. to the plot. Now I want to apply a linear regression fit over the region of dense points, but my linear regression line seems to end up nowhere near where it should be. To demonstrate, here is my plot on the left, with both a lowess regression fit and a linear fit:

    lines(lowess(na.omit(a), na.omit(b), iter=10), col='gray', lwd=3)
    abline(lm(b[cc] ~ a[cc]), lwd=3)

Here a and b are my values, and cc selects the points within the densest parts (i.e. where most points lie): red + yellow + blue. Why doesn't my regression line look more like that on
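The answer is missing from the excerpt, but the setup itself (select points in dense bins, then fit a line to that subset) translates directly. A sketch in NumPy with synthetic data, where cc plays the same role as in the question:

    import numpy as np

    # Synthetic stand-ins for the question's a and b
    rng = np.random.default_rng(1)
    a = rng.normal(size=5000)
    b = 0.5 * a + rng.normal(scale=0.3, size=5000)

    # Mark points that land in dense 2-D histogram bins, like the question's cc
    H, xe, ye = np.histogram2d(a, b, bins=50)
    ix = np.clip(np.digitize(a, xe) - 1, 0, H.shape[0] - 1)
    iy = np.clip(np.digitize(b, ye) - 1, 0, H.shape[1] - 1)
    cc = H[ix, iy] > 5              # keep points with >5 neighbours in their bin

    # Ordinary least-squares line through the dense subset only
    slope, intercept = np.polyfit(a[cc], b[cc], 1)
    print(slope, intercept)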

Errors in segmented package: breakpoints confusion

醉酒当歌 posted on 2019-12-05 01:41:59
Using the segmented package to create a piecewise linear regression, I am seeing an error when I try to set my own breakpoints; it seems to happen only when I try to set more than two. (EDIT) Here is the code I am using:

    # data
    bullard <- structure(list(Rt = c(0, 4.0054, 25.1858, 27.9998, 35.7259,
        39.0769, 45.1805, 45.6717, 48.3419, 51.5661, 64.1578, 66.828,
        111.1613, 114.2518, 121.8681, 146.0591, 148.8134, 164.6219,
        176.522, 177.9578, 180.8773, 187.1846, 210.5131, 211.483,
        230.2598, 262.3549, 266.2318,
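The excerpt truncates before the segmented() call, so the actual error can't be reproduced here. For what the package is doing conceptually, here is a rough single-breakpoint analogue in Python using scipy.optimize.curve_fit; the data and starting guesses are invented:

    import numpy as np
    from scipy.optimize import curve_fit

    # Continuous piecewise-linear model: break at x0, slopes k1 and k2
    def piecewise_linear(x, x0, y0, k1, k2):
        return np.where(x < x0, y0 + k1 * (x - x0), y0 + k2 * (x - x0))

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 200)
    y = piecewise_linear(x, 4.0, 2.0, 1.5, -0.5) + rng.normal(0, 0.2, x.size)

    p0 = [5.0, 0.0, 1.0, -1.0]    # starting guesses, akin to segmented's psi
    params, _ = curve_fit(piecewise_linear, x, y, p0=p0)
    print(params)                 # estimated break, level, and the two slopes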

How to get the P Value in a Variable from OLSResults in Python?

一世执手 posted on 2019-12-05 01:38:01
The OLSResults of

    df2 = pd.read_csv("MultipleRegression.csv")
    X = df2[['Distance', 'CarrierNum', 'Day', 'DayOfBooking']]
    Y = df2['Price']
    X = add_constant(X)
    fit = sm.OLS(Y, X).fit()
    print(fit.summary())

shows the p-values of each attribute to only 3 decimal places. I need to extract the p-value for each attribute, like Distance, CarrierNum, etc., and print it in scientific notation. I can extract the coefficients using fit.params[0], fit.params[1], etc., and need the same for all their p-values. Also, what does it mean when all the p-values are 0? We have to do fit.pvalues[i] to get the answer, where i is the index of the attribute.
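Continuing from the snippet above: because X is a DataFrame, fit.pvalues comes back as a pandas Series indexed by column name, which makes both lookups and scientific-notation formatting straightforward. As for the zeros, summary() rounds to 3 decimals, so a displayed 0.000 just means the p-value is below 0.0005; the stored value is a tiny but nonzero float.

    # Continues from the question's snippet; fit.pvalues is a pandas Series
    print(fit.pvalues["Distance"])        # p-value for one attribute

    # All attributes in scientific notation
    for name, p in fit.pvalues.items():
        print(f"{name}: {p:.4e}")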