linear-regression

Multiple Linear Regression with specific constraints on each coefficient in Python

删除回忆录丶 posted on 2019-12-05 16:29:52
I am currently running a multiple linear regression on a dataset. At first I didn't realize I needed to put constraints on my weights; in fact, I need specific weights to be positive and others negative. To be more precise, I am building a scoring system, which is why some of my variables should have a positive or negative impact on the score. Yet when I run my model, the results don't match my expectations: some of my 'positive' variables get negative coefficients and vice versa. As an example, suppose my model is:

    y = W0*x0 + W1*x1 + W2*x2

where x2 is a 'positive' variable,
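The excerpt cuts off before any answer, but one standard way to impose sign constraints on individual coefficients is bounded least squares. Below is a minimal sketch using scipy.optimize.lsq_linear on synthetic data; the sign pattern (W1 <= 0, W2 >= 0) is illustrative, not taken from the original question.

    import numpy as np
    from scipy.optimize import lsq_linear

    rng = np.random.default_rng(0)
    X = rng.random((100, 3))        # columns stand in for x0, x1, x2
    y = rng.random(100)             # synthetic response

    # One (lower, upper) bound per coefficient: leave W0 free,
    # force W1 <= 0 ('negative' variable) and W2 >= 0 ('positive' variable)
    lower = [-np.inf, -np.inf, 0.0]
    upper = [ np.inf,  0.0,    np.inf]

    res = lsq_linear(X, y, bounds=(lower, upper))
    print(res.x)                    # fitted weights respecting the signs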

How to compute minimal but fast linear regressions on each column of a response matrix?

回眸只為那壹抹淺笑 posted on 2019-12-05 11:56:58
I want to compute ordinary least squares (OLS) estimates in R without using "lm", for several reasons. First, "lm" also computes lots of things I don't need (such as the fitted values), and data size is an issue in my case. Second, I want to be able to implement OLS myself in R before doing it in another language (e.g. in C with the GSL). As you may know, the model is Y = Xb + E, with E ~ N(0, sigma^2). As detailed below, b is a vector with 2 parameters, the mean (b0) and another coefficient (b1). In the end, for each linear regression I do, I want the estimate for b1
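The question is about R, but the closed-form computation it is heading toward is the same everywhere. A NumPy sketch on synthetic data that solves every per-column regression in one shot via the normal equations:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 1000, 50
    x = rng.random(n)                       # shared predictor
    Y = rng.random((n, m))                  # each column is one response

    X = np.column_stack([np.ones(n), x])    # design matrix [1, x]

    # Normal equations B = (X'X)^{-1} X'Y, solved once for all m columns
    B = np.linalg.solve(X.T @ X, X.T @ Y)   # shape (2, m)
    b1 = B[1]                               # slope estimate per response column

For a well-conditioned two-column design this is fine; with more predictors a QR decomposition (which is what lm uses internally) is numerically safer.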

Multiple Linear Regression in Power BI

无人久伴 posted on 2019-12-05 11:15:28
Suppose I have a set of returns and I want to compute its beta values versus different market indices. For the sake of a concrete example, let's use the following data in a table named Returns:

    Date        Equity   Duration   Credit   Manager
    ----------  -------  ---------  -------  -------
    01/31/2017   2.907%   0.226%     1.240%   1.78%
    02/28/2017   2.513%   0.493%     1.120%   3.88%
    03/31/2017   1.346%  -0.046%    -0.250%   0.13%
    04/30/2017   1.612%   0.695%     0.620%   1.04%
    05/31/2017   2.209%   0.653%     0.480%   1.40%
    06/30/2017   0.796%  -0.162%     0.350%   0.63%
    07/31/2017   2.733%   0.167%     0.830%   2.06%
    08/31/2017   0.401%   1.083%    -0.670%   0.29%
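The excerpt ends before any DAX appears, but the quantity being asked for is a standard multiple-regression fit of the Manager column on the index columns. As a reference for checking whatever Power BI produces, here is that computation in NumPy using the table's own numbers:

    import numpy as np

    # The Returns table above, as decimals
    equity   = np.array([2.907, 2.513, 1.346, 1.612, 2.209, 0.796, 2.733, 0.401]) / 100
    duration = np.array([0.226, 0.493, -0.046, 0.695, 0.653, -0.162, 0.167, 1.083]) / 100
    credit   = np.array([1.240, 1.120, -0.250, 0.620, 0.480, 0.350, 0.830, -0.670]) / 100
    manager  = np.array([1.78, 3.88, 0.13, 1.04, 1.40, 0.63, 2.06, 0.29]) / 100

    X = np.column_stack([np.ones_like(equity), equity, duration, credit])
    betas, *_ = np.linalg.lstsq(X, manager, rcond=None)
    print(betas)    # intercept followed by the beta on each index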

sklearn's PLSRegression: “ValueError: array must not contain infs or NaNs”

笑着哭i posted on 2019-12-05 08:52:53
When using sklearn.cross_decomposition.PLSRegression:

    import numpy as np
    import sklearn.cross_decomposition

    pls2 = sklearn.cross_decomposition.PLSRegression()
    xx = np.random.random((5, 5))
    yy = np.zeros((5, 5))
    yy[0, :] = [0, 1, 0, 0, 0]
    yy[1, :] = [0, 0, 0, 1, 0]
    yy[2, :] = [0, 0, 0, 0, 1]
    # yy[3, :] = [1, 0, 0, 0, 0]  # Uncommenting this line solves the issue
    pls2.fit(xx, yy)

I get:

    C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:44: RuntimeWarning: invalid value encountered in divide
      x_weights = np.dot(X.T, y_score) / np.dot(y_score.T, y_score)
    C:\Anaconda\lib\site-packages\sklearn\cross
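No answer survives in the excerpt. A plausible reading of the warning is that the PLS weight update divides by a quantity that becomes zero when Y contains a constant (here all-zero) column. A hedged pre-flight check one could run against the yy built above, before fitting:

    import numpy as np

    def assert_nonconstant_columns(Y):
        """Raise early if any response column has zero variance,
        which can make the PLS weight update divide by zero."""
        bad = np.where(np.var(Y, axis=0) == 0)[0]
        if bad.size:
            raise ValueError(f"constant response columns: {bad.tolist()}")

    assert_nonconstant_columns(yy)   # fails for the yy above until row 3 is filled in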

Format of R's lm() Formula with a Transformation

醉酒当歌 posted on 2019-12-05 08:21:32
I can't quite figure out how to do the following in one line:

    data(attenu)
    x_temp = attenu$accel^(1/4)
    y_temp = log(attenu$dist)
    best_line = lm(y_temp ~ x_temp)

Since the above works, I thought I could do the following:

    data(attenu)
    best_line = lm( log(attenu$dist) ~ (attenu$accel^(1/4)) )

But this gives the error:

    Error in terms.formula(formula, data = data) : invalid power in formula

There's obviously something I'm missing when using transformed variables in R's formula format. Why doesn't this work? You're looking for the function I(), so that the ^ operator is treated as arithmetic rather than as formula syntax: lm(log(attenu$dist) ~ I(attenu$accel^(1/4))).
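For readers coming from Python, patsy formulas (used by statsmodels) have the same escape hatch: inside I(...), operators revert to plain Python arithmetic, whereas a bare ** in a formula means interaction expansion. A toy sketch with made-up stand-in data for R's attenu dataset:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Stand-in for attenu; the values here are invented
    attenu = pd.DataFrame({"dist":  [10.0, 20.0, 5.0, 40.0, 15.0],
                           "accel": [0.30, 0.12, 0.45, 0.05, 0.22]})

    fit = smf.ols("np.log(dist) ~ I(accel ** 0.25)", data=attenu).fit()
    print(fit.params)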

Spark ml and PMML export

寵の児 posted on 2019-12-05 04:43:19
I know that it's possible to export models as PMML with Spark MLlib, but what about Spark ML? Is it possible to convert a LinearRegressionModel from org.apache.spark.ml.regression to a LinearRegressionModel from org.apache.spark.mllib.regression in order to invoke the toPMML() method? You can convert Spark ML pipelines to PMML using the JPMML-SparkML library:

    StructType schema = dataFrame.schema();
    PipelineModel pipelineModel = pipeline.fit(dataFrame);
    org.dmg.pmml.PMML pmml = org.jpmml.sparkml.ConverterUtil.toPMML(schema, pipelineModel);
    JAXBUtil.marshalPMML(pmml, new StreamResult(System.out));

How to compute AIC for linear regression model in Python?

試著忘記壹切 posted on 2019-12-05 04:25:28
I want to compute AIC for linear models to compare their complexity. I did it as follows:

    regr = linear_model.LinearRegression()
    regr.fit(X, y)
    aic_intercept_slope = aic(y, regr.coef_[0] * X.as_matrix() + regr.intercept_, k=1)

    def aic(y, y_pred, k):
        resid = y - y_pred.ravel()
        sse = sum(resid ** 2)
        AIC = 2*k - 2*np.log(sse)
        return AIC

But I receive a "divide by zero encountered in log" error. sklearn's LinearRegression is good for prediction but pretty barebones, as you've discovered. (It's often said that sklearn stays away from all things statistical inference.) statsmodels.regression.linear
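The answer is cut off, but it is pointing at statsmodels, whose OLS results expose AIC directly. A sketch with toy data; note also that the question's formula is not AIC: for a least-squares fit, AIC = n*ln(SSE/n) + 2k up to an additive constant, not 2k - 2*ln(SSE).

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.random((50, 1))                   # toy predictor
    y = 3.0 * X[:, 0] + rng.normal(size=50)   # toy response

    fit = sm.OLS(y, sm.add_constant(X)).fit()
    print(fit.aic)                            # AIC from the Gaussian log-likelihood

    # Manual least-squares version: n*ln(SSE/n) + 2k. It differs from fit.aic
    # by the constant n*(ln(2*pi) + 1), which cancels when comparing models
    # fitted to the same data.
    n, k = len(y), 2                          # k: intercept + slope
    sse = np.sum(fit.resid ** 2)
    print(n * np.log(sse / n) + 2 * k)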

Why does my linear regression fit line look wrong?

三世轮回 posted on 2019-12-05 03:55:21
I have plotted a 2-D histogram in a way that lets me add lines, points, etc. to the plot. Now I want to apply a linear regression fit over the region of dense points, but my linear regression line seems to end up nowhere near where it should be. To demonstrate, here is my plot on the left, with both a lowess regression fit and a linear fit:

    lines(lowess(na.omit(a), na.omit(b), iter=10), col='gray', lwd=3)
    abline(lm(b[cc] ~ a[cc]), lwd=3)

Here a and b are my values, and cc selects the points within the densest parts (i.e. where most points lie): red + yellow + blue. Why doesn't my regression line look more like that on
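The answer is missing from the excerpt, but the setup itself (select points in dense bins, then fit a line to that subset) translates directly. A sketch in NumPy with synthetic data, where cc plays the same role as in the question:

    import numpy as np

    # Synthetic stand-ins for the question's a and b
    rng = np.random.default_rng(1)
    a = rng.normal(size=5000)
    b = 0.5 * a + rng.normal(scale=0.3, size=5000)

    # Mark points that land in dense 2-D histogram bins, like the question's cc
    H, xe, ye = np.histogram2d(a, b, bins=50)
    ix = np.clip(np.digitize(a, xe) - 1, 0, H.shape[0] - 1)
    iy = np.clip(np.digitize(b, ye) - 1, 0, H.shape[1] - 1)
    cc = H[ix, iy] > 5              # keep points with >5 neighbours in their bin

    # Ordinary least-squares line through the dense subset only
    slope, intercept = np.polyfit(a[cc], b[cc], 1)
    print(slope, intercept)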

Errors in segmented package: breakpoints confusion

醉酒当歌 posted on 2019-12-05 01:41:59
Using the segmented package to create a piecewise linear regression, I am seeing an error when I try to set my own breakpoints; it seems to happen only when I try to set more than two. (EDIT) Here is the code I am using:

    # data
    bullard <- structure(list(Rt = c(0, 4.0054, 25.1858, 27.9998, 35.7259,
        39.0769, 45.1805, 45.6717, 48.3419, 51.5661, 64.1578, 66.828,
        111.1613, 114.2518, 121.8681, 146.0591, 148.8134, 164.6219,
        176.522, 177.9578, 180.8773, 187.1846, 210.5131, 211.483,
        230.2598, 262.3549, 266.2318,
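The excerpt truncates before the segmented() call, so the actual error can't be reproduced here. For what the package is doing conceptually, here is a rough single-breakpoint analogue in Python using scipy.optimize.curve_fit; the data and starting guesses are invented:

    import numpy as np
    from scipy.optimize import curve_fit

    # Continuous piecewise-linear model: break at x0, slopes k1 and k2
    def piecewise_linear(x, x0, y0, k1, k2):
        return np.where(x < x0, y0 + k1 * (x - x0), y0 + k2 * (x - x0))

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 200)
    y = piecewise_linear(x, 4.0, 2.0, 1.5, -0.5) + rng.normal(0, 0.2, x.size)

    p0 = [5.0, 0.0, 1.0, -1.0]    # starting guesses, akin to segmented's psi
    params, _ = curve_fit(piecewise_linear, x, y, p0=p0)
    print(params)                 # estimated break, level, and the two slopes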

How to get the P Value in a Variable from OLSResults in Python?

一世执手 posted on 2019-12-05 01:38:01
The OLSResults of

    df2 = pd.read_csv("MultipleRegression.csv")
    X = df2[['Distance', 'CarrierNum', 'Day', 'DayOfBooking']]
    Y = df2['Price']
    X = add_constant(X)
    fit = sm.OLS(Y, X).fit()
    print(fit.summary())

shows the p-values of each attribute to only 3 decimal places. I need to extract the p-value for each attribute, like Distance, CarrierNum, etc., and print it in scientific notation. I can extract the coefficients using fit.params[0], fit.params[1], etc., and need the same for all their p-values. Also, what does it mean when all the p-values are 0? We have to do fit.pvalues[i] to get the answer, where i is the index of the attribute.
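Continuing from the snippet above: because X is a DataFrame, fit.pvalues comes back as a pandas Series indexed by column name, which makes both lookups and scientific-notation formatting straightforward. As for the zeros, summary() rounds to 3 decimals, so a displayed 0.000 just means the p-value is below 0.0005; the stored value is a tiny but nonzero float.

    # Continues from the question's snippet; fit.pvalues is a pandas Series
    print(fit.pvalues["Distance"])        # p-value for one attribute

    # All attributes in scientific notation
    for name, p in fit.pvalues.items():
        print(f"{name}: {p:.4e}")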