linear-regression

How to compute AIC for linear regression model in Python?

醉酒当歌 submitted on 2020-01-02 02:42:08
Question: I want to compute AIC for linear models to compare their complexity. I did it as follows:

    regr = linear_model.LinearRegression()
    regr.fit(X, y)
    aic_intercept_slope = aic(y, regr.coef_[0] * X.as_matrix() + regr.intercept_, k=1)

    def aic(y, y_pred, k):
        resid = y - y_pred.ravel()
        sse = sum(resid ** 2)
        AIC = 2*k - 2*np.log(sse)
        return AIC

But I receive a "divide by zero encountered in log" error.

Answer 1: sklearn's LinearRegression is good for prediction but pretty barebones, as you've discovered. (It's
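
For reference, the formula above is not the usual AIC for least squares: under Gaussian errors, AIC is commonly written as n·ln(SSE/n) + 2k (up to an additive constant), and the log is only undefined when SSE is exactly zero, i.e. a perfect fit. A minimal sketch of that version, assuming scikit-learn and synthetic data:

    import numpy as np
    from sklearn import linear_model

    def aic_ols(y, y_pred, k):
        """AIC for a least-squares fit under Gaussian errors, up to a constant."""
        resid = np.asarray(y).ravel() - np.asarray(y_pred).ravel()
        sse = np.sum(resid ** 2)
        n = resid.size
        return n * np.log(sse / n) + 2 * k

    rng = np.random.default_rng(0)
    X = rng.random((100, 1))
    y = 3.0 * X.ravel() + rng.normal(scale=0.1, size=100)

    regr = linear_model.LinearRegression().fit(X, y)
    print(aic_ols(y, regr.predict(X), k=2))  # k=2: slope + intercept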

Difference between categorical variables (factors) and dummy variables

徘徊边缘 submitted on 2020-01-01 12:00:15
Question: I was running a regression using categorical variables and came across this question. Here, the user wanted to add a column for each dummy. This left me quite confused, because I thought that long data with a single column storing all the dummies via as.factor() was equivalent to having explicit dummy variables. Could someone explain the difference between the following two linear regression models?

Linear Model 1, where Month is a factor:

    dt_long
           Sales Period Month
    1: 0.4898943      1    M1
    2: 0
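
In Python terms (the question itself is in R; this is only an illustration of the idea), a categorical column and its one-hot dummy expansion carry the same information, which is why the two encodings fit equivalent regressions:

    import pandas as pd

    df = pd.DataFrame({
        "Sales": [0.49, 1.12, 2.20, 3.17],
        "Month": ["M1", "M2", "M1", "M2"],   # the "factor" column
    })

    # One 0/1 column per level (dropping one as the baseline), i.e. explicit dummies.
    dummies = pd.get_dummies(df["Month"], prefix="Month", drop_first=True)
    df_dummy = pd.concat([df.drop(columns="Month"), dummies], axis=1)
    print(df_dummy)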

Parallelising gradient calculation in Julia

梦想的初衷 submitted on 2020-01-01 09:18:26
Question: I was persuaded some time ago to drop my comfortable MATLAB programming and start programming in Julia. I have been working for a long time with neural networks, and I thought that with Julia I could get things done faster by parallelising the calculation of the gradient. The gradient need not be calculated on the entire dataset in one go; instead, one can split the calculation. For instance, by splitting the dataset into parts, we can calculate a partial gradient on each part. The total
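
The idea, partial gradients over data shards summing to the full gradient, is language-independent. A minimal sketch in Python rather than Julia (assuming a simple least-squares loss), just to make the decomposition concrete:

    import numpy as np
    from multiprocessing import Pool

    def partial_gradient(args):
        """Gradient of 0.5 * ||Xw - y||^2 on one shard of the data."""
        X, y, w = args
        return X.T @ (X @ w - y)

    def full_gradient(X, y, w, n_parts=4):
        # The total gradient is the sum of the per-shard partial gradients.
        # (Wrap calls in `if __name__ == "__main__":` when run as a script.)
        shards = zip(np.array_split(X, n_parts), np.array_split(y, n_parts))
        with Pool(n_parts) as pool:
            parts = pool.map(partial_gradient, [(Xs, ys, w) for Xs, ys in shards])
        return sum(parts)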

Create lm object from data/coefficients

拜拜、爱过 submitted on 2020-01-01 04:54:07
Question: Does anyone know of a function that can create an lm object given a dataset and coefficients? I'm interested in this because I started playing with Bayesian model averaging (BMA), and I'd like to be able to create an lm object out of the results of bicreg. That would give me access to all of the nice generic lm functions, like diagnostic plotting, predict, cv.lm, etc. If you are pretty sure such a function doesn't exist, that's also very helpful to know!

    library(BMA)
    mtcars_y <- mtcars[, 1]  # mpg
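
As an aside, the analogous workaround in Python (a hand-rolled sketch, not a library API) is to wrap externally supplied coefficients in a thin object that at least exposes prediction and residuals:

    import numpy as np

    class FixedCoefModel:
        """Linear model whose coefficients come from elsewhere (e.g. model averaging)."""
        def __init__(self, coef, intercept=0.0):
            self.coef = np.asarray(coef, dtype=float)
            self.intercept = float(intercept)

        def predict(self, X):
            return np.asarray(X) @ self.coef + self.intercept

        def residuals(self, X, y):
            return np.asarray(y) - self.predict(X)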

How to force a zero intercept in linear regression?

…衆ロ難τιáo~ submitted on 2020-01-01 04:15:07
Question: I'm a bit of a newbie, so apologies if this question has already been answered; I've had a look and couldn't find specifically what I was looking for. I have some more or less linear data of the form:

    x = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
         20.0, 40.0, 60.0, 80.0]
    y = [0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
         3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
         11.073788414382639, 23.248479770546009, 32.120462301367183,
         44.036117671229206, 54
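
One standard way to force the fit through the origin is to drop the constant column from the least-squares design matrix, so the model is y = m·x only. A minimal sketch on a subset of the data above:

    import numpy as np

    x = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0])
    y = np.array([0.505, 1.121, 2.198, 3.175, 4.291, 6.282, 11.074, 23.248])

    # Design matrix is just the x column: no column of ones, hence no intercept.
    m, residuals, rank, sv = np.linalg.lstsq(x[:, None], y, rcond=None)
    print("slope through origin:", m[0])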

Converting Numpy Lstsq residual value to R^2

[亡魂溺海] submitted on 2019-12-31 22:03:55
Question: I am performing a least squares regression as below (univariate). I would like to express the significance of the result in terms of R^2. Numpy returns the unscaled residual; what would be a sensible way of normalising it?

    field_clean, back_clean = rid_zeros(backscatter, field_data)
    num_vals = len(field_clean)
    x = field_clean[:, row:row+1]
    y = 10 * log10(back_clean)
    A = hstack([x, ones((num_vals, 1))])
    soln = lstsq(A, y)
    m, c = soln[0]
    residues = soln[1]
    print residues

Answer 1: See http:/
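
The unscaled residual that lstsq returns is the sum of squared residuals (SSE), so one common normalisation (a standard identity, independent of whatever the truncated link points to) is R² = 1 − SSE/SST, where SST is the total sum of squares about the mean:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(1.0, 10.0, 20)
    y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

    A = np.column_stack([x, np.ones_like(x)])
    coeffs, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)

    sse = residuals[0]                    # lstsq's "residues": sum of squared errors
    sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
    print("R^2:", 1.0 - sse / sst)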

Conditionally colour data points outside of confidence bands in R

风流意气都作罢 submitted on 2019-12-31 10:49:26
Question: I need to colour the data points that fall outside of the confidence bands on the plot below differently from those within the bands. Should I add a separate column to my dataset recording whether each point is within the confidence bands? Could you provide an example, please? Example dataset:

    ## Dataset from http://www.apsnet.org/education/advancedplantpath/topics/RModules/doc1/04_Linear_regression.html
    ## Disease severity as a function of temperature
    # Response variable, disease severity
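
A sketch of the general approach in Python (the question itself is in R; statsmodels and matplotlib stand in here): compute the band, flag the points outside it, and map the flag to a colour.

    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    temp = np.linspace(0.0, 30.0, 40)
    severity = 0.5 * temp + rng.normal(scale=2.0, size=temp.size)

    X = sm.add_constant(temp)
    frame = sm.OLS(severity, X).fit().get_prediction(X).summary_frame(alpha=0.05)

    # Flag points outside the 95% confidence band of the fitted mean.
    outside = (severity < frame["mean_ci_lower"]) | (severity > frame["mean_ci_upper"])

    plt.scatter(temp[~outside], severity[~outside], color="grey")
    plt.scatter(temp[outside], severity[outside], color="red")
    plt.plot(temp, frame["mean"], color="black")
    plt.fill_between(temp, frame["mean_ci_lower"], frame["mean_ci_upper"], alpha=0.2)
    plt.show()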

OLS using statsmodels.formula.api versus statsmodels.api

南笙酒味 submitted on 2019-12-31 10:35:26
Question: Can anyone explain the difference between ols in statsmodels.formula.api and ols in statsmodels.api? Using the Advertising data from the ISLR text, I ran an OLS with each and got different results. I then compared with scikit-learn's LinearRegression.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    import statsmodels.api as sm
    from sklearn.linear_model import LinearRegression

    df = pd.read_csv("C:\...\Advertising.csv")
    x1 = df.loc[:, ['TV']]
    y1 = df.loc[:, [
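
The usual source of such a discrepancy (stated as general statsmodels behaviour; the original answer is cut off) is the intercept: smf.ols parses an R-style formula and adds a constant automatically, whereas sm.OLS fits exactly the design matrix it is given, so the constant must be added by hand:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    df = pd.DataFrame({"TV": np.arange(10.0),
                       "Sales": 2.0 * np.arange(10.0) + 5.0})

    fit_formula = smf.ols("Sales ~ TV", data=df).fit()                  # intercept added for you
    fit_matrix = sm.OLS(df["Sales"], sm.add_constant(df["TV"])).fit()   # add it yourself

    print(fit_formula.params)
    print(fit_matrix.params)  # with the constant added, the two agree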

Using a smoother with the L Method to determine the number of K-Means clusters

霸气de小男生 submitted on 2019-12-31 09:02:21
Question: Has anyone tried applying a smoother to the evaluation metric before applying the L-method to determine the number of k-means clusters in a dataset? If so, did it improve the results, or allow a lower number of k-means trials and hence a much greater increase in speed? Which smoothing algorithm/method did you use? The "L-method" is detailed in "Determining the Number of Clusters/Segments in Hierarchical Clustering/Segmentation Algorithms", Salvador & Chan. This calculates the evaluation metric
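
For context, a compact sketch of the L-method itself (not of any particular smoother, and only an approximation of Salvador & Chan's exact weighting): for each candidate knee, fit one straight line to the evaluation-metric curve on its left and one on its right, and choose the knee that minimises the length-weighted RMSE of the two fits. A smoother would simply be applied to the metric beforehand.

    import numpy as np

    def l_method_knee(metric):
        """Knee index of an evaluation-metric curve, in the spirit of the L-method."""
        metric = np.asarray(metric, dtype=float)
        n = metric.size
        x = np.arange(n)
        best_c, best_err = None, np.inf
        for c in range(2, n - 2):              # each side needs >= 2 points for a line
            err = 0.0
            for xs, ys in ((x[:c], metric[:c]), (x[c:], metric[c:])):
                slope, intercept = np.polyfit(xs, ys, 1)
                rmse = np.sqrt(np.mean((ys - (slope * xs + intercept)) ** 2))
                err += xs.size / n * rmse      # weight by segment length
            if err < best_err:
                best_c, best_err = c, err
        return best_c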

How to move the train model to production?

≡放荡痞女 submitted on 2019-12-31 04:03:10
Question: I have finalized a model, and it is performing within acceptable limits. I am using Python, and scikit-learn specifically. The next step is to move the model to production. How can I save a trained model in such a way that I can move it to production? Thanks in advance for any help.

Answer 1: As the commenter suggested, you should use pickle. Specifically for ML, what you're looking for is model persistence. And with scikit-learn: After training a scikit
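
A minimal sketch of that persistence step, using joblib (which the scikit-learn documentation recommends over plain pickle for models containing large numpy arrays; both work the same way):

    import numpy as np
    import joblib
    from sklearn.linear_model import LinearRegression

    X = np.arange(10.0).reshape(-1, 1)
    y = 2.0 * X.ravel() + 1.0

    model = LinearRegression().fit(X, y)
    joblib.dump(model, "model.joblib")        # save once, at training time

    loaded = joblib.load("model.joblib")      # load inside the production service
    print(loaded.predict([[42.0]]))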