linear-regression

How to compute AIC for linear regression model in Python?

醉酒当歌 submitted on 2020-01-02 02:42:08
Question: I want to compute AIC for linear models to compare their complexity. I did it as follows:

    regr = linear_model.LinearRegression()
    regr.fit(X, y)
    aic_intercept_slope = aic(y, regr.coef_[0] * X.as_matrix() + regr.intercept_, k=1)

    def aic(y, y_pred, k):
        resid = y - y_pred.ravel()
        sse = sum(resid ** 2)
        AIC = 2*k - 2*np.log(sse)
        return AIC

But I receive a "divide by zero encountered in log" error.

Answer 1: sklearn's LinearRegression is good for prediction but pretty barebones, as you've discovered. (It's
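
For reference, the formula above is not the usual AIC for least squares: under Gaussian errors, AIC is commonly written as n·ln(SSE/n) + 2k (up to an additive constant), and the log is only undefined when SSE is exactly zero, i.e. a perfect fit. A minimal sketch of that version, assuming scikit-learn and synthetic data:

    import numpy as np
    from sklearn import linear_model

    def aic_ols(y, y_pred, k):
        """AIC for a least-squares fit under Gaussian errors, up to a constant."""
        resid = np.asarray(y).ravel() - np.asarray(y_pred).ravel()
        sse = np.sum(resid ** 2)
        n = resid.size
        return n * np.log(sse / n) + 2 * k

    rng = np.random.default_rng(0)
    X = rng.random((100, 1))
    y = 3.0 * X.ravel() + rng.normal(scale=0.1, size=100)

    regr = linear_model.LinearRegression().fit(X, y)
    print(aic_ols(y, regr.predict(X), k=2))  # k=2: slope + intercept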

Difference between categorical variables (factors) and dummy variables

徘徊边缘 submitted on 2020-01-01 12:00:15
Question: I was running a regression using categorical variables and came across this question. Here, the user wanted to add a column for each dummy. This left me quite confused, because I thought that long data with a single column storing all the dummies via as.factor() was equivalent to having explicit dummy variables. Could someone explain the difference between the following two linear regression models?

Linear Model 1, where Month is a factor:

    dt_long
           Sales Period Month
    1: 0.4898943      1    M1
    2: 0
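
In Python terms (the question itself is in R; this is only an illustration of the idea), a categorical column and its one-hot dummy expansion carry the same information, which is why the two encodings fit equivalent regressions:

    import pandas as pd

    df = pd.DataFrame({
        "Sales": [0.49, 1.12, 2.20, 3.17],
        "Month": ["M1", "M2", "M1", "M2"],   # the "factor" column
    })

    # One 0/1 column per level (dropping one as the baseline), i.e. explicit dummies.
    dummies = pd.get_dummies(df["Month"], prefix="Month", drop_first=True)
    df_dummy = pd.concat([df.drop(columns="Month"), dummies], axis=1)
    print(df_dummy)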

Parallelising gradient calculation in Julia

梦想的初衷 submitted on 2020-01-01 09:18:26
Question: I was persuaded some time ago to drop my comfortable MATLAB programming and start programming in Julia. I have been working for a long time with neural networks, and I thought that with Julia I could get things done faster by parallelising the calculation of the gradient. The gradient need not be calculated on the entire dataset in one go; instead, one can split the calculation. For instance, by splitting the dataset into parts, we can calculate a partial gradient on each part. The total
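
The idea, partial gradients over data shards summing to the full gradient, is language-independent. A minimal sketch in Python rather than Julia (assuming a simple least-squares loss), just to make the decomposition concrete:

    import numpy as np
    from multiprocessing import Pool

    def partial_gradient(args):
        """Gradient of 0.5 * ||Xw - y||^2 on one shard of the data."""
        X, y, w = args
        return X.T @ (X @ w - y)

    def full_gradient(X, y, w, n_parts=4):
        # The total gradient is the sum of the per-shard partial gradients.
        # (Wrap calls in `if __name__ == "__main__":` when run as a script.)
        shards = zip(np.array_split(X, n_parts), np.array_split(y, n_parts))
        with Pool(n_parts) as pool:
            parts = pool.map(partial_gradient, [(Xs, ys, w) for Xs, ys in shards])
        return sum(parts)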

Create lm object from data/coefficients

拜拜、爱过 submitted on 2020-01-01 04:54:07
Question: Does anyone know of a function that can create an lm object given a dataset and coefficients? I'm interested in this because I started playing with Bayesian model averaging (BMA), and I'd like to be able to create an lm object out of the results of bicreg. That would give me access to all of the nice generic lm functions, like diagnostic plotting, predict, cv.lm, etc. If you are pretty sure such a function doesn't exist, that's also very helpful to know!

    library(BMA)
    mtcars_y <- mtcars[, 1]  # mpg
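
As an aside, the analogous workaround in Python (a hand-rolled sketch, not a library API) is to wrap externally supplied coefficients in a thin object that at least exposes prediction and residuals:

    import numpy as np

    class FixedCoefModel:
        """Linear model whose coefficients come from elsewhere (e.g. model averaging)."""
        def __init__(self, coef, intercept=0.0):
            self.coef = np.asarray(coef, dtype=float)
            self.intercept = float(intercept)

        def predict(self, X):
            return np.asarray(X) @ self.coef + self.intercept

        def residuals(self, X, y):
            return np.asarray(y) - self.predict(X)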

How to force a zero intercept in linear regression?

…衆ロ難τιáo~ submitted on 2020-01-01 04:15:07
Question: I'm a bit of a newbie, so apologies if this question has already been answered; I've had a look and couldn't find specifically what I was looking for. I have some more or less linear data of the form:

    x = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
         20.0, 40.0, 60.0, 80.0]
    y = [0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
         3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
         11.073788414382639, 23.248479770546009, 32.120462301367183,
         44.036117671229206, 54
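
One standard way to force the fit through the origin is to drop the constant column from the least-squares design matrix, so the model is y = m·x only. A minimal sketch on a subset of the data above:

    import numpy as np

    x = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0])
    y = np.array([0.505, 1.121, 2.198, 3.175, 4.291, 6.282, 11.074, 23.248])

    # Design matrix is just the x column: no column of ones, hence no intercept.
    m, residuals, rank, sv = np.linalg.lstsq(x[:, None], y, rcond=None)
    print("slope through origin:", m[0])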

Converting Numpy Lstsq residual value to R^2

[亡魂溺海] submitted on 2019-12-31 22:03:55
Question: I am performing a least squares regression as below (univariate). I would like to express the significance of the result in terms of R^2. Numpy returns the unscaled residual; what would be a sensible way of normalising it?

    field_clean, back_clean = rid_zeros(backscatter, field_data)
    num_vals = len(field_clean)
    x = field_clean[:, row:row+1]
    y = 10 * log10(back_clean)
    A = hstack([x, ones((num_vals, 1))])
    soln = lstsq(A, y)
    m, c = soln[0]
    residues = soln[1]
    print residues

Answer 1: See http:/
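
The unscaled residual that lstsq returns is the sum of squared residuals (SSE), so one common normalisation (a standard identity, independent of whatever the truncated link points to) is R² = 1 − SSE/SST, where SST is the total sum of squares about the mean:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(1.0, 10.0, 20)
    y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

    A = np.column_stack([x, np.ones_like(x)])
    coeffs, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)

    sse = residuals[0]                    # lstsq's "residues": sum of squared errors
    sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
    print("R^2:", 1.0 - sse / sst)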

Conditionally colour data points outside of confidence bands in R

风流意气都作罢 submitted on 2019-12-31 10:49:26
Question: I need to colour the data points that fall outside of the confidence bands on the plot below differently from those within the bands. Should I add a separate column to my dataset recording whether each point is within the confidence bands? Could you provide an example, please? Example dataset:

    ## Dataset from http://www.apsnet.org/education/advancedplantpath/topics/RModules/doc1/04_Linear_regression.html
    ## Disease severity as a function of temperature
    # Response variable, disease severity
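
A sketch of the general approach in Python (the question itself is in R; statsmodels and matplotlib stand in here): compute the band, flag the points outside it, and map the flag to a colour.

    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    temp = np.linspace(0.0, 30.0, 40)
    severity = 0.5 * temp + rng.normal(scale=2.0, size=temp.size)

    X = sm.add_constant(temp)
    frame = sm.OLS(severity, X).fit().get_prediction(X).summary_frame(alpha=0.05)

    # Flag points outside the 95% confidence band of the fitted mean.
    outside = (severity < frame["mean_ci_lower"]) | (severity > frame["mean_ci_upper"])

    plt.scatter(temp[~outside], severity[~outside], color="grey")
    plt.scatter(temp[outside], severity[outside], color="red")
    plt.plot(temp, frame["mean"], color="black")
    plt.fill_between(temp, frame["mean_ci_lower"], frame["mean_ci_upper"], alpha=0.2)
    plt.show()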

OLS using statsmodels.formula.api versus statsmodels.api

南笙酒味 submitted on 2019-12-31 10:35:26
Question: Can anyone explain the difference between ols in statsmodels.formula.api and ols in statsmodels.api? Using the Advertising data from the ISLR text, I ran an OLS with each and got different results. I then compared with scikit-learn's LinearRegression.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    import statsmodels.api as sm
    from sklearn.linear_model import LinearRegression

    df = pd.read_csv("C:\...\Advertising.csv")
    x1 = df.loc[:, ['TV']]
    y1 = df.loc[:, [
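
The usual source of such a discrepancy (stated as general statsmodels behaviour; the original answer is cut off) is the intercept: smf.ols parses an R-style formula and adds a constant automatically, whereas sm.OLS fits exactly the design matrix it is given, so the constant must be added by hand:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    df = pd.DataFrame({"TV": np.arange(10.0),
                       "Sales": 2.0 * np.arange(10.0) + 5.0})

    fit_formula = smf.ols("Sales ~ TV", data=df).fit()                  # intercept added for you
    fit_matrix = sm.OLS(df["Sales"], sm.add_constant(df["TV"])).fit()   # add it yourself

    print(fit_formula.params)
    print(fit_matrix.params)  # with the constant added, the two agree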

Using a smoother with the L Method to determine the number of K-Means clusters

霸气de小男生 submitted on 2019-12-31 09:02:21
Question: Has anyone tried applying a smoother to the evaluation metric before applying the L-method to determine the number of k-means clusters in a dataset? If so, did it improve the results, or allow a lower number of k-means trials and hence a much greater increase in speed? Which smoothing algorithm/method did you use? The "L-method" is detailed in "Determining the Number of Clusters/Segments in Hierarchical Clustering/Segmentation Algorithms", Salvador & Chan. This calculates the evaluation metric
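
For context, a compact sketch of the L-method itself (not of any particular smoother, and only an approximation of Salvador & Chan's exact weighting): for each candidate knee, fit one straight line to the evaluation-metric curve on its left and one on its right, and choose the knee that minimises the length-weighted RMSE of the two fits. A smoother would simply be applied to the metric beforehand.

    import numpy as np

    def l_method_knee(metric):
        """Knee index of an evaluation-metric curve, in the spirit of the L-method."""
        metric = np.asarray(metric, dtype=float)
        n = metric.size
        x = np.arange(n)
        best_c, best_err = None, np.inf
        for c in range(2, n - 2):              # each side needs >= 2 points for a line
            err = 0.0
            for xs, ys in ((x[:c], metric[:c]), (x[c:], metric[c:])):
                slope, intercept = np.polyfit(xs, ys, 1)
                rmse = np.sqrt(np.mean((ys - (slope * xs + intercept)) ** 2))
                err += xs.size / n * rmse      # weight by segment length
            if err < best_err:
                best_c, best_err = c, err
        return best_c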

How to move the train model to production?

≡放荡痞女 submitted on 2019-12-31 04:03:10
Question: I have finalized a model, and it is performing within acceptable limits. I am using Python, and scikit-learn specifically. The next step is to move the model to production. How can I save a trained model in such a way that I can move it to production? Thanks in advance for any help.

Answer 1: As the commenter suggested, you should use pickle. Specifically for ML, what you're looking for is model persistence. And with scikit-learn: After training a scikit
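
A minimal sketch of that persistence step, using joblib (which the scikit-learn documentation recommends over plain pickle for models containing large numpy arrays; both work the same way):

    import numpy as np
    import joblib
    from sklearn.linear_model import LinearRegression

    X = np.arange(10.0).reshape(-1, 1)
    y = 2.0 * X.ravel() + 1.0

    model = LinearRegression().fit(X, y)
    joblib.dump(model, "model.joblib")        # save once, at training time

    loaded = joblib.load("model.joblib")      # load inside the production service
    print(loaded.predict([[42.0]]))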