Regression

Multivariate polynomial regression with numpy

Submitted by 心已入冬 on 2019-11-26 22:21:58
Question: I have many samples (y_i, (a_i, b_i, c_i)) where y is presumed to vary as a polynomial in a, b, c up to a certain degree. For example, for a given data set and degree 2, I might produce the model

y = a^2 + 2ab - 3cb + c^2 + 0.5ac

This can be done using least squares and is a slight extension of numpy's polyfit routine. Is there a standard implementation somewhere in the Python ecosystem?

Answer 1: sklearn provides a simple way to do this. Building off an example posted here:

#X is the independent …
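A minimal sketch of the sklearn approach the answer points to, with synthetic data and variable names of my own choosing (PolynomialFeatures expands (a, b, c) into all monomials up to the requested degree, and LinearRegression then does the least-squares fit; get_feature_names_out requires a reasonably recent scikit-learn):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# X holds the (a_i, b_i, c_i) triples; y is the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
a, b, c = X.T
y = a**2 + 2*a*b - 3*c*b + c**2 + 0.5*a*c

# Expand into all monomials of degree <= 2, then fit by least squares.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
model = LinearRegression().fit(X_poly, y)

print(dict(zip(poly.get_feature_names_out(["a", "b", "c"]), model.coef_.round(3))))

The recovered coefficients should match the generating polynomial up to numerical noise.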

lm(): What is qraux returned by QR decomposition in LINPACK / LAPACK

Submitted by 两盒软妹~` on 2019-11-26 22:09:31
Question: rich.main3 is a linear model in R. I understand the rest of the elements of the list, but I don't get what qraux is. The documentation states that it is "a vector of length ncol(x) which contains additional information on Q". What additional information does it mean?

str(rich.main3$qr)
qr : num [1:164, 1:147] -12.8062 0.0781 0.0781 0.0781 0.0781 ...
 ..- attr(*, "dimnames")=List of 2
 .. ..$ : chr [1:164] "1" "2" "3" "4" ...
 .. ..$ : chr [1:147] "(Intercept)" "S2" "S3" "x1" ...
 ..- attr(*, …
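For intuition: LINPACK/LAPACK do not store Q as an explicit matrix but as a sequence of Householder reflectors packed below the diagonal of the qr matrix, and qraux holds the one extra scalar per reflector needed to reconstruct them. LAPACK's counterpart of qraux is the tau vector; a small Python sketch of my own, since scipy's mode='raw' exposes the same compact storage:

import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))

# mode='raw' returns the factorization in LAPACK's compact form:
# qr_raw stores R in its upper triangle and the Householder vectors
# below it, and tau holds one scaling factor per reflector -- the
# analogue of R's qraux.
(qr_raw, tau), r = qr(A, mode='raw')
print(tau)  # length 3, one entry per Householder reflection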

predict.lm() with an unknown factor level in test data

Submitted by 99封情书 on 2019-11-26 22:03:30
Question: I am fitting a model to factor data and predicting. If the newdata passed to predict.lm() contains a single factor level that is unknown to the model, predict.lm() fails entirely and returns an error. Is there a good way to have predict.lm() return predictions for the factor levels the model knows, and NA for unknown factor levels, instead of just an error? Example code:

foo <- data.frame(response=rnorm(3), predictor=as.factor(c("A","B","C")))
model <- lm(response~predictor, foo)
foo.new <- data.frame(predictor=as.factor(c("A","B","C","D")))
predict(model, newdata=foo.new)

I would like the very last …
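R-side fixes usually re-level the factor or mask the unknown rows before calling predict(). Purely as a cross-language illustration of the desired behavior (predict where the level is known, NA otherwise), a hypothetical Python/pandas sketch:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Training data with known levels A, B, C.
foo = pd.DataFrame({"response": np.random.randn(3),
                    "predictor": pd.Categorical(["A", "B", "C"])})
model = LinearRegression().fit(pd.get_dummies(foo["predictor"]), foo["response"])

# New data with the unseen level "D": recoding with the *training*
# categories maps "D" to the missing code -1 (an all-zero dummy row).
new = pd.Categorical(["A", "B", "C", "D"],
                     categories=foo["predictor"].cat.categories)
preds = model.predict(pd.get_dummies(new))
preds[new.codes == -1] = np.nan  # unknown levels get NA instead of an error
print(preds)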

Exponential regression in R

Submitted by 不羁岁月 on 2019-11-26 21:45:29
Question: I have some points that appear to follow an exponential decay curve. The curve I'm trying to fit looks like:

y = a * exp(-b*x) + c

My code:

x <- c(1.564379666, 1.924250092, 2.041559879, 2.198696382, 2.541267447, 2.666400433, 2.922534874, 2.965726615, 3.009969443, 3.248480245, 3.32927682, 3.371404563, 3.423759668, 3.713001284, 3.841419166, 3.847632349, 3.947993339, 4.024541136, 4.030779671, 4.118849343, 4.154008445, 4.284232251, 4.491359108, 4.585182188, 4.643299476, 4.643299476, 4.643299476, 4.684369939, 4.84424144, 4.867973977, …
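In R this kind of model is usually fit with nls(); as an equivalent sketch in Python (synthetic data of my own, since the question's vectors are truncated), scipy.optimize.curve_fit handles it directly:

import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    return a * np.exp(-b * x) + c

# Synthetic data standing in for the question's x/y vectors.
rng = np.random.default_rng(2)
x = np.linspace(1.5, 5.0, 40)
y = 3.0 * np.exp(-0.8 * x) + 0.5 + rng.normal(scale=0.02, size=x.size)

# Sensible starting values matter a lot for exponential fits.
p0 = (y.max() - y.min(), 1.0, y.min())
params, cov = curve_fit(model, x, y, p0=p0)
print(params)  # estimates of a, b, c

Good starting values are often the difference between convergence and a singular-gradient failure, in nls() just as in curve_fit.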

Export fitted regression splines (constructed by 'bs' or 'ns') as piecewise polynomials

Submitted by 余生长醉 on 2019-11-26 21:27:56
Question: Take for instance the following one-knot, degree-two spline:

library(splines)
library(ISLR)
fit.spline <- lm(wage~bs(age, knots=c(42), degree=2), data=Wage)
summary(fit.spline)

I see estimates that I don't expect.

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                           57.349      3.950  14.518  < 2e-16 ***
bs(age, knots = c(42), degree = 2)1   59.511      5.786  10.285  < 2e-16 ***
bs(age, knots = c(42), degree = 2)2   65.722      4.076  16.122  < 2e-16 ***
bs(age, knots = c(42), degree = 2)3   37.170      9.722   3…
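The question concerns R's bs()/ns(); as a cross-language illustration, scipy can export a fitted B-spline directly in piecewise-polynomial form (synthetic stand-ins for age/wage, my own illustration):

import numpy as np
from scipy.interpolate import splrep, PPoly

# Fit a degree-2 B-spline with one interior knot at 42, then convert it.
rng = np.random.default_rng(3)
x = np.sort(rng.uniform(18, 80, 200))                  # stand-in for 'age'
y = 50 + 30 * np.sin(x / 15) + rng.normal(0, 5, 200)   # stand-in for 'wage'

tck = splrep(x, y, k=2, t=[42.0])  # least-squares spline with given knots
pp = PPoly.from_spline(tck)

print(pp.x)  # breakpoints (the knot sequence)
print(pp.c)  # polynomial coefficients, one column per piece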

Find p-value (significance) in scikit-learn LinearRegression

Submitted by 时光怂恿深爱的人放手 on 2019-11-26 21:21:00
Question: How can I find the p-value (significance) of each coefficient?

lm = sklearn.linear_model.LinearRegression()
lm.fit(x, y)

Answer 1: This is kind of overkill, but let's give it a go. First, let's use statsmodels to find out what the p-values should be:

import pandas as pd
import numpy as np
from sklearn import datasets, linear_model
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
from scipy import stats

diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target

X2 = sm.add_constant(X)
est = sm.OLS(y, X2)
est2 = est.fit()
print(est2.summary())

and we get: OLS …
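The continuation of that answer reproduces these p-values by hand from the fitted sklearn model; a sketch of that computation using the standard OLS formulas (this is textbook algebra, not a scikit-learn API):

import numpy as np
from scipy import stats
from sklearn import datasets
from sklearn.linear_model import LinearRegression

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target
lm = LinearRegression().fit(X, y)

# Rebuild the design matrix with an intercept column, then get the
# coefficient standard errors from the usual OLS variance formula.
n, p = X.shape
X1 = np.column_stack([np.ones(n), X])
beta = np.concatenate([[lm.intercept_], lm.coef_])
resid = y - X1 @ beta
sigma2 = resid @ resid / (n - p - 1)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))

t_stats = beta / se
p_values = 2 * (1 - stats.t.cdf(np.abs(t_stats), df=n - p - 1))
print(np.round(p_values, 4))  # should match the statsmodels summary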

R: lm() result differs when using `weights` argument and when using manually reweighted data

Submitted by 家住魔仙堡 on 2019-11-26 20:22:33
Question: In order to correct for heteroskedasticity in the error terms, I am running the following weighted least squares regression in R:

#Call:
#lm(formula = a ~ q + q2 + b + c, data = mydata, weights = weighting)
#
#Weighted Residuals:
#     Min       1Q   Median       3Q      Max
#-1.83779 -0.33226  0.02011  0.25135  1.48516
#
#Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
#(Intercept) -3.939440   0.609991  -6.458 1.62e-09 ***
#q            0.175019   0.070101   2.497 0.013696 *
#q2           0.048790   0.005613   8.693 8.49e-15 ***
#b            0.473891   0.134918   3…
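The usual explanation for a discrepancy: lm(..., weights=w) is equivalent to OLS on rows scaled by sqrt(w) as far as the coefficients go, but not for residual summaries or R-squared. The coefficient equivalence can be checked in Python with statsmodels (synthetic data, my own illustration):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
X = sm.add_constant(rng.normal(size=(n, 2)))
w = rng.uniform(0.5, 2.0, n)
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=1/np.sqrt(w))

# Weighted least squares via the weights argument.
wls = sm.WLS(y, X, weights=w).fit()

# The same fit done by manually rescaling every row by sqrt(w).
sw = np.sqrt(w)
ols = sm.OLS(y * sw, X * sw[:, None]).fit()

print(np.allclose(wls.params, ols.params))  # True: identical coefficients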

Quadratic and cubic regression in Excel

Submitted by 时光总嘲笑我的痴心妄想 on 2019-11-26 18:49:33
Question: I have the following information:

Height Weight
170    65
167    55
189    85
175    70
166    55
174    55
169    69
170    58
184    84
161    56
170    75
182    68
167    51
187    85
178    62
173    60
172    68
178    55
175    65
176    70

I want to run quadratic and cubic regressions in Excel. I know how to do linear regression in Excel, but what about quadratic and cubic? I have searched a lot of resources, but could not find anything helpful.

Answer 1: You need to use an undocumented trick with Excel's LINEST function: …
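The truncated answer presumably continues with the well-known array-formula trick: =LINEST(known_ys, known_xs^{1,2}) for a quadratic and =LINEST(known_ys, known_xs^{1,2,3}) for a cubic, entered as an array formula. As a cross-check outside Excel, the same fits in Python on the question's data:

import numpy as np

height = np.array([170, 167, 189, 175, 166, 174, 169, 170, 184, 161,
                   170, 182, 167, 187, 178, 173, 172, 178, 175, 176])
weight = np.array([65, 55, 85, 70, 55, 55, 69, 58, 84, 56,
                   75, 68, 51, 85, 62, 60, 68, 55, 65, 70])

# Least-squares polynomial fits; coefficients come highest degree first.
quad = np.polyfit(height, weight, 2)
cubic = np.polyfit(height, weight, 3)
print(quad)
print(cubic)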

Is there a fast estimation of simple regression (a regression line with only intercept and slope)?

Submitted by 我的梦境 on 2019-11-26 18:40:59
Question: This question relates to a machine-learning feature-selection procedure. I have a large matrix of features, where the columns are the features of the subjects (rows):

set.seed(1)
features.mat <- matrix(rnorm(10*100), ncol=100)
colnames(features.mat) <- paste("F", 1:100, sep="")
rownames(features.mat) <- paste("S", 1:10, sep="")

The response was measured for each subject (S) under different conditions (C) and therefore looks like this:

response.df <- data.frame(S = c(sapply(1:10, function(x) rep(paste( …
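Since a simple regression y = a + b*x has closed-form estimates, the per-feature fits can be fully vectorized instead of looping over lm() calls; a Python sketch of the idea (my own illustration, mirroring the 10 x 100 feature matrix above):

import numpy as np

rng = np.random.default_rng(5)
features = rng.normal(size=(10, 100))  # 10 subjects x 100 features
y = rng.normal(size=10)                # one response per subject

# Closed-form simple regression for every column at once:
#   b = cov(x, y) / var(x),   a = mean(y) - b * mean(x)
xc = features - features.mean(axis=0)
yc = y - y.mean()
slopes = xc.T @ yc / (xc ** 2).sum(axis=0)
intercepts = y.mean() - slopes * features.mean(axis=0)
print(slopes.shape, intercepts.shape)  # (100,) (100,)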

Ridge regression with `glmnet` gives different coefficients than what I compute by “textbook definition”?

Submitted by 好久不见. on 2019-11-26 18:33:06
Question: I am running ridge regression using the glmnet R package. I noticed that the coefficients I obtain from the glmnet::glmnet function are different from those I get by computing the coefficients by definition (using the same lambda value). Could somebody explain to me why? Data (both the response Y and the design matrix X) are scaled.

library(MASS)
library(glmnet)

# Data dimensions
p.tmp <- 100
n.tmp <- 100

# Data objects
set.seed(1)
X <- scale(mvrnorm(n.tmp, mu = rep(0, p.tmp), Sigma = diag(p…
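The usual resolution: glmnet minimizes RSS/(2n) plus the penalty and standardizes internally, so its lambda is not on the same scale as the lambda in the textbook formula beta = (X'X + lambda*I)^{-1} X'y. The textbook formula itself is easy to verify in Python, where sklearn's Ridge penalizes exactly alpha*||beta||^2 (my own illustration):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(6)
n, p = 100, 100
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # scaled design matrix
y = rng.normal(size=n)
y = (y - y.mean()) / y.std()

lam = 0.5

# Textbook ridge solution: beta = (X'X + lam*I)^{-1} X'y
beta_textbook = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# sklearn's Ridge minimizes ||y - X b||^2 + lam * ||b||^2, so with no
# intercept and no rescaling it reproduces the textbook coefficients.
beta_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_

print(np.allclose(beta_textbook, beta_sklearn))  # True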