Regression

Multivariate polynomial regression with numpy

Submitted by 心已入冬 on 2019-11-26 22:21:58
Question: I have many samples (y_i, (a_i, b_i, c_i)) where y is presumed to vary as a polynomial in a, b, c up to a certain degree. For example, for a given data set and degree 2, I might produce the model

y = a^2 + 2ab - 3cb + c^2 + 0.5ac

This can be done using least squares and is a slight extension of numpy's polyfit routine. Is there a standard implementation somewhere in the Python ecosystem?

Answer 1: sklearn provides a simple way to do this. Building off an example posted here:

#X is the independent …
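A minimal sketch of the sklearn approach the answer points to, with synthetic data and variable names of my own choosing (PolynomialFeatures expands (a, b, c) into all monomials up to the requested degree, and LinearRegression then does the least-squares fit; get_feature_names_out requires a reasonably recent scikit-learn):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# X holds the (a_i, b_i, c_i) triples; y is the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
a, b, c = X.T
y = a**2 + 2*a*b - 3*c*b + c**2 + 0.5*a*c

# Expand into all monomials of degree <= 2, then fit by least squares.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
model = LinearRegression().fit(X_poly, y)

print(dict(zip(poly.get_feature_names_out(["a", "b", "c"]), model.coef_.round(3))))

The recovered coefficients should match the generating polynomial up to numerical noise.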

lm(): What is qraux returned by QR decomposition in LINPACK / LAPACK

Submitted by 两盒软妹~` on 2019-11-26 22:09:31
Question: rich.main3 is a linear model in R. I understand the rest of the elements of the list, but I don't get what qraux is. The documentation states that it is "a vector of length ncol(x) which contains additional information on Q". What additional information does it mean?

str(rich.main3$qr)
qr : num [1:164, 1:147] -12.8062 0.0781 0.0781 0.0781 0.0781 ...
 ..- attr(*, "dimnames")=List of 2
 .. ..$ : chr [1:164] "1" "2" "3" "4" ...
 .. ..$ : chr [1:147] "(Intercept)" "S2" "S3" "x1" ...
 ..- attr(*, …
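For intuition: LINPACK/LAPACK do not store Q as an explicit matrix but as a sequence of Householder reflectors packed below the diagonal of the qr matrix, and qraux holds the one extra scalar per reflector needed to reconstruct them. LAPACK's counterpart of qraux is the tau vector; a small Python sketch of my own, since scipy's mode='raw' exposes the same compact storage:

import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))

# mode='raw' returns the factorization in LAPACK's compact form:
# qr_raw stores R in its upper triangle and the Householder vectors
# below it, and tau holds one scaling factor per reflector -- the
# analogue of R's qraux.
(qr_raw, tau), r = qr(A, mode='raw')
print(tau)  # length 3, one entry per Householder reflection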

predict.lm() with an unknown factor level in test data

Submitted by 99封情书 on 2019-11-26 22:03:30
Question: I am fitting a model to factor data and predicting. If the newdata passed to predict.lm() contains a single factor level that is unknown to the model, predict.lm() fails entirely and returns an error. Is there a good way to have predict.lm() return predictions for the factor levels the model knows, and NA for unknown factor levels, instead of just an error? Example code:

foo <- data.frame(response=rnorm(3), predictor=as.factor(c("A","B","C")))
model <- lm(response~predictor, foo)
foo.new <- data.frame(predictor=as.factor(c("A","B","C","D")))
predict(model, newdata=foo.new)

I would like the very last …
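R-side fixes usually re-level the factor or mask the unknown rows before calling predict(). Purely as a cross-language illustration of the desired behavior (predict where the level is known, NA otherwise), a hypothetical Python/pandas sketch:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Training data with known levels A, B, C.
foo = pd.DataFrame({"response": np.random.randn(3),
                    "predictor": pd.Categorical(["A", "B", "C"])})
model = LinearRegression().fit(pd.get_dummies(foo["predictor"]), foo["response"])

# New data with the unseen level "D": recoding with the *training*
# categories maps "D" to the missing code -1 (an all-zero dummy row).
new = pd.Categorical(["A", "B", "C", "D"],
                     categories=foo["predictor"].cat.categories)
preds = model.predict(pd.get_dummies(new))
preds[new.codes == -1] = np.nan  # unknown levels get NA instead of an error
print(preds)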

Exponential regression in R

Submitted by 不羁岁月 on 2019-11-26 21:45:29
Question: I have some points that appear to follow an exponential decay curve. The curve I'm trying to fit looks like:

y = a * exp(-b*x) + c

My code:

x <- c(1.564379666, 1.924250092, 2.041559879, 2.198696382, 2.541267447, 2.666400433, 2.922534874, 2.965726615, 3.009969443, 3.248480245, 3.32927682, 3.371404563, 3.423759668, 3.713001284, 3.841419166, 3.847632349, 3.947993339, 4.024541136, 4.030779671, 4.118849343, 4.154008445, 4.284232251, 4.491359108, 4.585182188, 4.643299476, 4.643299476, 4.643299476, 4.684369939, 4.84424144, 4.867973977, …
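In R this kind of model is usually fit with nls(); as an equivalent sketch in Python (synthetic data of my own, since the question's vectors are truncated), scipy.optimize.curve_fit handles it directly:

import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    return a * np.exp(-b * x) + c

# Synthetic data standing in for the question's x/y vectors.
rng = np.random.default_rng(2)
x = np.linspace(1.5, 5.0, 40)
y = 3.0 * np.exp(-0.8 * x) + 0.5 + rng.normal(scale=0.02, size=x.size)

# Sensible starting values matter a lot for exponential fits.
p0 = (y.max() - y.min(), 1.0, y.min())
params, cov = curve_fit(model, x, y, p0=p0)
print(params)  # estimates of a, b, c

Good starting values are often the difference between convergence and a singular-gradient failure, in nls() just as in curve_fit.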

Export fitted regression splines (constructed by 'bs' or 'ns') as piecewise polynomials

Submitted by 余生长醉 on 2019-11-26 21:27:56
Question: Take for instance the following one-knot, degree-two spline:

library(splines)
library(ISLR)
fit.spline <- lm(wage~bs(age, knots=c(42), degree=2), data=Wage)
summary(fit.spline)

I see estimates that I don't expect.

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                           57.349      3.950  14.518  < 2e-16 ***
bs(age, knots = c(42), degree = 2)1   59.511      5.786  10.285  < 2e-16 ***
bs(age, knots = c(42), degree = 2)2   65.722      4.076  16.122  < 2e-16 ***
bs(age, knots = c(42), degree = 2)3   37.170      9.722   3…
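The question concerns R's bs()/ns(); as a cross-language illustration, scipy can export a fitted B-spline directly in piecewise-polynomial form (synthetic stand-ins for age/wage, my own illustration):

import numpy as np
from scipy.interpolate import splrep, PPoly

# Fit a degree-2 B-spline with one interior knot at 42, then convert it.
rng = np.random.default_rng(3)
x = np.sort(rng.uniform(18, 80, 200))                  # stand-in for 'age'
y = 50 + 30 * np.sin(x / 15) + rng.normal(0, 5, 200)   # stand-in for 'wage'

tck = splrep(x, y, k=2, t=[42.0])  # least-squares spline with given knots
pp = PPoly.from_spline(tck)

print(pp.x)  # breakpoints (the knot sequence)
print(pp.c)  # polynomial coefficients, one column per piece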

Find p-value (significance) in scikit-learn LinearRegression

Submitted by 时光怂恿深爱的人放手 on 2019-11-26 21:21:00
Question: How can I find the p-value (significance) of each coefficient?

lm = sklearn.linear_model.LinearRegression()
lm.fit(x, y)

Answer 1: This is kind of overkill, but let's give it a go. First, let's use statsmodels to find out what the p-values should be:

import pandas as pd
import numpy as np
from sklearn import datasets, linear_model
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
from scipy import stats

diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target

X2 = sm.add_constant(X)
est = sm.OLS(y, X2)
est2 = est.fit()
print(est2.summary())

and we get: OLS …
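The continuation of that answer reproduces these p-values by hand from the fitted sklearn model; a sketch of that computation using the standard OLS formulas (this is textbook algebra, not a scikit-learn API):

import numpy as np
from scipy import stats
from sklearn import datasets
from sklearn.linear_model import LinearRegression

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target
lm = LinearRegression().fit(X, y)

# Rebuild the design matrix with an intercept column, then get the
# coefficient standard errors from the usual OLS variance formula.
n, p = X.shape
X1 = np.column_stack([np.ones(n), X])
beta = np.concatenate([[lm.intercept_], lm.coef_])
resid = y - X1 @ beta
sigma2 = resid @ resid / (n - p - 1)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))

t_stats = beta / se
p_values = 2 * (1 - stats.t.cdf(np.abs(t_stats), df=n - p - 1))
print(np.round(p_values, 4))  # should match the statsmodels summary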

R: lm() result differs when using `weights` argument and when using manually reweighted data

Submitted by 家住魔仙堡 on 2019-11-26 20:22:33
Question: In order to correct for heteroskedasticity in the error terms, I am running the following weighted least squares regression in R:

#Call:
#lm(formula = a ~ q + q2 + b + c, data = mydata, weights = weighting)
#
#Weighted Residuals:
#     Min       1Q   Median       3Q      Max
#-1.83779 -0.33226  0.02011  0.25135  1.48516
#
#Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
#(Intercept) -3.939440   0.609991  -6.458 1.62e-09 ***
#q            0.175019   0.070101   2.497 0.013696 *
#q2           0.048790   0.005613   8.693 8.49e-15 ***
#b            0.473891   0.134918   3…
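The usual explanation for a discrepancy: lm(..., weights=w) is equivalent to OLS on rows scaled by sqrt(w) as far as the coefficients go, but not for residual summaries or R-squared. The coefficient equivalence can be checked in Python with statsmodels (synthetic data, my own illustration):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
X = sm.add_constant(rng.normal(size=(n, 2)))
w = rng.uniform(0.5, 2.0, n)
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=1/np.sqrt(w))

# Weighted least squares via the weights argument.
wls = sm.WLS(y, X, weights=w).fit()

# The same fit done by manually rescaling every row by sqrt(w).
sw = np.sqrt(w)
ols = sm.OLS(y * sw, X * sw[:, None]).fit()

print(np.allclose(wls.params, ols.params))  # True: identical coefficients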

Quadratic and cubic regression in Excel

Submitted by 时光总嘲笑我的痴心妄想 on 2019-11-26 18:49:33
Question: I have the following information:

Height Weight
170    65
167    55
189    85
175    70
166    55
174    55
169    69
170    58
184    84
161    56
170    75
182    68
167    51
187    85
178    62
173    60
172    68
178    55
175    65
176    70

I want to run quadratic and cubic regressions in Excel. I know how to do linear regression in Excel, but what about quadratic and cubic? I have searched a lot of resources, but could not find anything helpful.

Answer 1: You need to use an undocumented trick with Excel's LINEST function: …
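The truncated answer presumably continues with the well-known array-formula trick: =LINEST(known_ys, known_xs^{1,2}) for a quadratic and =LINEST(known_ys, known_xs^{1,2,3}) for a cubic, entered as an array formula. As a cross-check outside Excel, the same fits in Python on the question's data:

import numpy as np

height = np.array([170, 167, 189, 175, 166, 174, 169, 170, 184, 161,
                   170, 182, 167, 187, 178, 173, 172, 178, 175, 176])
weight = np.array([65, 55, 85, 70, 55, 55, 69, 58, 84, 56,
                   75, 68, 51, 85, 62, 60, 68, 55, 65, 70])

# Least-squares polynomial fits; coefficients come highest degree first.
quad = np.polyfit(height, weight, 2)
cubic = np.polyfit(height, weight, 3)
print(quad)
print(cubic)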

Is there a fast estimation of simple regression (a regression line with only intercept and slope)?

Submitted by 我的梦境 on 2019-11-26 18:40:59
Question: This question relates to a machine-learning feature-selection procedure. I have a large matrix of features, where the columns are the features of the subjects (rows):

set.seed(1)
features.mat <- matrix(rnorm(10*100), ncol=100)
colnames(features.mat) <- paste("F", 1:100, sep="")
rownames(features.mat) <- paste("S", 1:10, sep="")

The response was measured for each subject (S) under different conditions (C) and therefore looks like this:

response.df <- data.frame(S = c(sapply(1:10, function(x) rep(paste( …
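Since a simple regression y = a + b*x has closed-form estimates, the per-feature fits can be fully vectorized instead of looping over lm() calls; a Python sketch of the idea (my own illustration, mirroring the 10 x 100 feature matrix above):

import numpy as np

rng = np.random.default_rng(5)
features = rng.normal(size=(10, 100))  # 10 subjects x 100 features
y = rng.normal(size=10)                # one response per subject

# Closed-form simple regression for every column at once:
#   b = cov(x, y) / var(x),   a = mean(y) - b * mean(x)
xc = features - features.mean(axis=0)
yc = y - y.mean()
slopes = xc.T @ yc / (xc ** 2).sum(axis=0)
intercepts = y.mean() - slopes * features.mean(axis=0)
print(slopes.shape, intercepts.shape)  # (100,) (100,)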

Ridge regression with `glmnet` gives different coefficients than what I compute by “textbook definition”?

Submitted by 好久不见. on 2019-11-26 18:33:06
Question: I am running ridge regression using the glmnet R package. I noticed that the coefficients I obtain from the glmnet::glmnet function are different from those I get by computing the coefficients by definition (using the same lambda value). Could somebody explain to me why? Data (both the response Y and the design matrix X) are scaled.

library(MASS)
library(glmnet)

# Data dimensions
p.tmp <- 100
n.tmp <- 100

# Data objects
set.seed(1)
X <- scale(mvrnorm(n.tmp, mu = rep(0, p.tmp), Sigma = diag(p…
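The usual resolution: glmnet minimizes RSS/(2n) plus the penalty and standardizes internally, so its lambda is not on the same scale as the lambda in the textbook formula beta = (X'X + lambda*I)^{-1} X'y. The textbook formula itself is easy to verify in Python, where sklearn's Ridge penalizes exactly alpha*||beta||^2 (my own illustration):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(6)
n, p = 100, 100
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # scaled design matrix
y = rng.normal(size=n)
y = (y - y.mean()) / y.std()

lam = 0.5

# Textbook ridge solution: beta = (X'X + lam*I)^{-1} X'y
beta_textbook = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# sklearn's Ridge minimizes ||y - X b||^2 + lam * ||b||^2, so with no
# intercept and no rescaling it reproduces the textbook coefficients.
beta_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_

print(np.allclose(beta_textbook, beta_sklearn))  # True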