linear-regression

Model matrix with all pairwise interactions between columns

Let's say that I have a numeric data matrix with columns w, x, y, z, and I also want to add the columns equivalent to w*x, w*y, w*z, x*y, x*z, y*z, since I want my covariate matrix to include all pairwise interactions. Is there a clean and effective way to do this?

If you mean in a model formula, then the ^ operator does this.

```r
## dummy data
set.seed(1)
dat <- data.frame(Y = rnorm(10), x = rnorm(10), y = rnorm(10), z = rnorm(10))
```

The formula is

```r
form <- Y ~ (x + y + z)^2
```

which gives all main effects plus the pairwise interactions, using model.matrix(), which is used internally by the standard model-fitting functions:

```r
model.matrix(form, data = dat)
```
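For comparison, here is a minimal Python sketch of the same construction (pandas and scikit-learn assumed; the data frame and seed are invented for illustration), using PolynomialFeatures with interaction_only=True to generate exactly the pairwise products:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data frame with columns w, x, y, z
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(10, 4)), columns=list("wxyz"))

# degree=2 with interaction_only=True adds w*x, w*y, w*z, x*y, x*z, y*z
# (include_bias=False drops the intercept column)
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X = poly.fit_transform(df)
print(poly.get_feature_names_out(df.columns))
```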

6th degree curve fitting with numpy/scipy

I have a very specific requirement for interpolating nonlinear data using a 6th-degree polynomial. I've seen numpy/scipy routines (scipy.interpolate.InterpolatedUnivariateSpline) that allow interpolation only up to degree 5. Even if there's no direct function to do this, is there a way to replicate Excel's LINEST linear-regression algorithm in Python? LINEST allows 6th-degree curve fitting, but I do NOT want to use Excel for anything, as this calculation is part of a much larger Python script.
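The excerpt cuts off before the answer; one standard approach, sketched here under the assumption that a least-squares fit (rather than an exact spline interpolant) is acceptable, is numpy.polyfit, which accepts any degree, including 6, much like Excel's LINEST:

```python
import numpy as np

# Hypothetical nonlinear data
x = np.linspace(0.0, 10.0, 50)
y = np.sin(x) + 0.1 * np.random.default_rng(0).normal(size=x.size)

# Least-squares fit of a 6th-degree polynomial
coeffs = np.polyfit(x, y, deg=6)   # highest-degree coefficient first
poly = np.poly1d(coeffs)           # callable polynomial for evaluation
y_fit = poly(x)                    # fitted values at the original x
```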

3D Linear Regression

I want to write a program that, given a list of points in 3D space, represented as an array of x, y, z coordinates in floating point, outputs a best-fit line in this space. The line can/should be in the form of a unit vector and a point on the line. The problem is that I don't know how this is to be done. The closest thing I found was this link, though quite honestly I did not understand how he went from equation to equation, and by the time we got to matrices I was pretty lost. Is there a standard way to do this?
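The usual technique (a sketch of the standard approach, not taken from the truncated answer) is to centre the points and take the dominant right singular vector of the centred coordinates as the line direction:

```python
import numpy as np

# Hypothetical cloud of points: an (N, 3) array of x, y, z coordinates
points = np.random.default_rng(0).normal(size=(100, 3))

centroid = points.mean(axis=0)            # a point on the best-fit line
_, _, vh = np.linalg.svd(points - centroid)
direction = vh[0]                         # unit vector along the line

# The best-fit line is centroid + t * direction, for scalar t
```

The centroid lies on the best-fit line and the first right singular vector already has unit length, so the pair (centroid, direction) is exactly the requested representation.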

How to calculate variance of least squares estimator using QR decomposition in R?

I'm trying to learn QR decomposition, but can't figure out how to get the variance of beta_hat without resorting to traditional matrix calculations. I'm practising with the iris data set, and here's what I have so far:

```r
y <- iris$Sepal.Length
x <- iris$Sepal.Width
X <- cbind(1, x)
n <- nrow(X)
p <- ncol(X)
qr.X <- qr(X)
b <- (t(qr.Q(qr.X)) %*% y)[1:p]
R <- qr.R(qr.X)
beta <- as.vector(backsolve(R, b))
res <- as.vector(y - X %*% beta)
```

Thanks for your help!

Setup (copying in your code):

```r
y <- iris$Sepal.Length
x <- iris$Sepal.Width
X <- cbind(1, x)
n <- nrow(X)
p <- ncol(X)
qr.X <- qr(X)
b <- (t(qr.Q(qr.X)) %*% y)[1:p]
```
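The excerpt stops before the variance step. Completing the idea as a hedged sketch: with X = QR, one has X'X = R'R, so Var(beta_hat) = sigma^2 (R'R)^(-1) can be built from R alone, without ever forming (X'X)^(-1) directly. A numpy version on made-up single-predictor data (standing in for the iris columns):

```python
import numpy as np

# Synthetic stand-in for the iris regression
rng = np.random.default_rng(0)
x = rng.normal(size=150)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=150)
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

Q, R = np.linalg.qr(X)                # thin QR: R is p x p upper triangular
beta = np.linalg.solve(R, Q.T @ y)    # back-substitution, like backsolve() in R
res = y - X @ beta
sigma2 = res @ res / (n - p)          # residual variance estimate

Rinv = np.linalg.solve(R, np.eye(p))  # R^{-1}
cov_beta = sigma2 * Rinv @ Rinv.T     # sigma^2 (R'R)^{-1} = sigma^2 R^{-1} R^{-T}
se_beta = np.sqrt(np.diag(cov_beta))  # standard errors of beta_hat
```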

How to do gaussian/polynomial regression with scikit-learn?

Does scikit-learn provide a facility to perform regression using a gaussian or polynomial kernel? I looked at the APIs and I don't see any. Has anyone built a package on top of scikit-learn that does this?

Either you use Support Vector Regression (sklearn.svm.SVR) and set the appropriate kernel (see here), or you install the latest master version of sklearn and use the recently added sklearn.preprocessing.PolynomialFeatures (see here) and then OLS or Ridge on top of that.

Salvador Dali adds some theory: polynomial regression is a special case of linear regression, with the main idea being how you select your features.
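A short sketch of both suggestions (a current scikit-learn release assumed; the data are invented for illustration):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

X = np.linspace(0, 6, 100).reshape(-1, 1)
y = np.sin(X).ravel()

# Option 1: kernelized regression with a gaussian (RBF) kernel
svr = SVR(kernel="rbf", C=10.0, gamma="scale").fit(X, y)

# Option 2: explicit polynomial features followed by a linear model
poly_ridge = make_pipeline(PolynomialFeatures(degree=3), Ridge(alpha=1.0)).fit(X, y)
```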

Pandas rolling regression: alternatives to looping

I got good use out of pandas' MovingOLS class (source here) within the deprecated stats/ols module. Unfortunately, it was gutted completely with pandas 0.20. The question of how to run rolling OLS regression in an efficient manner has been asked several times (here, for instance), but phrased a little broadly and left without a great answer, in my view. Here are my questions: how can I best mimic the basic framework of pandas' MovingOLS? The most attractive feature of this class was the ability to view multiple methods/attributes as separate time series, i.e. coefficients, r-squared, and t-statistics.
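One modern replacement for MovingOLS, offered here as a sketch rather than the thread's accepted answer, is statsmodels' RollingOLS (statsmodels >= 0.11 assumed; the series below are synthetic), which exposes the per-window coefficients as a time-indexed DataFrame:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS

idx = pd.date_range("2020-01-01", periods=200, freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=200)}, index=idx)
df["y"] = 2.0 * df["x"] + rng.normal(scale=0.5, size=200)

X = sm.add_constant(df[["x"]])
res = RollingOLS(df["y"], X, window=60).fit()
coefs = res.params   # DataFrame of rolling coefficients, one row per window end
```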

How can I plot my R Squared value on my scatterplot using R?

This seems a simple question, so I hope it's a simple answer. I am plotting my points and fitting a linear model, which I can do OK. I then want to plot some summary statistics, for example the R-squared value, on the plot as well. I can only seem to get the R-squared value at the command line. Any advice? Do I need to be looking at ggplot or anything else? Thanks in advance.

```r
# Does the plot
plot(df$VAR1, df$VAR2)
# Adds the line
abline(lm(df$VAR2 ~ df$VAR1), col = "red")
# Shows stats on command line
summary(lm(df$VAR2 ~ df$VAR1))
```

You can abuse legend() because it has the handy logical placement:

```r
R> DF <-
```
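For comparison, the same idea in a Python sketch (matplotlib and scipy assumed; df$VAR1/df$VAR2 replaced by synthetic arrays): fit, draw the line, then write the R-squared onto the axes:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.5 * x + rng.normal(scale=0.5, size=50)

fit = stats.linregress(x, y)
plt.scatter(x, y)
plt.plot(x, fit.intercept + fit.slope * x, color="red")
# Annotate the plot with the R-squared value, in axes coordinates
plt.text(0.05, 0.95, f"$R^2$ = {fit.rvalue**2:.3f}",
         transform=plt.gca().transAxes, va="top")
plt.show()
```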

TensorFlow: “Attempting to use uninitialized value” in variable initialization

I am trying to implement multivariate linear regression in Python using TensorFlow, but have run into some logical and implementation issues. My code throws the following error:

```
Attempting to use uninitialized value Variable
Caused by op u'Variable/read'
```

Ideally the weights output should be [2, 3].

```python
def hypothesis_function(input_2d_matrix_trainingexamples,
                        output_matrix_of_trainingexamples,
                        initial_parameters_of_hypothesis_function,
                        learning_rate, num_steps):
    # calculate num attributes and num examples
    number_of_attributes = len(input_2d_matrix_trainingexamples[0])
    number_of_trainingexamples =
```
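The error itself usually means a Variable is read before its initializer has run. In the TF1-style API, the standard fix, shown as a minimal sketch rather than the poster's full program, is to run the global initializer inside the session before any other op touches the variables:

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

W = tf.Variable([[0.0, 0.0]], name="weights")  # parameters to learn
b = tf.Variable(0.0, name="bias")

init = tf.global_variables_initializer()  # must run before W or b is read

with tf.Session() as sess:
    sess.run(init)       # initializes all variables in the graph
    print(sess.run(W))   # safe now; reading earlier raises the error above
```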

How to Loop/Repeat a Linear Regression in R

I have figured out how to make a table in R with 4 variables, which I am using for multiple linear regressions. The dependent variable (Lung) for each regression is taken from one column of a csv table of 22,000 columns. One of the independent variables (Blood) is taken from a corresponding column of a similar table. Each column represents the levels of a particular gene, which is why there are so many of them. There are also two additional variables (Age and Gender of each patient). When I enter the linear regression equation, I use lm(Lung[,1] ~ Blood[,1] + Age + Gender), which works for a single column.
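A Python sketch of the same looping idea (hypothetical, downsized data; statsmodels assumed), fitting one regression per gene column and collecting the Blood coefficient from each:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_patients, n_genes = 100, 50   # stand-in for the 22,000 gene columns
lung = pd.DataFrame(rng.normal(size=(n_patients, n_genes)))
blood = pd.DataFrame(rng.normal(size=(n_patients, n_genes)))
covars = pd.DataFrame({"Age": rng.integers(20, 80, n_patients),
                       "Gender": rng.integers(0, 2, n_patients)})

results = []
for g in range(n_genes):
    # Design matrix: intercept, this gene's Blood level, Age, Gender
    X = sm.add_constant(pd.concat([blood[g].rename("Blood"), covars], axis=1))
    fit = sm.OLS(lung[g], X).fit()
    results.append(fit.params["Blood"])   # keep the Blood coefficient per gene

coef_per_gene = pd.Series(results)
```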

Can scipy.stats identify and mask obvious outliers?

With scipy.stats.linregress I am performing a simple linear regression on some sets of highly correlated x, y experimental data, and initially visually inspecting each x, y scatter plot for outliers. More generally (i.e. programmatically), is there a way to identify and mask outliers?

The statsmodels package has what you need. Look at this little code snippet and its output:

```python
# Imports #
import statsmodels.api as smapi
import statsmodels.graphics as smgraphics
# Make data #
x = list(range(30))
y = [v * 10 for v in x]
# Add outlier #
x.insert(6, 15)
y.insert(6, 220)
# Make graph #
regression = smapi.OLS(x, y
```
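The snippet is cut off mid-call; completing the idea as a hedged sketch (statsmodels assumed, with y regressed on x rather than the excerpt's OLS(x, y) ordering), the fitted results expose outlier_test(), whose Bonferroni-adjusted p-values can be used to flag and mask outliers:

```python
import numpy as np
import statsmodels.api as sm

x = np.arange(30.0)
y = 10.0 * x
x = np.insert(x, 6, 15.0)   # inject an outlier
y = np.insert(y, 6, 220.0)

X = sm.add_constant(x)
results = sm.OLS(y, X).fit()

# Bonferroni-adjusted p-values per observation; small => likely outlier
test = results.outlier_test()
mask = test["bonf(p)"] < 0.5
print("outlier indices:", np.where(mask)[0])
```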