regression

Plotting confidence and prediction intervals with repeated entries

坚强是说给别人听的谎言 submitted on 2019-11-29 02:34:10
I have a correlation plot for two variables: the predictor variable (temperature) on the x-axis and the response variable (density) on the y-axis. My best-fit least-squares regression line is a 2nd-order polynomial. I would also like to plot confidence and prediction intervals. The method described in this answer seems perfect. However, my dataset (n=2340) has repeated entries for many (x, y) pairs. My resulting plot looks like this:

Here is my relevant code (slightly modified from the linked answer above):

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from statsmodels
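For reference, a minimal sketch of the general approach with statsmodels (my illustration, not the asker's full script; the synthetic data and variable names are stand-ins). Evaluating the intervals on a sorted grid rather than on the raw, repeated x values avoids the jagged bands the asker describes:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import statsmodels.formula.api as smf

    # Synthetic stand-in for the asker's data: many repeated (x, y) pairs.
    rng = np.random.default_rng(0)
    temp = np.repeat(np.linspace(0, 30, 40), 10)
    dens = 1.0 + 0.05 * temp - 0.001 * temp**2 + rng.normal(0, 0.1, temp.size)
    df = pd.DataFrame({"temperature": temp, "density": dens})

    # Quadratic OLS fit, then intervals evaluated on a sorted grid.
    fit = smf.ols("density ~ temperature + I(temperature**2)", data=df).fit()
    grid = pd.DataFrame({"temperature": np.linspace(0, 30, 200)})
    pred = fit.get_prediction(grid).summary_frame(alpha=0.05)

    plt.scatter(df.temperature, df.density, s=5, alpha=0.3)
    plt.plot(grid.temperature, pred["mean"], "r-")
    plt.fill_between(grid.temperature, pred["mean_ci_lower"], pred["mean_ci_upper"],
                     alpha=0.4, label="95% confidence interval")
    plt.fill_between(grid.temperature, pred["obs_ci_lower"], pred["obs_ci_upper"],
                     alpha=0.2, label="95% prediction interval")
    plt.legend()
    plt.show()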

python stats models - quadratic term in regression

给你一囗甜甜゛ submitted on 2019-11-29 02:32:54
Question: I have the following linear regression:

    import statsmodels.formula.api as sm
    model = sm.ols(formula = 'a ~ b + c', data = data).fit()

I want to add a quadratic term for b to this model. Is there a simple way to do this with statsmodels.ols? Is there a better package I should be using to achieve this?

Answer 1: Although the solution by Alexander works, in some situations it is not very convenient. For example, each time you want to predict the outcome of the model for new values, you need to
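A minimal sketch of the formula-based route (my illustration; the I() operator is standard patsy syntax, which statsmodels formulas use, and the data frame here is an assumed stand-in):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as sm

    # Assumed stand-in data with the asker's column names.
    rng = np.random.default_rng(0)
    data = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])

    # I(b**2) makes patsy evaluate b**2 as a literal Python expression,
    # so the quadratic term is created inside the formula itself.
    model = sm.ols(formula="a ~ b + I(b**2) + c", data=data).fit()
    print(model.params)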

Model matrix with all pairwise interactions between columns

落花浮王杯 submitted on 2019-11-29 02:17:18
Let's say that I have a numeric data matrix with columns w, x, y, z, and I also want to add in the columns equivalent to w*x, w*y, w*z, x*y, x*z, y*z, since I want my covariate matrix to include all pairwise interactions. Is there a clean and effective way to do this?

If you mean in a model formula, then the ^ operator does this.

    ## dummy data
    set.seed(1)
    dat <- data.frame(Y = rnorm(10), x = rnorm(10), y = rnorm(10), z = rnorm(10))

The formula is

    form <- Y ~ (x + y + z)^2

which gives (using model.matrix(), which is used internally by the standard model-fitting functions)

    model.matrix
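As an aside (not from the original answer): patsy, the formula engine used by Python's statsmodels, offers the same expansion through its ** operator. A minimal sketch, assuming patsy's behavior mirrors R's here:

    import numpy as np
    import pandas as pd
    from patsy import dmatrix

    rng = np.random.default_rng(1)
    dat = pd.DataFrame(rng.normal(size=(10, 3)), columns=["x", "y", "z"])

    # (x + y + z)**2 expands to all main effects plus all pairwise
    # interactions, mirroring R's (x + y + z)^2 formula syntax.
    X = dmatrix("(x + y + z)**2", dat)
    print(X.design_info.column_names)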

Partial Least Squares Library

主宰稳场 submitted on 2019-11-29 00:48:31
Question: There was already a question like this, but it was not answered, so I am trying to post it again. Does anyone know of an open-source implementation of a partial least squares algorithm in C++ (or C)? Or perhaps a library that does it?

Answer 1: FastPLS is a library that provides a C/C++ and MATLAB interface for speeding up partial least squares. Its author is Balaji Vasan Srinivasan. The author worked under the supervision of Professor Ramani Duraiswami at the University of Maryland, College Park, MD, USA

How to calculate variance of least squares estimator using QR decomposition in R?

♀尐吖头ヾ submitted on 2019-11-29 00:18:16
I'm trying to learn QR decomposition, but can't figure out how to get the variance of beta_hat without resorting to traditional matrix calculations. I'm practising with the iris data set, and here's what I have so far:

    y <- (iris$Sepal.Length)
    x <- (iris$Sepal.Width)
    X <- cbind(1, x)
    n <- nrow(X)
    p <- ncol(X)
    qr.X <- qr(X)
    b <- (t(qr.Q(qr.X)) %*% y)[1:p]
    R <- qr.R(qr.X)
    beta <- as.vector(backsolve(R, b))
    res <- as.vector(y - X %*% beta)

Thanks for your help!

Setup (copying in your code):

    y <- iris$Sepal.Length
    x <- iris$Sepal.Width
    X <- cbind(1, x)
    n <- nrow(X)
    p <- ncol(X)
    qr.X <- qr(X)
    b <- (t(qr.Q(qr.X)) %*% y)[1:p]
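For reference, the identity behind this: since X = QR with Q orthonormal, X'X = R'R, so Var(beta_hat) = sigma^2 (R'R)^{-1} = sigma^2 R^{-1} R^{-T}, which needs only triangular solves. A minimal numpy sketch of the same computation (my illustration with stand-in data, not the thread's R code):

    import numpy as np
    from scipy.linalg import solve_triangular

    # Stand-in design matrix and response (assumed data, analogous to iris).
    rng = np.random.default_rng(0)
    x = rng.normal(size=150)
    X = np.column_stack([np.ones(150), x])
    y = 5.0 - 0.2 * x + rng.normal(0, 0.4, 150)
    n, p = X.shape

    Q, R = np.linalg.qr(X)                 # reduced QR: X = Q @ R
    beta = solve_triangular(R, Q.T @ y)    # back-substitution, like backsolve()
    res = y - X @ beta
    sigma2 = res @ res / (n - p)           # residual variance estimate

    Rinv = solve_triangular(R, np.eye(p))  # R^{-1}, never forming (X'X)^{-1}
    cov_beta = sigma2 * Rinv @ Rinv.T      # Var(beta_hat) = sigma^2 (R'R)^{-1}
    print(np.sqrt(np.diag(cov_beta)))      # standard errors of beta_hat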

How to interpret lm() coefficient estimates when using bs() function for splines

假如想象 submitted on 2019-11-29 00:04:00
I'm using a set of points which go from (-5,5) to (0,0) and on to (5,5) in a symmetric "V-shape". I'm fitting a model with lm() and the bs() function to fit a "V-shape" spline:

    lm(formula = y ~ bs(x, degree = 1, knots = c(0)))

I get the "V-shape" when I predict outcomes with predict() and draw the prediction line. But when I look at the model estimates from coef(), I see estimates that I don't expect.

    Coefficients:
                                     Estimate Std. Error t value Pr(>|t|)
    (Intercept)                       4.93821    0.16117  30.639 1.40e-09 ***
    bs(x, degree = 1, knots = c(0))1 -5.12079    0.24026 -21.313 2.47e-08 ***
    bs(x, degree = 1, knots = c(0))2 -0
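One way to read those numbers (my gloss, not an answer from the thread): with degree-1 B-splines the intercept is the fitted value at the left boundary of x, and each basis coefficient is the offset of the fit at a knot or right boundary from that intercept, not a segment slope. A small Python sketch with patsy's bs() (assumed to mirror R's behavior here) shows the same pattern on a noise-free V:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    x = np.linspace(-5, 5, 101)
    df = pd.DataFrame({"x": x, "y": np.abs(x)})   # exact V-shape

    fit = smf.ols("y ~ bs(x, degree=1, knots=(0,))", data=df).fit()
    print(fit.params)
    # Intercept ~ 5: fitted value at x = -5.
    # First coefficient ~ -5: the fit drops to 0 at the knot x = 0.
    # Second coefficient ~ 0: the fit returns to 5 at x = 5, the same
    # height as the left boundary, hence an offset of zero.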

Stepwise Regression in Python

南楼画角 submitted on 2019-11-28 22:31:04
Question: How do I perform stepwise regression in Python? There are methods for OLS in SciPy, but I am not able to do stepwise selection. Any help in this regard would be greatly appreciated. Thanks.

Edit: I am trying to build a linear regression model. I have 5 independent variables, and using forward stepwise regression I aim to select variables such that my model has the lowest p-value. The following link explains the objective: https://www.google.co.in/url?sa=t&rct=j&q=&esrc=s&source=web&cd=4&ved=0CEAQFjAD&url=http%3A%2F
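Neither SciPy nor statsmodels ships a built-in stepwise routine, so a common approach is a short hand-rolled forward-selection loop over p-values. A minimal sketch (the function name, threshold, and data layout are my assumptions):

    import statsmodels.api as sm

    def forward_select(X, y, alpha=0.05):
        # X: pandas DataFrame of candidate predictors; y: pandas Series.
        selected, remaining = [], list(X.columns)
        while remaining:
            # p-value of each candidate when added to the current model.
            pvals = {}
            for cand in remaining:
                fit = sm.OLS(y, sm.add_constant(X[selected + [cand]])).fit()
                pvals[cand] = fit.pvalues[cand]
            best = min(pvals, key=pvals.get)
            if pvals[best] >= alpha:        # no remaining candidate is significant
                break
            selected.append(best)
            remaining.remove(best)
        return selected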

Logistic regression with robust clustered standard errors in R

こ雲淡風輕ζ submitted on 2019-11-28 20:54:17
A newbie question: does anyone know how to run a logistic regression with clustered standard errors in R? In Stata it's just logit Y X1 X2 X3, vce(cluster Z), but unfortunately I haven't figured out how to do the same analysis in R. Thanks in advance!

David F: You might want to look at the rms (regression modelling strategies) package. lrm is its logistic regression model, and if fit is the name of your output, you'd have something like this:

    fit <- lrm(disease ~ age + study + rcs(bmi,3), x=T, y=T, data=dataf)
    fit
    robcov(fit, cluster=dataf$id)
    bootcov(fit, cluster=dataf$id)

You have to specify x=T
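For comparison (my addition, not from the thread): Python's statsmodels exposes the same idea directly through a cov_type argument on fit(). A minimal sketch with stand-in data:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Assumed stand-in data; z is the cluster identifier.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "y":  rng.integers(0, 2, 200),
        "x1": rng.normal(size=200),
        "x2": rng.normal(size=200),
        "z":  rng.integers(0, 20, 200),
    })

    # Roughly equivalent in spirit to Stata's: logit y x1 x2, vce(cluster z)
    fit = smf.logit("y ~ x1 + x2", data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["z"]}
    )
    print(fit.summary())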

PCA first or normalization first?

左心房为你撑大大i submitted on 2019-11-28 19:25:48
When doing regression or classification, what is the correct (or better) way to preprocess the data?

1. Normalize the data -> PCA -> training
2. PCA -> normalize PCA output -> training
3. Normalize the data -> PCA -> normalize PCA output -> training

Which of the above is more correct, or is there a "standardized" way to preprocess the data? By "normalize" I mean standardization, linear scaling, or some other technique.

Chris Taylor: You should normalize the data before doing PCA. For example, consider the following situation. I create a data set X with a known correlation matrix C:

    >> C = [1 0.5; 0
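A minimal sketch of option 1 in Python (my illustration; the MATLAB snippet above, truncated here, was building toward the same point about scaling before PCA):

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression

    # Assumed stand-in data: two informative features on very different scales.
    rng = np.random.default_rng(0)
    X = np.column_stack([rng.normal(0, 1, 500), rng.normal(0, 1000, 500)])
    y = (X[:, 0] + X[:, 1] / 1000 > 0).astype(int)

    # Standardize -> PCA -> train, so that no feature dominates the
    # principal components purely because of its units.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=2), LogisticRegression())
    pipe.fit(X, y)
    print(pipe.score(X, y))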

Multiple outputs in Keras

浪子不回头ぞ submitted on 2019-11-28 19:20:41
I have a problem which deals with predicting two outputs given a vector of predictors. Assume that a predictor vector looks like x1, y1, att1, att2, ..., attn, where x1, y1 are coordinates and the att's are the other attributes attached to the occurrence of the x1, y1 coordinates. Based on this predictor set I want to predict x2, y2. This is a time-series problem, which I am trying to solve using multiple regression. My question is: how do I set up Keras so that it gives me 2 outputs in the final layer? I have solved a simple regression problem in Keras and the code is available in my github
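One straightforward way (my sketch, not the asker's GitHub code) is to end the network with a single Dense layer of two linear units, so one model predicts both coordinates at once:

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    n_features = 10                      # x1, y1 plus the att's (assumed size)

    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(2),                 # two linear units: predicted x2 and y2
    ])
    model.compile(optimizer="adam", loss="mse")

    # Assumed stand-in training data.
    X = np.random.normal(size=(256, n_features)).astype("float32")
    Y = np.random.normal(size=(256, 2)).astype("float32")
    model.fit(X, Y, epochs=2, batch_size=32, verbose=0)
    print(model.predict(X[:3]))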