linear-regression

How to put a complicated equation into an R formula?

Submitted by 落爺英雄遲暮 on 2019-11-27 18:01:36
Question: We have tree diameter as the predictor and tree height as the dependent variable. A number of different equations exist for this kind of data, and we try to model some of them and compare the results. However, we can't figure out how to correctly put one equation into the corresponding R formula format. The trees data set in R can be used as an example:

    data(trees)
    df <- trees
    df$h <- df$Height * 0.3048            # transform to metric system
    df$dbh <- (trees$Girth * 0.3048) / pi # transform
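For questions like this, the two standard tools are I(), which lets arithmetic expressions appear inside a formula, and nls() for equations whose parameters sit in non-linear positions. A minimal sketch continuing the question's df; the quadratic form, the asymptotic curve, and the starting values are illustrative assumptions, not equations taken from the question:

    ## A fixed functional form can go straight into lm() via I():
    fit_lm <- lm(h ~ dbh + I(dbh^2), data = df)

    ## Non-linear parameters need nls(); curve and starting values are
    ## illustrative and may need tuning for real data.
    fit_nls <- nls(h ~ 1.37 + a * dbh / (b + dbh), data = df,
                   start = list(a = 30, b = 1))
    summary(fit_nls)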

Is there a fast way to estimate a simple regression (a regression line with only intercept and slope)?

Submitted by 旧街凉风 on 2019-11-27 16:31:07
This question relates to a machine-learning feature-selection procedure. I have a large matrix of features, where the columns are the features of the subjects (rows):

    set.seed(1)
    features.mat <- matrix(rnorm(10*100), ncol = 100)
    colnames(features.mat) <- paste("F", 1:100, sep = "")
    rownames(features.mat) <- paste("S", 1:10, sep = "")

The response was measured for each subject (S) under different conditions (C) and therefore looks like this:

    response.df <- data.frame(
      S = c(sapply(1:10, function(x) rep(paste("S", x, sep = ""), 100))),
      C = rep(paste("C", 1:100, sep = ""), 10),
      response = rnorm(1000))
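Since only an intercept and a slope are wanted, the closed-form least-squares solution can be vectorized across all feature columns at once, avoiding a per-feature call to lm(). A minimal sketch under the assumption that each column of a feature matrix X is regressed on the same response vector y; the function name is illustrative:

    ## slope_j = cov(x_j, y) / var(x_j); intercept_j = mean(y) - slope_j * mean(x_j)
    fast_simple_lm <- function(X, y) {
      xc <- sweep(X, 2, colMeans(X))   # center each feature column
      yc <- y - mean(y)                # center the response
      slope <- colSums(xc * yc) / colSums(xc^2)
      intercept <- mean(y) - slope * colMeans(X)
      cbind(intercept, slope)          # one row of estimates per feature
    }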

Model matrix with all pairwise interactions between columns

Submitted by 对着背影说爱祢 on 2019-11-27 16:28:15
Question: Let's say that I have a numeric data matrix with columns w, x, y, z, and I also want to add the columns that are equivalent to w*x, w*y, w*z, x*y, x*z, y*z, since I want my covariate matrix to include all pairwise interactions. Is there a clean and effective way to do this?

Answer 1: If you mean in a model formula, then the ^ operator does this.

    ## dummy data
    set.seed(1)
    dat <- data.frame(Y = rnorm(10), x = rnorm(10), y = rnorm(10), z = rnorm(10))

The formula is

    form <- Y ~ (x + y + z)^2

which expands to all main effects plus all pairwise interactions.
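To see the expansion as an actual covariate matrix rather than a formula, model.matrix() can be applied to the same objects; a short sketch continuing the answer's dummy data:

    colnames(model.matrix(form, dat))
    ## "(Intercept)" "x" "y" "z" "x:y" "x:z" "y:z"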

Aligning a data frame with missing values

Submitted by 荒凉一梦 on 2019-11-27 15:49:10
I'm using a data frame with many NA values. While I'm able to create a linear model, I am subsequently unable to line the model's fitted values up with the original data, due to the missing values and the lack of an indicator column. Here's a reproducible example:

    library(MASS)
    dat <- Aids2
    # Add NA's
    dat[floor(runif(100, min = 1, max = nrow(dat))), 3] <- NA
    # Create a model
    model <- lm(death ~ diag + age, data = dat)
    # Different lengths
    length(fitted.values(model)) # 2745
    nrow(dat)                    # 2843

There are actually three solutions here: pad the fitted values with NA ourselves; use predict() to compute fitted values on the full data; or fit the model with na.action = na.exclude so the padding happens automatically.
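A minimal sketch of the na.exclude option, continuing the example above; with it, fitted() and residuals() are padded with NA in the positions of the incomplete rows, so they line up with the original data frame:

    model2 <- lm(death ~ diag + age, data = dat, na.action = na.exclude)
    length(fitted(model2)) == nrow(dat)  # TRUE
    dat$fit <- fitted(model2)            # aligns row-for-row with dat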

How to do Gaussian/polynomial regression with scikit-learn?

Submitted by 断了今生、忘了曾经 on 2019-11-27 15:20:36
Question: Does scikit-learn provide a facility to perform regression using a Gaussian or polynomial kernel? I looked at the APIs and I don't see any. Has anyone built a package on top of scikit-learn that does this?

Answer 1: Either you use Support Vector Regression, sklearn.svm.SVR, and set the appropriate kernel (see here). Or you install the latest master version of sklearn and use the recently added sklearn.preprocessing.PolynomialFeatures (see here), and then OLS or Ridge on top of that.

Answer 2: Theory
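A minimal sketch of both routes from Answer 1; the toy data and the degree-3 polynomial are illustrative assumptions:

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import Ridge

    rng = np.random.RandomState(0)
    X = rng.uniform(-3, 3, size=(100, 1))
    y = np.sin(X).ravel() + 0.1 * rng.randn(100)

    # Route 1: kernelized regression with an RBF (Gaussian) kernel
    svr = SVR(kernel="rbf").fit(X, y)

    # Route 2: explicit polynomial features, then a linear model
    poly = make_pipeline(PolynomialFeatures(degree=3), Ridge()).fit(X, y)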

How to calculate variance of least squares estimator using QR decomposition in R?

Submitted by 坚强是说给别人听的谎言 on 2019-11-27 15:20:28
Question: I'm trying to learn QR decomposition, but I can't figure out how to get the variance of beta_hat without resorting to traditional matrix calculations. I'm practising with the iris data set, and here's what I have so far:

    y <- iris$Sepal.Length
    x <- iris$Sepal.Width
    X <- cbind(1, x)
    n <- nrow(X)
    p <- ncol(X)
    qr.X <- qr(X)
    b <- (t(qr.Q(qr.X)) %*% y)[1:p]
    R <- qr.R(qr.X)
    beta <- as.vector(backsolve(R, b))
    res <- as.vector(y - X %*% beta)

Thanks for your help!

Answer 1: setup (copying in your code)

    y <- iris$Sepal.Length
    x
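A minimal sketch of the remaining step, continuing the question's objects: since X = QR, we have X'X = R'R, so Var(beta_hat) = sigma^2 (R'R)^{-1} can be computed from R alone via chol2inv(), with no explicit matrix inversion:

    sigma2 <- sum(res^2) / (n - p)     # residual variance estimate
    vcov_beta <- sigma2 * chol2inv(R)  # sigma^2 * (R'R)^{-1} = sigma^2 * (X'X)^{-1}
    se_beta <- sqrt(diag(vcov_beta))   # standard errors of beta_hat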

scikit-learn & statsmodels - which R-squared is correct?

Submitted by ◇◆丶佛笑我妖孽 on 2019-11-27 14:55:28
I'd like to choose the best algorithm for the future. I found some solutions, but I didn't understand which R-squared value is correct. For this, I divided my data into test and training sets, and I printed the two different R-squared values below.

    import statsmodels.api as sm
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    lineer = LinearRegression()
    lineer.fit(x_train, y_train)
    lineerPredict = lineer.predict(x_test)
    scoreLineer = r2_score(y_test, lineerPredict)  # First R-Squared

    model = sm.OLS(lineerPredict, y_test)
    print(model.fit().summary())  # Second R-Squared
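A minimal sketch of a like-for-like comparison, assuming x_train, y_train, x_test, y_test are defined as in the question. Note that sm.OLS takes the response as its first argument and adds no intercept unless you add one, so the question's call regresses predictions on the truth, which is not the same quantity; fitting statsmodels on the training data and scoring with the same metric makes the two numbers comparable:

    import statsmodels.api as sm
    from sklearn.metrics import r2_score

    ols = sm.OLS(y_train, sm.add_constant(x_train)).fit()
    sm_pred = ols.predict(sm.add_constant(x_test))
    print(r2_score(y_test, sm_pred))  # now directly comparable to scoreLineer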

Pandas rolling regression: alternatives to looping

Submitted by 柔情痞子 on 2019-11-27 13:29:49
Question: I got good use out of pandas' MovingOLS class (source here) within the deprecated stats/ols module. Unfortunately, it was gutted completely with pandas 0.20. The question of how to run a rolling OLS regression in an efficient manner has been asked several times (here, for instance), but it was phrased a little broadly and left without a great answer, in my view. Here are my questions: How can I best mimic the basic framework of pandas' MovingOLS? The most attractive feature of this class was the
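One later alternative worth noting is statsmodels' RollingOLS, added in statsmodels 0.11; a minimal sketch on made-up data, since the question's own frames aren't shown:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.regression.rolling import RollingOLS

    df = pd.DataFrame({"x": np.random.randn(500)})
    df["y"] = 2 * df["x"] + np.random.randn(500)

    # One OLS fit per 60-observation window, computed efficiently
    res = RollingOLS(df["y"], sm.add_constant(df["x"]), window=60).fit()
    print(res.params.tail())  # one intercept/slope pair per window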

TensorFlow: “Attempting to use uninitialized value” in variable initialization

Submitted by 此生再无相见时 on 2019-11-27 11:54:00
Question: I am trying to implement multivariate linear regression in Python using TensorFlow, but have run into some logical and implementation issues. My code throws the following error:

    Attempting to use uninitialized value Variable
    Caused by op u'Variable/read'

Ideally the weights output should be [2, 3].

    def hypothesis_function(input_2d_matrix_trainingexamples,
                            output_matrix_of_trainingexamples,
                            initial_parameters_of_hypothesis_function,
                            learning_rate, num_steps):
        # calculate num attributes and num
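In TF 1.x graph mode, this error usually means a variable is read in the session before its initializer op has run. A minimal sketch of the usual pattern, using the 1.x API the question uses; the variable shapes are illustrative:

    import tensorflow as tf

    W = tf.Variable(tf.zeros([2, 1]), name="weights")
    b = tf.Variable(0.0, name="bias")

    # Create the initializer *after* all variables have been defined,
    # and run it before any op that reads them.
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        print(sess.run(W))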

How (and why) do you use contrasts?

Submitted by 二次信任 on 2019-11-27 11:40:21
Under what circumstances do you create contrasts in your analysis? How is it done, and what is it used for? I checked ?contrasts and ?C - both lead to "Chapter 2 of Statistical Models in S", which is not readily available to me.

Contrasts are needed when you fit linear models with factors (i.e. categorical variables) as explanatory variables. The contrast specifies how the levels of the factor will be coded into a family of numeric dummy variables for fitting the model. Here are some good notes on the different varieties of contrasts used: http://www.unc.edu/courses/2006spring/ecol/145/001/docs
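A minimal sketch of inspecting and changing the contrasts of a factor; the three-level factor here is an illustrative assumption:

    f <- factor(c("low", "mid", "high"), levels = c("low", "mid", "high"))
    contrasts(f)                  # default: treatment contrasts
    contrasts(f) <- contr.sum(3)  # switch to sum-to-zero contrasts
    model.matrix(~ f)             # the numeric coding used when fitting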