linear-regression

How to put a complicated equation into an R formula?

Submitted by 落爺英雄遲暮 on 2019-11-27 18:01:36
Question: We have tree diameter as the predictor and tree height as the dependent variable. A number of different equations exist for this kind of data, and we try to model some of them and compare the results. However, we can't figure out how to correctly put one equation into the corresponding R formula format. The trees data set in R can be used as an example:

    data(trees)
    df <- trees
    df$h <- df$Height * 0.3048            # transform to metric system
    df$dbh <- (trees$Girth * 0.3048) / pi # transform
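For questions like this, the two standard tools are I(), which lets arithmetic expressions appear inside a formula, and nls() for equations whose parameters sit in non-linear positions. A minimal sketch continuing the question's df; the quadratic form, the asymptotic curve, and the starting values are illustrative assumptions, not equations taken from the question:

    ## A fixed functional form can go straight into lm() via I():
    fit_lm <- lm(h ~ dbh + I(dbh^2), data = df)

    ## Non-linear parameters need nls(); curve and starting values are
    ## illustrative and may need tuning for real data.
    fit_nls <- nls(h ~ 1.37 + a * dbh / (b + dbh), data = df,
                   start = list(a = 30, b = 1))
    summary(fit_nls)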

Is there a fast way to estimate a simple regression (a regression line with only intercept and slope)?

Submitted by 旧街凉风 on 2019-11-27 16:31:07
This question relates to a machine-learning feature-selection procedure. I have a large matrix of features, where the columns are the features of the subjects (rows):

    set.seed(1)
    features.mat <- matrix(rnorm(10*100), ncol = 100)
    colnames(features.mat) <- paste("F", 1:100, sep = "")
    rownames(features.mat) <- paste("S", 1:10, sep = "")

The response was measured for each subject (S) under different conditions (C) and therefore looks like this:

    response.df <- data.frame(
      S = c(sapply(1:10, function(x) rep(paste("S", x, sep = ""), 100))),
      C = rep(paste("C", 1:100, sep = ""), 10),
      response = rnorm(1000))
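Since only an intercept and a slope are wanted, the closed-form least-squares solution can be vectorized across all feature columns at once, avoiding a per-feature call to lm(). A minimal sketch under the assumption that each column of a feature matrix X is regressed on the same response vector y; the function name is illustrative:

    ## slope_j = cov(x_j, y) / var(x_j); intercept_j = mean(y) - slope_j * mean(x_j)
    fast_simple_lm <- function(X, y) {
      xc <- sweep(X, 2, colMeans(X))   # center each feature column
      yc <- y - mean(y)                # center the response
      slope <- colSums(xc * yc) / colSums(xc^2)
      intercept <- mean(y) - slope * colMeans(X)
      cbind(intercept, slope)          # one row of estimates per feature
    }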

Model matrix with all pairwise interactions between columns

Submitted by 对着背影说爱祢 on 2019-11-27 16:28:15
Question: Let's say that I have a numeric data matrix with columns w, x, y, z, and I also want to add the columns that are equivalent to w*x, w*y, w*z, x*y, x*z, y*z, since I want my covariate matrix to include all pairwise interactions. Is there a clean and effective way to do this?

Answer 1: If you mean in a model formula, then the ^ operator does this.

    ## dummy data
    set.seed(1)
    dat <- data.frame(Y = rnorm(10), x = rnorm(10), y = rnorm(10), z = rnorm(10))

The formula is

    form <- Y ~ (x + y + z)^2

which expands to all main effects plus all pairwise interactions.
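To see the expansion as an actual covariate matrix rather than a formula, model.matrix() can be applied to the same objects; a short sketch continuing the answer's dummy data:

    colnames(model.matrix(form, dat))
    ## "(Intercept)" "x" "y" "z" "x:y" "x:z" "y:z"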

Aligning a data frame with missing values

Submitted by 荒凉一梦 on 2019-11-27 15:49:10
I'm using a data frame with many NA values. While I'm able to create a linear model, I am subsequently unable to line the model's fitted values up with the original data, due to the missing values and the lack of an indicator column. Here's a reproducible example:

    library(MASS)
    dat <- Aids2
    # Add NA's
    dat[floor(runif(100, min = 1, max = nrow(dat))), 3] <- NA
    # Create a model
    model <- lm(death ~ diag + age, data = dat)
    # Different lengths
    length(fitted.values(model)) # 2745
    nrow(dat)                    # 2843

There are actually three solutions here: pad the fitted values with NA ourselves; use predict() to compute fitted values on the full data; or fit the model with na.action = na.exclude so the padding happens automatically.
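A minimal sketch of the na.exclude option, continuing the example above; with it, fitted() and residuals() are padded with NA in the positions of the incomplete rows, so they line up with the original data frame:

    model2 <- lm(death ~ diag + age, data = dat, na.action = na.exclude)
    length(fitted(model2)) == nrow(dat)  # TRUE
    dat$fit <- fitted(model2)            # aligns row-for-row with dat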

How to do Gaussian/polynomial regression with scikit-learn?

Submitted by 断了今生、忘了曾经 on 2019-11-27 15:20:36
Question: Does scikit-learn provide a facility to perform regression using a Gaussian or polynomial kernel? I looked at the APIs and I don't see any. Has anyone built a package on top of scikit-learn that does this?

Answer 1: Either you use Support Vector Regression, sklearn.svm.SVR, and set the appropriate kernel (see here). Or you install the latest master version of sklearn and use the recently added sklearn.preprocessing.PolynomialFeatures (see here), and then OLS or Ridge on top of that.

Answer 2: Theory
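A minimal sketch of both routes from Answer 1; the toy data and the degree-3 polynomial are illustrative assumptions:

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import Ridge

    rng = np.random.RandomState(0)
    X = rng.uniform(-3, 3, size=(100, 1))
    y = np.sin(X).ravel() + 0.1 * rng.randn(100)

    # Route 1: kernelized regression with an RBF (Gaussian) kernel
    svr = SVR(kernel="rbf").fit(X, y)

    # Route 2: explicit polynomial features, then a linear model
    poly = make_pipeline(PolynomialFeatures(degree=3), Ridge()).fit(X, y)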

How to calculate variance of least squares estimator using QR decomposition in R?

Submitted by 坚强是说给别人听的谎言 on 2019-11-27 15:20:28
Question: I'm trying to learn QR decomposition, but I can't figure out how to get the variance of beta_hat without resorting to traditional matrix calculations. I'm practising with the iris data set, and here's what I have so far:

    y <- iris$Sepal.Length
    x <- iris$Sepal.Width
    X <- cbind(1, x)
    n <- nrow(X)
    p <- ncol(X)
    qr.X <- qr(X)
    b <- (t(qr.Q(qr.X)) %*% y)[1:p]
    R <- qr.R(qr.X)
    beta <- as.vector(backsolve(R, b))
    res <- as.vector(y - X %*% beta)

Thanks for your help!

Answer 1: setup (copying in your code)

    y <- iris$Sepal.Length
    x
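A minimal sketch of the remaining step, continuing the question's objects: since X = QR, we have X'X = R'R, so Var(beta_hat) = sigma^2 (R'R)^{-1} can be computed from R alone via chol2inv(), with no explicit matrix inversion:

    sigma2 <- sum(res^2) / (n - p)     # residual variance estimate
    vcov_beta <- sigma2 * chol2inv(R)  # sigma^2 * (R'R)^{-1} = sigma^2 * (X'X)^{-1}
    se_beta <- sqrt(diag(vcov_beta))   # standard errors of beta_hat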

scikit-learn & statsmodels - which R-squared is correct?

Submitted by ◇◆丶佛笑我妖孽 on 2019-11-27 14:55:28
I'd like to choose the best algorithm for the future. I found some solutions, but I didn't understand which R-squared value is correct. For this, I divided my data into test and training sets, and I printed the two different R-squared values below.

    import statsmodels.api as sm
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    lineer = LinearRegression()
    lineer.fit(x_train, y_train)
    lineerPredict = lineer.predict(x_test)
    scoreLineer = r2_score(y_test, lineerPredict)  # First R-Squared

    model = sm.OLS(lineerPredict, y_test)
    print(model.fit().summary())  # Second R-Squared
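A minimal sketch of a like-for-like comparison, assuming x_train, y_train, x_test, y_test are defined as in the question. Note that sm.OLS takes the response as its first argument and adds no intercept unless you add one, so the question's call regresses predictions on the truth, which is not the same quantity; fitting statsmodels on the training data and scoring with the same metric makes the two numbers comparable:

    import statsmodels.api as sm
    from sklearn.metrics import r2_score

    ols = sm.OLS(y_train, sm.add_constant(x_train)).fit()
    sm_pred = ols.predict(sm.add_constant(x_test))
    print(r2_score(y_test, sm_pred))  # now directly comparable to scoreLineer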

Pandas rolling regression: alternatives to looping

Submitted by 柔情痞子 on 2019-11-27 13:29:49
Question: I got good use out of pandas' MovingOLS class (source here) within the deprecated stats/ols module. Unfortunately, it was gutted completely with pandas 0.20. The question of how to run a rolling OLS regression in an efficient manner has been asked several times (here, for instance), but it was phrased a little broadly and left without a great answer, in my view. Here are my questions: How can I best mimic the basic framework of pandas' MovingOLS? The most attractive feature of this class was the
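One later alternative worth noting is statsmodels' RollingOLS, added in statsmodels 0.11; a minimal sketch on made-up data, since the question's own frames aren't shown:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.regression.rolling import RollingOLS

    df = pd.DataFrame({"x": np.random.randn(500)})
    df["y"] = 2 * df["x"] + np.random.randn(500)

    # One OLS fit per 60-observation window, computed efficiently
    res = RollingOLS(df["y"], sm.add_constant(df["x"]), window=60).fit()
    print(res.params.tail())  # one intercept/slope pair per window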

TensorFlow: “Attempting to use uninitialized value” in variable initialization

Submitted by 此生再无相见时 on 2019-11-27 11:54:00
Question: I am trying to implement multivariate linear regression in Python using TensorFlow, but have run into some logical and implementation issues. My code throws the following error:

    Attempting to use uninitialized value Variable
    Caused by op u'Variable/read'

Ideally the weights output should be [2, 3].

    def hypothesis_function(input_2d_matrix_trainingexamples,
                            output_matrix_of_trainingexamples,
                            initial_parameters_of_hypothesis_function,
                            learning_rate, num_steps):
        # calculate num attributes and num
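In TF 1.x graph mode, this error usually means a variable is read in the session before its initializer op has run. A minimal sketch of the usual pattern, using the 1.x API the question uses; the variable shapes are illustrative:

    import tensorflow as tf

    W = tf.Variable(tf.zeros([2, 1]), name="weights")
    b = tf.Variable(0.0, name="bias")

    # Create the initializer *after* all variables have been defined,
    # and run it before any op that reads them.
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        print(sess.run(W))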

How (and why) do you use contrasts?

Submitted by 二次信任 on 2019-11-27 11:40:21
Under what circumstances do you create contrasts in your analysis? How is it done, and what is it used for? I checked ?contrasts and ?C - both lead to "Chapter 2 of Statistical Models in S", which is not readily available to me.

Contrasts are needed when you fit linear models with factors (i.e. categorical variables) as explanatory variables. The contrast specifies how the levels of the factor will be coded into a family of numeric dummy variables for fitting the model. Here are some good notes on the different varieties of contrasts used: http://www.unc.edu/courses/2006spring/ecol/145/001/docs
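A minimal sketch of inspecting and changing the contrasts of a factor; the three-level factor here is an illustrative assumption:

    f <- factor(c("low", "mid", "high"), levels = c("low", "mid", "high"))
    contrasts(f)                  # default: treatment contrasts
    contrasts(f) <- contr.sum(3)  # switch to sum-to-zero contrasts
    model.matrix(~ f)             # the numeric coding used when fitting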