linear-regression

Is there a fast estimation of simple regression (a regression line with only intercept and slope)?

我的梦境 submitted on 2019-11-26 18:40:59
Question: This question relates to a machine learning feature selection procedure. I have a large matrix of features - the columns are the features of the subjects (rows):

```r
set.seed(1)
features.mat <- matrix(rnorm(10*100), ncol = 100)
colnames(features.mat) <- paste("F", 1:100, sep = "")
rownames(features.mat) <- paste("S", 1:10, sep = "")
```

The response was measured for each subject (S) under different conditions (C) and therefore looks like this:

```r
response.df <- data.frame(S = c(sapply(1:10, function(x) rep(paste(
```
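Since the question asks for a fast alternative to lm() for intercept-and-slope fits, here is a minimal sketch (my addition, not from the original post) of the closed-form shortcut: the slope is cov(x, y) / var(x) and the intercept is mean(y) - slope * mean(x).

```r
# Closed-form simple regression; avoids lm()'s per-call overhead in tight loops.
fast_slr <- function(x, y) {
  b <- cov(x, y) / var(x)       # slope
  a <- mean(y) - b * mean(x)    # intercept
  c(intercept = a, slope = b)
}

set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)
fast_slr(x, y)
coef(lm(y ~ x))  # same estimates, but much slower when called many times
```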

Ridge regression with `glmnet` gives different coefficients than what I compute by “textbook definition”?

好久不见. submitted on 2019-11-26 18:33:06
Question: I am running ridge regression using the glmnet R package. I noticed that the coefficients I obtain from the glmnet::glmnet function differ from those I compute by definition (using the same lambda value). Could somebody explain to me why? The data (both the response Y and the design matrix X) are scaled.

```r
library(MASS)
library(glmnet)

# Data dimensions
p.tmp <- 100
n.tmp <- 100

# Data objects
set.seed(1)
X <- scale(mvrnorm(n.tmp, mu = rep(0, p.tmp), Sigma = diag(p
```
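The usual source of this discrepancy is that glmnet minimizes (1/(2n))·RSS + λ·penalty rather than the textbook RSS + λ‖β‖². A minimal sketch (not the original answer) of how to line the two up, assuming standardize = FALSE and intercept = FALSE; exact agreement additionally depends on glmnet's internal standardization of y:

```r
library(MASS)
library(glmnet)

set.seed(1)
n <- 100; p <- 10
X <- scale(mvrnorm(n, mu = rep(0, p), Sigma = diag(p)))
y <- scale(X %*% rnorm(p) + rnorm(n))

lambda <- 0.5
fit <- glmnet(X, y, alpha = 0, lambda = lambda,
              standardize = FALSE, intercept = FALSE, thresh = 1e-14)

# Textbook ridge with k = n * lambda to undo glmnet's 1/(2n) scaling
beta_textbook <- drop(solve(t(X) %*% X + n * lambda * diag(p), t(X) %*% y))
head(cbind(glmnet = as.vector(coef(fit))[-1], textbook = beta_textbook))
```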

lm function in R does not give coefficients for all factor levels in categorical data

梦想与她 submitted on 2019-11-26 18:29:44
Question: I was trying out linear regression in R with categorical attributes and observed that I don't get a coefficient value for each of my factor levels. Please see my code below: I have 5 factor levels for states, but see only 4 coefficient values.

```r
> states = c("WA","TE","GE","LA","SF")
> population = c(0.5,0.2,0.6,0.7,0.9)
> df = data.frame(states,population)
> df
  states population
1     WA        0.5
2     TE        0.2
3     GE        0.6
4     LA        0.7
5     SF        0.9
> states=NULL
> population=NULL
> lm(formula
```
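This is the standard treatment-contrast behavior: the first level is absorbed into the intercept, so k levels produce k - 1 dummy coefficients. A minimal sketch (reusing the question's data) showing both forms:

```r
states <- c("WA", "TE", "GE", "LA", "SF")
population <- c(0.5, 0.2, 0.6, 0.7, 0.9)
df <- data.frame(states, population)

coef(lm(population ~ states, data = df))      # intercept + 4 dummies (GE is the baseline)
coef(lm(population ~ states - 1, data = df))  # no intercept: one coefficient per level
```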

How does `poly()` generate orthogonal polynomials? How should the returned "coefs" be understood?

匆匆过客 submitted on 2019-11-26 17:45:40
Question: My understanding of orthogonal polynomials is that they take the form

y(x) = a1 + a2(x - c1) + a3(x - c2)(x - c3) + a4(x - c4)(x - c5)(x - c6) + ...

up to the number of terms desired, where a1, a2, etc. are coefficients of each orthogonal term (varying between fits), and c1, c2, etc. are coefficients within the orthogonal terms, determined so that the terms maintain orthogonality (consistent between fits using the same x values). I understand that poly() is used to fit orthogonal polynomials. An example x
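A minimal sketch (my addition, not from the original thread) of what the "coefs" attribute is for: it stores the centering constants (alpha) and normalization constants (norm2) of the recurrence, so the identical basis can be rebuilt at new x values:

```r
x <- 1:10
p <- poly(x, degree = 3)
round(crossprod(p), 10)   # columns are orthonormal

cf <- attr(p, "coefs")    # list with $alpha (centers) and $norm2 (norms)
str(cf)

# Rebuild the same basis at new points, e.g. for prediction
newx <- seq(1, 10, by = 0.5)
p_new <- poly(newx, degree = 3, coefs = cf)
head(p_new)
```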

`lm` summary does not display all factor levels

不问归期 submitted on 2019-11-26 17:44:06
I am running a linear regression on a number of attributes, including two categorical attributes, B and F, and I don't get a coefficient value for every factor level I have. B has 9 levels and F has 6 levels. When I initially ran the model (with an intercept), I got 8 coefficients for B and 5 for F, which I understood as the first level of each being absorbed into the intercept. I wanted to rank the levels within B and F by their coefficients, so I added -1 after each factor to lock the intercept at 0 and get coefficients for all levels.

```r
Call: lm(formula = dependent ~ a + B-1 + c + d
```
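Note that "- 1" only recovers all levels for the first factor in the formula; a level of the second factor must still be dropped, or the design matrix would be rank deficient. A minimal sketch with invented toy data:

```r
set.seed(1)
d <- data.frame(y = rnorm(20),
                B = factor(sample(letters[1:3], 20, replace = TRUE)),
                F = factor(sample(LETTERS[1:2], 20, replace = TRUE)))

coef(lm(y ~ B + F - 1, data = d))  # all 3 levels of B, but F still loses one level
```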

Aligning a data frame with missing values

陌路散爱 submitted on 2019-11-26 17:21:02
Question: I'm using a data frame with many NA values. While I'm able to create a linear model, I am subsequently unable to line up the model's fitted values with the original data, due to the missing values and the lack of an indicator column. Here's a reproducible example:

```r
library(MASS)
dat <- Aids2

# Add NA's
dat[floor(runif(100, min = 1, max = nrow(dat))), 3] <- NA

# Create a model
model <- lm(death ~ diag + age, data = dat)

# Different values
length(fitted.values(model))  # 2745
nrow(dat)                     # 2843
```

Answer 1:
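The usual fix (a sketch consistent with, but not copied from, the answer) is na.action = na.exclude, which pads fitted values and residuals with NA so they align with the original rows:

```r
library(MASS)
dat <- Aids2
set.seed(1)
dat[floor(runif(100, min = 1, max = nrow(dat))), 3] <- NA

model <- lm(death ~ diag + age, data = dat, na.action = na.exclude)
length(fitted(model)) == nrow(dat)  # TRUE: dropped rows become NA placeholders
```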

Fast pairwise simple linear regression between variables in a data frame

▼魔方 西西 submitted on 2019-11-26 16:41:10
I have seen pairwise or generally paired simple linear regression many times on Stack Overflow. Here is a toy dataset for this kind of problem.

```r
set.seed(0)
X <- matrix(runif(100), 100, 5, dimnames = list(1:100, LETTERS[1:5]))
b <- c(1, 0.7, 1.3, 2.9, -2)
dat <- X * b[col(X)] + matrix(rnorm(100 * 5, 0, 0.1), 100, 5)
dat <- as.data.frame(dat)
pairs(dat)
```

So basically we want to compute 5 * 4 = 20 regression lines:

```
A ~ B    A ~ C    A ~ D    A ~ E    B ~ A
B ~ C    B ~ D    B ~ E    C ~ A    C ~ B
C ~ D    C ~ E    D ~ A    D ~ B    D ~ C
D ~ E    E ~ A    E ~ B    E ~ C    E ~ D
```

Here is a poor man's strategy:

```r
poor <
```
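For the simple-regression case there is a covariance shortcut (a sketch in the spirit of the question, not the posted answer): slope(y ~ x) = cov(x, y) / var(x), so a single cov() call yields all 20 slopes and intercepts at once:

```r
V  <- cov(dat)                       # 5 x 5 covariance matrix
mu <- colMeans(dat)

slopes <- sweep(V, 2, diag(V), "/")  # slopes[i, j] = slope of dat[,i] ~ dat[,j]
ints   <- mu - t(t(slopes) * mu)     # ints[i, j]   = mean_i - slope * mean_j

# Check one pair against lm()
coef(lm(A ~ B, data = dat))
c(ints["A", "B"], slopes["A", "B"])
```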

How (and why) do you use contrasts?

笑着哭i submitted on 2019-11-26 15:41:37
Question: In what situations do you create contrasts in your analysis? How is it done and what is it used for? I checked ?contrasts and ?C - both lead to "Chapter 2 of Statistical Models in S", which is not readily available to me.

Answer 1: Contrasts are needed when you fit linear models with factors (i.e. categorical variables) as explanatory variables. The contrast specifies how the levels of a factor will be coded into a family of numeric dummy variables for fitting the model. Here are some good notes
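A minimal sketch (my addition, not part of the quoted answer) contrasting the two most common codings: treatment coding, where coefficients are differences from a baseline level, and sum coding, where coefficients are differences from the grand mean:

```r
contr.treatment(3)  # default: first level is the baseline
contr.sum(3)        # coefficients sum to zero across levels

d <- data.frame(y = c(1, 2, 4, 1.5, 2.5, 3.5),
                g = factor(rep(c("low", "mid", "high"), 2),
                           levels = c("low", "mid", "high")))

coef(lm(y ~ g, data = d))                                   # treatment coding
coef(lm(y ~ g, data = d, contrasts = list(g = contr.sum)))  # sum coding
```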

Gradient descent using Python and NumPy

为君一笑 submitted on 2019-11-26 15:37:57
```python
import numpy as np

def gradient(X_norm, y, theta, alpha, m, n, num_it):
    temp = np.array(np.zeros_like(theta, float))
    for i in range(0, num_it):
        h = np.dot(X_norm, theta)  # current predictions
        # generic form of the update for feature j:
        # temp[j] = theta[j] - (alpha/m) * np.sum((h - y) * X_norm[:, j][np.newaxis, :])
        temp[0] = theta[0] - (alpha / m) * np.sum(h - y)
        temp[1] = theta[1] - (alpha / m) * np.sum((h - y) * X_norm[:, 1])
        theta = temp
    return theta

X_norm, mean, std = featureScale(X)  # featureScale, X and y are defined elsewhere in the question
m = len(X)                           # number of rows in X
X_norm = np.array([np.ones(m), X_norm])
n, m = np.shape(X_norm)
num_it = 1500
alpha = 0.01
theta = np.zeros(n, float)[:, np.newaxis]
X_norm = X_norm.transpose()
theta = gradient(X_norm, y, theta, alpha, m, n, num_it)
```
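Since the rest of this page's examples are in R, here is a minimal vectorized sketch (invented data) of the same batch gradient-descent update, theta <- theta - (alpha/m) * X'(X theta - y):

```r
set.seed(1)
m <- 50
X <- cbind(1, rnorm(m))                # design matrix with intercept column
y <- X %*% c(2, 3) + rnorm(m, 0, 0.1)  # true parameters: intercept 2, slope 3

theta <- c(0, 0)
alpha <- 0.01
for (i in 1:1500) {
  theta <- theta - (alpha / m) * drop(t(X) %*% (X %*% theta - y))
}
theta  # close to c(2, 3)
```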

Linear Regression with a known fixed intercept in R

北战南征 submitted on 2019-11-26 15:20:37
Question: I want to calculate a linear regression using the lm() function in R. Additionally, I want to get the slope of a regression where I explicitly give the intercept to lm(). I found an example on the internet and tried to read the R help page "?lm" (unfortunately I was not able to understand it), but I did not succeed. Can anyone tell me where my mistake is?

```r
lin <- data.frame(x = c(0:6), y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))
plot(lin$x, lin$y)
regImp = lm(formula = lin$x ~ lin$y)
abline(regImp,
```
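A minimal sketch (one standard approach, not necessarily the thread's answer) of fixing a known intercept with offset() so that lm() estimates only the slope:

```r
lin <- data.frame(x = 0:6, y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))
known_intercept <- 1.3  # hypothetical fixed intercept

# "- 1" removes the free intercept; offset() supplies the fixed one
fit <- lm(y ~ x - 1 + offset(rep(known_intercept, nrow(lin))), data = lin)
coef(fit)  # the slope, with the intercept held at 1.3

plot(lin$x, lin$y)
abline(a = known_intercept, b = coef(fit))
```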