linear-regression

Is there a better alternative than string manipulation to programmatically build formulas?

Asked by 我是研究僧i on 2019-11-26 07:57:13

Question: Everyone else's functions seem to take formula objects and then do dark magic to them somewhere deep inside, and I'm jealous. I'm writing a function that fits multiple models. Parts of the formulas for these models stay the same from one model to the next, and parts change. The clumsy way would be to have the user input the formula parts as character strings, do some character manipulation on them, and then use as.formula. But before I go that route, I just want to make sure that I'm not …
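In R itself, reformulate() builds a formula object from a character vector of term labels and a response, which avoids most manual pasting. As a language-neutral illustration of the underlying idea (assembling a model specification from fixed and varying parts), here is a minimal Python sketch; the make_formula helper and its argument names are hypothetical, not from the question:

```python
def make_formula(response, common_terms, varying_terms=()):
    """Build an R-style formula string from parts.

    `response`, `common_terms`, and `varying_terms` are hypothetical
    names used only for this sketch; the shared terms stay fixed across
    models while the varying terms change from one model to the next.
    """
    terms = list(common_terms) + list(varying_terms)
    return f"{response} ~ {' + '.join(terms)}"

print(make_formula("y", ["age", "sex"]))                 # y ~ age + sex
print(make_formula("y", ["age", "sex"], ["treatment"]))  # y ~ age + sex + treatment
```

The same two-part split is what reformulate(termlabels, response) gives you directly in R, without going through character manipulation yourself.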

`lm` summary does not display all factor levels

Asked by 痞子三分冷 on 2019-11-26 05:34:33

Question: I am running a linear regression on a number of attributes, including two categorical attributes, B and F, and I don't get a coefficient value for every factor level I have. B has 9 levels and F has 6 levels. When I initially ran the model (with intercept), I got 8 coefficients for B and 5 for F, which I understood as the first level of each being absorbed into the intercept. I want to rank the levels within B and F based on their coefficients, so I added -1 after each factor to lock the …
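The pattern the asker describes (k - 1 coefficients per factor with an intercept, k without one) comes from how dummy variables are coded. A minimal numpy sketch with a single invented 3-level factor, not the asker's attributes:

```python
import numpy as np

rng = np.random.default_rng(1)
b = np.repeat(np.arange(3), 20)                       # one factor, 3 levels
y = np.array([1.0, 2.5, -0.5])[b] + 0.1 * rng.normal(size=60)

# Without an intercept: one dummy column per level, so every level gets
# its own coefficient, and each coefficient estimates that level's mean.
D_full = np.column_stack([(b == l).astype(float) for l in (0, 1, 2)])
coef_full, *_ = np.linalg.lstsq(D_full, y, rcond=None)

# With an intercept: one level (here level 0) must be dropped to avoid a
# singular design; the remaining coefficients are differences from it.
D_ref = np.column_stack([np.ones(60)] +
                        [(b == l).astype(float) for l in (1, 2)])
coef_ref, *_ = np.linalg.lstsq(D_ref, y, rcond=None)
```

coef_full recovers the three level means directly, while coef_ref gives the level-0 mean plus two contrasts, which is exactly why dropping the intercept (the -1) makes one extra coefficient per factor appear.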

Fast pairwise simple linear regression between variables in a data frame

Asked by 醉酒当歌 on 2019-11-26 04:53:46

Question: I have seen pairwise or general paired simple linear regression many times on Stack Overflow. Here is a toy dataset for this kind of problem.

set.seed(0)
X <- matrix(runif(100), 100, 5, dimnames = list(1:100, LETTERS[1:5]))
b <- c(1, 0.7, 1.3, 2.9, -2)
dat <- X * b[col(X)] + matrix(rnorm(100 * 5, 0, 0.1), 100, 5)
dat <- as.data.frame(dat)
pairs(dat)

So basically we want to compute 5 * 4 = 20 regression lines:

A ~ B, A ~ C, A ~ D, A ~ E,
B ~ A, B ~ C, B ~ D, B ~ E,
C ~ A, C ~ B, C ~ D, …
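Since each simple regression slope is just cov(x, y)/var(x), all 20 slopes can be read off a single covariance matrix without looping over pairs. A sketch in numpy on random stand-in data (not the R toy dataset above):

```python
import numpy as np

rng = np.random.default_rng(0)
dat = rng.uniform(size=(100, 5))          # stand-in for the 5-column data frame

mu = dat.mean(axis=0)
Xc = dat - mu                              # center each column
cov = Xc.T @ Xc / (len(dat) - 1)           # 5x5 covariance matrix
var = np.diag(cov)

# slopes[i, j] is the slope from regressing column i on column j
slopes = cov / var[np.newaxis, :]
intercepts = mu[:, None] - slopes * mu[None, :]
```

The diagonal entries are the trivial self-regressions (slope 1) and can be ignored; the 20 off-diagonal entries are the pairwise fits.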

gradient descent using python and numpy

Asked by 守給你的承諾、 on 2019-11-26 04:05:47

Question:

def gradient(X_norm, y, theta, alpha, m, n, num_it):
    temp = np.array(np.zeros_like(theta, float))
    for i in range(0, num_it):
        h = np.dot(X_norm, theta)
        # temp[j] = theta[j] - (alpha/m) * np.sum((h - y) * X_norm[:, j][np.newaxis, :])
        temp[0] = theta[0] - (alpha/m) * np.sum(h - y)
        temp[1] = theta[1] - (alpha/m) * np.sum((h - y) * X_norm[:, 1])
        theta = temp
    return theta

X_norm, mean, std = featureScale(X)
m = len(X)    # length of X (number of rows)
X_norm = np.array([np.ones(m), X_norm])
n, m = np.shape(X_norm)
num_it = 1500
alpha = 0.01
theta = np…
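The commented-out line in the question hints at the usual fix: update every component of theta in one vectorized step instead of writing one line per coefficient. A self-contained sketch on synthetic data (not the asker's X or featureScale):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_it=1000):
    """Batch gradient descent for linear regression with squared loss."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_it):
        h = X @ theta                                   # predictions, all rows
        theta = theta - (alpha / m) * (X.T @ (h - y))   # all coords at once
    return theta

rng = np.random.default_rng(0)
m = 200
X = np.column_stack([np.ones(m), rng.normal(size=m)])   # intercept + 1 feature
true_theta = np.array([2.0, 3.0])
y = X @ true_theta + 0.01 * rng.normal(size=m)
theta = gradient_descent(X, y)
```

With features on comparable scales (which is what the asker's featureScale is for), this converges to the least-squares solution; the vectorized update also generalizes to any number of features, unlike the hard-coded temp[0]/temp[1] lines.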

How to force R to use a specified factor level as reference in a regression?

Asked by 眉间皱痕 on 2019-11-26 03:04:07

Question: How can I tell R to use a certain level as the reference if I use binary explanatory variables in a regression? By default it just picks some level. lm(x ~ y + as.factor(b)) with b in {0, 1, 2, 3, 4}. Let's say I want to use 3 instead of the zero that R uses.

Answer 1: See the relevel() function. Here is an example:

set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x, y = 4 + (1.5*x) + rnorm(100, sd = 2), b = gl(5, 20))
head(DF)
str(DF)
m1 <- lm(y ~ x + b, data = DF)
summary(m1)

Now alter the …
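relevel(DF$b, ref = "3") is the R tool here; the same effect can be seen by hand-coding the dummies so that level 3 is the omitted one. A numpy sketch with invented level means (not the answer's DF):

```python
import numpy as np

rng = np.random.default_rng(0)
b = np.repeat(np.arange(5), 20)               # factor with levels 0..4
means = np.array([4.0, 5.0, 6.0, 7.0, 8.0])   # invented per-level means
y = means[b] + 0.1 * rng.normal(size=100)

ref = 3                                        # chosen reference level
others = [l for l in range(5) if l != ref]
D = np.column_stack([np.ones(100)] +
                    [(b == l).astype(float) for l in others])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
# coef[0] estimates the mean of level 3; the remaining coefficients are
# each level's difference from level 3 (in the order of `others`)
```

Which level gets absorbed into the intercept is purely a coding choice, which is why relevel() changes the reported coefficients but not the fitted model.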

How does predict.lm() compute confidence interval and prediction interval?

Asked by 主宰稳场 on 2019-11-26 02:21:52

Question: I ran a regression: CopierDataRegression <- lm(V1 ~ V2, data = CopierData1), and my task was to obtain a 90% confidence interval for the mean response given V2 = 6 and a 90% prediction interval when V2 = 6. I used the following code:

X6 <- data.frame(V2 = 6)
predict(CopierDataRegression, X6, se.fit = TRUE, interval = "confidence", level = 0.90)
predict(CopierDataRegression, X6, se.fit = TRUE, interval = "prediction", level = 0.90)

and I got (87.3, 91.9) and (74.5, 104.8), which seems to be correct since the PI …
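The two intervals differ only in their standard errors: the prediction interval adds the residual variance on top of the standard error of the fitted mean, which is why it is always wider. A sketch of the textbook formulas on made-up data (not CopierData1), using scipy only for the t quantile:

```python
import numpy as np
from scipy import stats

x = np.array([2., 4., 3., 5., 6., 4., 7., 3., 5., 6.])
y = 75 + 2.5 * x + np.array([0.5, -1.0, 0.3, 1.2, -0.7,
                             0.1, 0.9, -0.4, -1.1, 0.2])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)   # residual variance

x0 = 6.0
yhat0 = b0 + b1 * x0
se_mean = np.sqrt(s2 * (1 / n + (x0 - x.mean()) ** 2 / Sxx))  # CI width driver
se_pred = np.sqrt(s2 + se_mean ** 2)                          # PI adds s2

t = stats.t.ppf(0.95, n - 2)          # two-sided 90% interval
conf_int = (yhat0 - t * se_mean, yhat0 + t * se_mean)
pred_int = (yhat0 - t * se_pred, yhat0 + t * se_pred)
```

This reproduces what predict.lm() computes internally: se.fit corresponds to se_mean, and interval = "prediction" swaps in se_pred.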

How to do exponential and logarithmic curve fitting in Python? I found only polynomial fitting

Asked by ∥☆過路亽.° on 2019-11-26 01:46:17

Question: I have a set of data and I want to compare which line describes it best (polynomials of different orders, exponential, or logarithmic). I use Python and NumPy, and for polynomial fitting there is a function polyfit(). But I found no such functions for exponential and logarithmic fitting. Are there any? Or how can I solve it otherwise?

Answer 1: For fitting y = A + B log x, just fit y against (log x).

>>> x = numpy.array([1, 7, 20, 50, 79])
>>> y = numpy.array([10, 19, 30, 35, 51])
>>> numpy.polyfit…

Fitting a linear model with multiple LHS

Asked by 送分小仙女□ on 2019-11-25 23:50:04

Question: I am new to R and I want to improve the following script with an *apply function (I have read about apply, but I couldn't manage to use it). I want to use the lm function on multiple independent variables (which are columns in a data frame). I used

for (i in 1:3) {
  assign(paste0('lm.', names(data[i])), lm(formula = formula(i), data = data))
}

formula(i) is defined as

formula <- function(x) {
  as.formula(paste(names(data[x]), '~',
                   paste0(names(data[-1:-3]), collapse = '+')),
             env = parent.frame())
}
…
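In R, lm() also accepts a matrix response, so lm(cbind(y1, y2, y3) ~ ., data = data) fits all left-hand sides in one call, with no loop or assign(). The same multiple-response least squares in numpy, on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + predictor
B_true = np.array([[1.0, 2.0, 3.0],      # intercepts of the 3 responses
                   [0.5, -1.0, 2.0]])    # slopes of the 3 responses
Y = X @ B_true + 0.01 * rng.normal(size=(n, 3))

# One lstsq call solves all three regressions simultaneously: column j of
# B_hat is the coefficient vector for response column j of Y.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

Because the design matrix is shared, solving all responses at once is both cleaner and cheaper than refitting per column.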

How to do exponential and logarithmic curve fitting in Python? I found only polynomial fitting

Asked by 空扰寡人 on 2019-11-25 23:48:34

I have a set of data and I want to compare which line describes it best (polynomials of different orders, exponential, or logarithmic). I use Python and NumPy, and for polynomial fitting there is a function polyfit(). But I found no such functions for exponential and logarithmic fitting. Are there any? Or how can I solve it otherwise? For fitting y = A + B log x, just fit y against (log x).

>>> x = numpy.array([1, 7, 20, 50, 79])
>>> y = numpy.array([10, 19, 30, 35, 51])
>>> numpy.polyfit(numpy.log(x), y, 1)
array([ 8.46295607,  6.61867463])   # y ≈ 8.46 log(x) + 6.62

For fitting y = A·e^(Bx), take …
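The answer is cut off mid-sentence; the standard continuation of this approach (a sketch, using the same toy arrays) is to fit log y against x, since y = A·e^(Bx) implies log y = log A + B·x, which is again linear:

```python
import numpy as np

x = np.array([1, 7, 20, 50, 79], dtype=float)
y = np.array([10, 19, 30, 35, 51], dtype=float)

# y = A * exp(B*x)  =>  log y = log A + B*x, linear in x
B, logA = np.polyfit(x, np.log(y), 1)   # polyfit returns [slope, intercept]
A = np.exp(logA)
```

One caveat worth knowing: fitting in log space implicitly reweights the errors (deviations at small y count more), so if you need least squares in the original y scale, a nonlinear fitter such as scipy.optimize.curve_fit is the tool instead.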

Linear Regression and group by in R

Asked by 冷暖自知 on 2019-11-25 22:32:24

Question: I want to do a linear regression in R using the lm() function. My data is an annual time series with one field for year (22 years) and another for state (50 states). I want to fit a regression for each state so that at the end I have a vector of lm responses. I can imagine doing a for loop over each state, running the regression inside the loop, and adding the results of each regression to a vector. That does not seem very R-like, however. In SAS I would use a 'by' statement and in SQL I would …
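In R, lapply(split(df, df$state), function(d) lm(y ~ year, data = d)) is the idiomatic loop-free version: split by group, fit per group, collect the fits in a list. The same split-then-fit idea in Python with numpy, on two invented states rather than the asker's 50:

```python
import numpy as np

rng = np.random.default_rng(0)
states = np.repeat(["CA", "NY"], 22)          # two invented states, 22 years each
year = np.tile(np.arange(22), 2)
true_slope = {"CA": 1.5, "NY": -0.7}
y = (np.array([true_slope[s] for s in states]) * year
     + 0.1 * rng.normal(size=44))

# One simple regression per state: {state: [slope, intercept]}
fits = {s: np.polyfit(year[states == s], y[states == s], 1)
        for s in np.unique(states)}
```

The result is a keyed collection of per-group fits, which is exactly the "vector of lm responses" the asker describes, without an explicit loop body that mutates state.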