linear-regression

Is there a better alternative than string manipulation to programmatically build formulas?

Asked by 我是研究僧i on 2019-11-26 07:57:13

Question: Everyone else's functions seem to take formula objects and then do dark magic to them somewhere deep inside, and I'm jealous. I'm writing a function that fits multiple models. Parts of the formulas for these models stay the same from one model to the next, and parts change. The clumsy way would be to have the user input the formula parts as character strings, do some character manipulation on them, and then use as.formula. But before I go that route, I just want to make sure that I'm not …
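In R itself, reformulate() builds a formula object from a character vector of term labels and a response, which avoids most manual pasting. As a language-neutral illustration of the underlying idea (assembling a model specification from fixed and varying parts), here is a minimal Python sketch; the make_formula helper and its argument names are hypothetical, not from the question:

```python
def make_formula(response, common_terms, varying_terms=()):
    """Build an R-style formula string from parts.

    `response`, `common_terms`, and `varying_terms` are hypothetical
    names used only for this sketch; the shared terms stay fixed across
    models while the varying terms change from one model to the next.
    """
    terms = list(common_terms) + list(varying_terms)
    return f"{response} ~ {' + '.join(terms)}"

print(make_formula("y", ["age", "sex"]))                 # y ~ age + sex
print(make_formula("y", ["age", "sex"], ["treatment"]))  # y ~ age + sex + treatment
```

The same two-part split is what reformulate(termlabels, response) gives you directly in R, without going through character manipulation yourself.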

`lm` summary does not display all factor levels

Asked by 痞子三分冷 on 2019-11-26 05:34:33

Question: I am running a linear regression on a number of attributes, including two categorical attributes, B and F, and I don't get a coefficient value for every factor level I have. B has 9 levels and F has 6 levels. When I initially ran the model (with intercept), I got 8 coefficients for B and 5 for F, which I understood as the first level of each being absorbed into the intercept. I want to rank the levels within B and F based on their coefficients, so I added -1 after each factor to lock the …
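The pattern the asker describes (k - 1 coefficients per factor with an intercept, k without one) comes from how dummy variables are coded. A minimal numpy sketch with a single invented 3-level factor, not the asker's attributes:

```python
import numpy as np

rng = np.random.default_rng(1)
b = np.repeat(np.arange(3), 20)                       # one factor, 3 levels
y = np.array([1.0, 2.5, -0.5])[b] + 0.1 * rng.normal(size=60)

# Without an intercept: one dummy column per level, so every level gets
# its own coefficient, and each coefficient estimates that level's mean.
D_full = np.column_stack([(b == l).astype(float) for l in (0, 1, 2)])
coef_full, *_ = np.linalg.lstsq(D_full, y, rcond=None)

# With an intercept: one level (here level 0) must be dropped to avoid a
# singular design; the remaining coefficients are differences from it.
D_ref = np.column_stack([np.ones(60)] +
                        [(b == l).astype(float) for l in (1, 2)])
coef_ref, *_ = np.linalg.lstsq(D_ref, y, rcond=None)
```

coef_full recovers the three level means directly, while coef_ref gives the level-0 mean plus two contrasts, which is exactly why dropping the intercept (the -1) makes one extra coefficient per factor appear.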

Fast pairwise simple linear regression between variables in a data frame

Asked by 醉酒当歌 on 2019-11-26 04:53:46

Question: I have seen pairwise or general paired simple linear regression many times on Stack Overflow. Here is a toy dataset for this kind of problem.

set.seed(0)
X <- matrix(runif(100), 100, 5, dimnames = list(1:100, LETTERS[1:5]))
b <- c(1, 0.7, 1.3, 2.9, -2)
dat <- X * b[col(X)] + matrix(rnorm(100 * 5, 0, 0.1), 100, 5)
dat <- as.data.frame(dat)
pairs(dat)

So basically we want to compute 5 * 4 = 20 regression lines:

A ~ B, A ~ C, A ~ D, A ~ E,
B ~ A, B ~ C, B ~ D, B ~ E,
C ~ A, C ~ B, C ~ D, …
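Since each simple regression slope is just cov(x, y)/var(x), all 20 slopes can be read off a single covariance matrix without looping over pairs. A sketch in numpy on random stand-in data (not the R toy dataset above):

```python
import numpy as np

rng = np.random.default_rng(0)
dat = rng.uniform(size=(100, 5))          # stand-in for the 5-column data frame

mu = dat.mean(axis=0)
Xc = dat - mu                              # center each column
cov = Xc.T @ Xc / (len(dat) - 1)           # 5x5 covariance matrix
var = np.diag(cov)

# slopes[i, j] is the slope from regressing column i on column j
slopes = cov / var[np.newaxis, :]
intercepts = mu[:, None] - slopes * mu[None, :]
```

The diagonal entries are the trivial self-regressions (slope 1) and can be ignored; the 20 off-diagonal entries are the pairwise fits.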

gradient descent using python and numpy

Asked by 守給你的承諾、 on 2019-11-26 04:05:47

Question:

def gradient(X_norm, y, theta, alpha, m, n, num_it):
    temp = np.array(np.zeros_like(theta, float))
    for i in range(0, num_it):
        h = np.dot(X_norm, theta)
        # temp[j] = theta[j] - (alpha/m) * np.sum((h - y) * X_norm[:, j][np.newaxis, :])
        temp[0] = theta[0] - (alpha/m) * np.sum(h - y)
        temp[1] = theta[1] - (alpha/m) * np.sum((h - y) * X_norm[:, 1])
        theta = temp
    return theta

X_norm, mean, std = featureScale(X)
m = len(X)    # length of X (number of rows)
X_norm = np.array([np.ones(m), X_norm])
n, m = np.shape(X_norm)
num_it = 1500
alpha = 0.01
theta = np…
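The commented-out line in the question hints at the usual fix: update every component of theta in one vectorized step instead of writing one line per coefficient. A self-contained sketch on synthetic data (not the asker's X or featureScale):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_it=1000):
    """Batch gradient descent for linear regression with squared loss."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_it):
        h = X @ theta                                   # predictions, all rows
        theta = theta - (alpha / m) * (X.T @ (h - y))   # all coords at once
    return theta

rng = np.random.default_rng(0)
m = 200
X = np.column_stack([np.ones(m), rng.normal(size=m)])   # intercept + 1 feature
true_theta = np.array([2.0, 3.0])
y = X @ true_theta + 0.01 * rng.normal(size=m)
theta = gradient_descent(X, y)
```

With features on comparable scales (which is what the asker's featureScale is for), this converges to the least-squares solution; the vectorized update also generalizes to any number of features, unlike the hard-coded temp[0]/temp[1] lines.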

How to force R to use a specified factor level as reference in a regression?

Asked by 眉间皱痕 on 2019-11-26 03:04:07

Question: How can I tell R to use a certain level as the reference if I use binary explanatory variables in a regression? By default it just picks some level. lm(x ~ y + as.factor(b)) with b in {0, 1, 2, 3, 4}. Let's say I want to use 3 instead of the zero that R uses.

Answer 1: See the relevel() function. Here is an example:

set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x, y = 4 + (1.5*x) + rnorm(100, sd = 2), b = gl(5, 20))
head(DF)
str(DF)
m1 <- lm(y ~ x + b, data = DF)
summary(m1)

Now alter the …
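relevel(DF$b, ref = "3") is the R tool here; the same effect can be seen by hand-coding the dummies so that level 3 is the omitted one. A numpy sketch with invented level means (not the answer's DF):

```python
import numpy as np

rng = np.random.default_rng(0)
b = np.repeat(np.arange(5), 20)               # factor with levels 0..4
means = np.array([4.0, 5.0, 6.0, 7.0, 8.0])   # invented per-level means
y = means[b] + 0.1 * rng.normal(size=100)

ref = 3                                        # chosen reference level
others = [l for l in range(5) if l != ref]
D = np.column_stack([np.ones(100)] +
                    [(b == l).astype(float) for l in others])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
# coef[0] estimates the mean of level 3; the remaining coefficients are
# each level's difference from level 3 (in the order of `others`)
```

Which level gets absorbed into the intercept is purely a coding choice, which is why relevel() changes the reported coefficients but not the fitted model.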

How does predict.lm() compute confidence interval and prediction interval?

Asked by 主宰稳场 on 2019-11-26 02:21:52

Question: I ran a regression: CopierDataRegression <- lm(V1 ~ V2, data = CopierData1), and my task was to obtain a 90% confidence interval for the mean response given V2 = 6 and a 90% prediction interval when V2 = 6. I used the following code:

X6 <- data.frame(V2 = 6)
predict(CopierDataRegression, X6, se.fit = TRUE, interval = "confidence", level = 0.90)
predict(CopierDataRegression, X6, se.fit = TRUE, interval = "prediction", level = 0.90)

and I got (87.3, 91.9) and (74.5, 104.8), which seems to be correct since the PI …
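The two intervals differ only in their standard errors: the prediction interval adds the residual variance on top of the standard error of the fitted mean, which is why it is always wider. A sketch of the textbook formulas on made-up data (not CopierData1), using scipy only for the t quantile:

```python
import numpy as np
from scipy import stats

x = np.array([2., 4., 3., 5., 6., 4., 7., 3., 5., 6.])
y = 75 + 2.5 * x + np.array([0.5, -1.0, 0.3, 1.2, -0.7,
                             0.1, 0.9, -0.4, -1.1, 0.2])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)   # residual variance

x0 = 6.0
yhat0 = b0 + b1 * x0
se_mean = np.sqrt(s2 * (1 / n + (x0 - x.mean()) ** 2 / Sxx))  # CI width driver
se_pred = np.sqrt(s2 + se_mean ** 2)                          # PI adds s2

t = stats.t.ppf(0.95, n - 2)          # two-sided 90% interval
conf_int = (yhat0 - t * se_mean, yhat0 + t * se_mean)
pred_int = (yhat0 - t * se_pred, yhat0 + t * se_pred)
```

This reproduces what predict.lm() computes internally: se.fit corresponds to se_mean, and interval = "prediction" swaps in se_pred.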

How to do exponential and logarithmic curve fitting in Python? I found only polynomial fitting

Asked by ∥☆過路亽.° on 2019-11-26 01:46:17

Question: I have a set of data and I want to compare which line describes it best (polynomials of different orders, exponential, or logarithmic). I use Python and NumPy, and for polynomial fitting there is a function polyfit(). But I found no such functions for exponential and logarithmic fitting. Are there any? Or how can I solve it otherwise?

Answer 1: For fitting y = A + B log x, just fit y against (log x).

>>> x = numpy.array([1, 7, 20, 50, 79])
>>> y = numpy.array([10, 19, 30, 35, 51])
>>> numpy.polyfit…

Fitting a linear model with multiple LHS

Asked by 送分小仙女□ on 2019-11-25 23:50:04

Question: I am new to R and I want to improve the following script with an *apply function (I have read about apply, but I couldn't manage to use it). I want to use the lm function on multiple independent variables (which are columns in a data frame). I used

for (i in 1:3) {
  assign(paste0('lm.', names(data[i])), lm(formula = formula(i), data = data))
}

formula(i) is defined as

formula <- function(x) {
  as.formula(paste(names(data[x]), '~',
                   paste0(names(data[-1:-3]), collapse = '+')),
             env = parent.frame())
}
…
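In R, lm() also accepts a matrix response, so lm(cbind(y1, y2, y3) ~ ., data = data) fits all left-hand sides in one call, with no loop or assign(). The same multiple-response least squares in numpy, on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + predictor
B_true = np.array([[1.0, 2.0, 3.0],      # intercepts of the 3 responses
                   [0.5, -1.0, 2.0]])    # slopes of the 3 responses
Y = X @ B_true + 0.01 * rng.normal(size=(n, 3))

# One lstsq call solves all three regressions simultaneously: column j of
# B_hat is the coefficient vector for response column j of Y.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

Because the design matrix is shared, solving all responses at once is both cleaner and cheaper than refitting per column.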

How to do exponential and logarithmic curve fitting in Python? I found only polynomial fitting

Asked by 空扰寡人 on 2019-11-25 23:48:34

I have a set of data and I want to compare which line describes it best (polynomials of different orders, exponential, or logarithmic). I use Python and NumPy, and for polynomial fitting there is a function polyfit(). But I found no such functions for exponential and logarithmic fitting. Are there any? Or how can I solve it otherwise? For fitting y = A + B log x, just fit y against (log x).

>>> x = numpy.array([1, 7, 20, 50, 79])
>>> y = numpy.array([10, 19, 30, 35, 51])
>>> numpy.polyfit(numpy.log(x), y, 1)
array([ 8.46295607,  6.61867463])   # y ≈ 8.46 log(x) + 6.62

For fitting y = A·e^(Bx), take …
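The answer is cut off mid-sentence; the standard continuation of this approach (a sketch, using the same toy arrays) is to fit log y against x, since y = A·e^(Bx) implies log y = log A + B·x, which is again linear:

```python
import numpy as np

x = np.array([1, 7, 20, 50, 79], dtype=float)
y = np.array([10, 19, 30, 35, 51], dtype=float)

# y = A * exp(B*x)  =>  log y = log A + B*x, linear in x
B, logA = np.polyfit(x, np.log(y), 1)   # polyfit returns [slope, intercept]
A = np.exp(logA)
```

One caveat worth knowing: fitting in log space implicitly reweights the errors (deviations at small y count more), so if you need least squares in the original y scale, a nonlinear fitter such as scipy.optimize.curve_fit is the tool instead.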

Linear Regression and group by in R

Asked by 冷暖自知 on 2019-11-25 22:32:24

Question: I want to do a linear regression in R using the lm() function. My data is an annual time series with one field for year (22 years) and another for state (50 states). I want to fit a regression for each state so that at the end I have a vector of lm responses. I can imagine doing a for loop over each state, running the regression inside the loop, and adding the results of each regression to a vector. That does not seem very R-like, however. In SAS I would use a 'by' statement and in SQL I would …
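In R, lapply(split(df, df$state), function(d) lm(y ~ year, data = d)) is the idiomatic loop-free version: split by group, fit per group, collect the fits in a list. The same split-then-fit idea in Python with numpy, on two invented states rather than the asker's 50:

```python
import numpy as np

rng = np.random.default_rng(0)
states = np.repeat(["CA", "NY"], 22)          # two invented states, 22 years each
year = np.tile(np.arange(22), 2)
true_slope = {"CA": 1.5, "NY": -0.7}
y = (np.array([true_slope[s] for s in states]) * year
     + 0.1 * rng.normal(size=44))

# One simple regression per state: {state: [slope, intercept]}
fits = {s: np.polyfit(year[states == s], y[states == s], 1)
        for s in np.unique(states)}
```

The result is a keyed collection of per-group fits, which is exactly the "vector of lm responses" the asker describes, without an explicit loop body that mutates state.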