linear-regression

Why do I get NA coefficients, and how does `lm` drop reference levels for interactions?

Submitted by 感情迁移 on 2019-12-01 05:26:40
I am trying to understand how R determines reference groups for interactions in a linear model. Consider the following:

df <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("1", "2", "3", "4", "5"), class = "factor"), year = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("1", "2"), class = "factor"), treatment = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
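
The posted `df` is truncated, so here is a minimal hedged sketch of the usual cause with hypothetical data: when one factor level never co-occurs with a level of another factor, the corresponding interaction column of the design matrix is linearly dependent (here, all zero), and `lm` reports its coefficient as NA rather than picking a new reference group.

```r
# Hypothetical data (the posted df is truncated): treatment B is never
# observed in year 2, so the interaction column is all zero in the design.
set.seed(1)
toy <- data.frame(
  treatment = factor(rep(c("A", "B"), each = 4)),
  year      = factor(rep(c("1", "2"), times = 4)),
  y         = rnorm(8)
)
toy$year[toy$treatment == "B"] <- "1"

fit <- lm(y ~ treatment * year, data = toy)
coef(fit)    # treatmentB:year2 is NA: that cell of the design is empty
alias(fit)   # lists the linearly dependent (aliased) terms
```

R pivots the rank-deficient column out of the fit and leaves NA; `alias()` is the quickest way to see which term was dropped.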

Why does the number of rows change during AIC in R? How can I ensure that this doesn't happen?

Submitted by 那年仲夏 on 2019-12-01 04:59:42
I'm trying to find a minimal adequate model using AIC in R. I keep getting the following error:

Error in step(model) : number of rows in use has changed: remove missing values?

My data:

data <- structure(list(ID = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 27L, 28L, 29L, 30L, 31L, 33L, 34L, 35L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L,
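
A sketch of the usual fix, with hypothetical variable names since the posted data frame is cut off: `step()` must compare every candidate model on the same rows, and NAs in a predictor that only some models use change the row count mid-search. Restricting to complete cases once, up front, keeps n fixed.

```r
# Variable names x1..x3 are hypothetical; substitute the real candidates.
vars <- c("y", "x1", "x2", "x3")        # response plus all candidate predictors
data_cc <- na.omit(data[, vars])        # drop incomplete cases once, up front
full <- lm(y ~ x1 + x2 + x3, data = data_cc)
step(full)                              # AIC-based selection now sees a fixed n
```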

How does the subset argument work in the lm() function?

Submitted by 社会主义新天地 on 2019-12-01 04:18:57
I have been trying to figure out how the subset argument in R's lm() function works. In particular, the following code seems dubious to me:

data(mtcars)
summary(lm(mpg ~ wt, data=mtcars))
summary(lm(mpg ~ wt, cyl, data=mtcars))

In every case the regression has 32 observations:

dim(lm(mpg ~ wt, cyl, data=mtcars)$model)
[1] 32 2
dim(lm(mpg ~ wt, data=mtcars)$model)
[1] 32 2

yet the coefficients change (along with the R²). The help doesn't provide much information on this matter: "subset: an optional vector specifying a subset of observations to be used in the fitting process". As a general principle
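
What happens here, in sketch form: because `data` is matched by name, the unnamed `cyl` falls through to the next free formal, `subset`, and a numeric `subset` is treated as row indices, repeats included. So the model still has 32 rows, just heavily resampled ones.

```r
data(mtcars)
fit1 <- lm(mpg ~ wt, data = mtcars)

# Equivalent to lm(mpg ~ wt, cyl, data = mtcars): cyl's values (4, 6, 8, ...)
# are evaluated inside mtcars and used as ROW INDICES, with repetition.
fit2 <- lm(mpg ~ wt, subset = cyl, data = mtcars)
all.equal(coef(fit2), coef(lm(mpg ~ wt, data = mtcars[mtcars$cyl, ])))  # TRUE

# The intended use of subset is a logical condition:
fit4 <- lm(mpg ~ wt, data = mtcars, subset = cyl == 4)
nobs(fit4)   # 11 four-cylinder cars
```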

Plot conditional density curve `P(Y|X)` along a linear regression line

Submitted by 南楼画角 on 2019-12-01 03:53:34
This is my data frame, with two columns Y (response) and X (covariate):

## Editor edit: use `dat` not `data`
dat <- structure(list(Y = c(NA, -1.793, -0.642, 1.189, -0.823, -1.715, 1.623, 0.964, 0.395, -3.736, -0.47, 2.366, 0.634, -0.701, -1.692, 0.155, 2.502, -2.292, 1.967, -2.326, -1.476, 1.464, 1.45, -0.797, 1.27, 2.515, -0.765, 0.261, 0.423, 1.698, -2.734, 0.743, -2.39, 0.365, 2.981, -1.185, -0.57, 2.638, -1.046, 1.931, 4.583, -1.276, 1.075, 2.893, -1.602, 1.801, 2.405, -5.236, 2.214, 1.295, 1.438, -0.638, 0.716, 1.004, -1.328, -1.759, -1.315, 1.053, 1.958, -2.034, 2.936, -0.078, -0.676, -2
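
One way to sketch this in base R, assuming `dat` carries the full X and Y columns: fit the line, take the residual standard error as the sd of the conditional normal P(Y|X), and draw each density sideways at a few x positions along the fitted line.

```r
fit <- lm(Y ~ X, data = dat)          # NAs dropped by the default na.action
sigma <- summary(fit)$sigma           # residual standard error

plot(Y ~ X, data = dat, col = "grey")
abline(fit)

for (x0 in quantile(dat$X, c(0.2, 0.5, 0.8), na.rm = TRUE)) {
  yhat <- predict(fit, newdata = data.frame(X = x0))
  yy <- seq(yhat - 3 * sigma, yhat + 3 * sigma, length.out = 100)
  dd <- dnorm(yy, mean = yhat, sd = sigma)
  lines(x0 + dd / max(dd) * 2, yy)    # density drawn sideways; width 2 is arbitrary
}
```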

Linear regression using lm(): surprised by the result

Submitted by 半城伤御伤魂 on 2019-12-01 03:47:16
I used linear regression on my data, via the lm function. Everything works (no error message), but I'm somehow surprised by the result: I have the impression R "misses" a group of points, i.e. that the intercept and slope are not the best fit. For instance, I am referring to the group of points at coordinates x = 15-25, y = 0-20. My questions: is there a function to compare the "expected" coefficients with the "lm-calculated" coefficients? Have I made a silly mistake when coding that leads lm to do this? Following some answers, additional information on x and y: x and y are both visual
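
There is no built-in "compare with expected coefficients" function, but the check is short: `lm` minimizes the residual sum of squares, so any hand-picked line can at best tie it. A sketch using the questioner's `x` and `y`, with hypothetical "expected" values:

```r
fit <- lm(y ~ x)
rss_lm <- sum(residuals(fit)^2)

a_exp <- 0; b_exp <- 1               # hypothetical "expected" intercept/slope
rss_exp <- sum((y - (a_exp + b_exp * x))^2)
c(lm = rss_lm, expected = rss_exp)   # rss_lm <= rss_exp, always

plot(x, y)
abline(fit)                          # lm's line
abline(a_exp, b_exp, lty = 2)        # the "expected" line, dashed
```

If the expected line really has a lower RSS, something else is wrong (e.g. points silently dropped as NA); otherwise lm's line is the least-squares best fit by construction.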

Moving window regression

Submitted by 核能气质少年 on 2019-12-01 01:54:42
I want to perform a moving window regression on every pixel of two raster stacks representing Band3 and Band4 of Landsat data. The result should be two additional stacks, one holding the intercept and the other the slope of the regression: layer 1 of stacks "B3" and "B4" yields layer 1 of stacks "intercept" and "slope", layer 2 of B3 and B4 yields layer 2, and so on. I have already come across the gwr function, but I want to stay within the raster package. I know that focal must be involved in order to set up my moving window (which should be
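
Staying inside the raster package, one hedged sketch (window fixed at 3x3; stack names B3/B4 as in the question): the per-pixel OLS slope over a window is cov(x, y)/var(x), and all the needed moments can be built from `focal` sums, applied layer by layer.

```r
library(raster)

# Per-pixel OLS of y on x over a 3x3 window, assembled from focal sums:
# slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2), intercept = (Sy - slope*Sx) / n.
win_regress <- function(x, y, w = matrix(1, 3, 3)) {
  n   <- focal(x * 0 + 1, w, sum)   # cells per window (NAs propagate)
  sx  <- focal(x,     w, sum)
  sy  <- focal(y,     w, sum)
  sxx <- focal(x * x, w, sum)
  sxy <- focal(x * y, w, sum)
  slope     <- (n * sxy - sx * sy) / (n * sxx - sx * sx)  # NaN where var(x) = 0
  intercept <- (sy - slope * sx) / n
  stack(intercept, slope)
}

# focal() works on single layers, so loop over the stacks layer by layer.
res       <- lapply(1:nlayers(B3), function(i) win_regress(B3[[i]], B4[[i]]))
intercept <- stack(lapply(res, `[[`, 1))
slope     <- stack(lapply(res, `[[`, 2))
```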

How to extract a particular value from the OLS summary in pandas?

Submitted by 自闭症网瘾萝莉.ら on 2019-12-01 00:55:12
Question: is it possible to get other values (currently I only know how to get the beta and the intercept) from the summary of a linear regression in pandas? I need to get the R-squared. Here is an excerpt from the manual:

In [244]: model = ols(y=rets['AAPL'], x=rets.ix[:, ['GOOG']])
In [245]: model
Out[245]:
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <GOOG> + <intercept>
Number of Observations: 756
Number of Degrees of Freedom: 2
R-squared: 0.2814
Adj R-squared:
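
If I recall the old API correctly, the deprecated pandas OLS object shown above exposed R-squared as the `r2` attribute; that interface has long since been removed from pandas, so here is a sketch of the statsmodels equivalent, where every summary figure is an attribute of the fitted results object:

```python
import statsmodels.api as sm

# rets is the returns DataFrame from the question.
X = sm.add_constant(rets[["GOOG"]])
res = sm.OLS(rets["AAPL"], X).fit()

print(res.rsquared)        # R-squared
print(res.rsquared_adj)    # adjusted R-squared
print(res.params)          # intercept and beta
```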

Why do different methods for solving Xc = y in Python give different solutions when they should not?

Submitted by 怎甘沉沦 on 2019-12-01 00:52:14
I was trying to solve a square linear system Xc = y. The methods I know for solving it are: using the inverse, c = X^{-1} y; using Gaussian elimination; and using the pseudo-inverse. As far as I can tell, these don't match what I thought would be the ground truth. First I generate the true parameters by fitting a polynomial of degree 30 to a cosine with frequency 5, so that y_truth = X c_truth. Then I check whether the above three methods match the truth. I tried it, but the methods don't seem to match, and I don't see why that should be the case. I produced fully runnable, reproducible code:
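
A hedged reconstruction of why the disagreement is expected (my own setup below, since the posted code is cut off): a degree-30 polynomial design matrix is astronomically ill-conditioned, so in float64 the three algebraically equivalent methods amplify rounding error in different directions.

```python
import numpy as np

# Hypothetical reconstruction of the setup: degree-30 polynomial design
# matrix on 31 points, target is a cosine with frequency 5.
x = np.linspace(0, 1, 31)
y = np.cos(2 * np.pi * 5 * x)
X = np.vander(x, 31)                # square Vandermonde matrix

print(np.linalg.cond(X))            # enormous: the system is ill-conditioned

c_inv   = np.linalg.inv(X) @ y      # explicit inverse (numerically worst)
c_solve = np.linalg.solve(X, y)     # LU factorization / Gaussian elimination
c_pinv  = np.linalg.pinv(X) @ y     # SVD-based pseudo-inverse

# Algebraically identical, but with a condition number this large the
# float64 rounding of each algorithm is amplified differently:
print(np.max(np.abs(c_inv - c_solve)))
print(np.max(np.abs(c_solve - c_pinv)))
```

With conditioning that bad, disagreement between the solvers is expected behavior rather than a bug; reformulating with an orthogonal polynomial basis or a lower degree is the usual way out.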