问题
I am using R to evaluate climate data and I have a data set that looks like the following miniaturized version... please forgive my crude posting etiquette, I hope this post is understandable.
[0][STA.NAME] [YEAR] [SUM.CDD]
1 NAME1 1967 760
2 NAME1 1985 800
3 NAME1 1996 740
4 NAME1 2003 810
5 NAME1 2011 790
6 NAME2 1967 700
7 NAME2 1985 690
8 NAME2 1996 850
9 NAME2 2003 790
10 NAME3 1967 760
11 NAME3 1985 800
12 NAME3 1990 740
13 NAME3 1996 810
14 NAME3 2003 790
15 NAME3 2011 800
I am trying to return a new DF with this
[STA.NAME] [Eq'n of trend]
NAME1 (y = mx + b)
NAME2 (y = mx + b)
etc...
Eventually I will need to calculate variance of the trends, as well as total variance of data and would like to eventually append those to this resulting data set for something like...
[STA.NAME] [TREND] [VAR.TREND] [VAR.DATA]
with values in rows, 1 for each STA.NAME...
Any help is greatly appreciated, If there is a better way than lm(), with which I am currently stumped, I would be interested as well.
Thank you very much,
Jesse
回答1:
Here is a simple solution using ddply()
from plyr
to return the coefficients for each group:
First replicate the data:
x <- read.table(text="
STA.NAME YEAR SUM.CDD
1 NAME1 1967 760
2 NAME1 1985 800
3 NAME1 1996 740
4 NAME1 2003 810
5 NAME1 2011 790
6 NAME2 1967 700
7 NAME2 1985 690
8 NAME2 1996 850
9 NAME2 2003 790
10 NAME3 1967 760
11 NAME3 1985 800
12 NAME3 1990 740
13 NAME3 1996 810
14 NAME3 2003 790
15 NAME3 2011 800 ", header=TRUE)
Now do the modelling:
library(plyr)
ddply(x, .(STA.NAME), function(z)coef(lm(SUM.CDD ~ YEAR, data=z)))
STA.NAME (Intercept) YEAR
1 NAME1 -444.8361 0.6147541
2 NAME2 -6339.2047 3.5702200
3 NAME3 -995.2381 0.8928571
Now, depending on what you want to do, it may be simpler (and perhaps more meaningful) to create a single model of your data:
fit <- lm(SUM.CDD ~ YEAR + STA.NAME, data=x)
Get a summary:
summary(fit)
Call:
lm(formula = SUM.CDD ~ YEAR + STA.NAME, data = x)
Residuals:
Min 1Q Median 3Q Max
-63.57 -22.21 10.72 18.62 80.72
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2065.6401 1463.5353 -1.411 0.1858
YEAR 1.4282 0.7345 1.945 0.0778 .
STA.NAMENAME2 -15.8586 27.5835 -0.575 0.5769
STA.NAMENAME3 3.9046 24.7089 0.158 0.8773
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 40.8 on 11 degrees of freedom
Multiple R-squared: 0.3056, Adjusted R-squared: 0.1162
F-statistic: 1.614 on 3 and 11 DF, p-value: 0.2424
Extract only the coefficients:
coef(fit)
(Intercept) YEAR STA.NAMENAME2 STA.NAMENAME3
-2065.640078 1.428247 -15.858650 3.904632
Finally, you perhaps wanted to fit a model with interaction terms. This model gives you effectively the same results as the original plyr
solution. Depending on your data and your objectives, this might be the way to do it:
fit <- lm(SUM.CDD ~ YEAR * STA.NAME, data=x)
summary(fit)
Call:
lm(formula = SUM.CDD ~ YEAR * STA.NAME, data = x)
Residuals:
Min 1Q Median 3Q Max
-57.682 -13.166 -1.012 23.006 63.046
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -444.8361 2280.7464 -0.195 0.850
YEAR 0.6148 1.1447 0.537 0.604
STA.NAMENAME2 -5894.3687 3661.9795 -1.610 0.142
STA.NAMENAME3 -550.4020 3221.8390 -0.171 0.868
YEAR:STA.NAMENAME2 2.9555 1.8406 1.606 0.143
YEAR:STA.NAMENAME3 0.2781 1.6172 0.172 0.867
Residual standard error: 39.17 on 9 degrees of freedom
Multiple R-squared: 0.4763, Adjusted R-squared: 0.1854
F-statistic: 1.637 on 5 and 9 DF, p-value: 0.2451
来源:https://stackoverflow.com/questions/14658961/linear-regression-calculation-several-times-in-one-dataframe