问题
I am using the mgcv
package to model the ozone pollution concentration according to some environmental covariates. The model takes the form :
model1 <- gam(O3 ~ s(X, Y, bs = "tp", k = 10) + wd + s(date, bs = "cc", k = 100) + district,
data = mydata, family = gaussian(link ="log"),
na.action = "na.omit", method = "REML")
And here is the structure of covariates:
> str(mydata)
'data.frame': 7100 obs. of 286 variables:
$ date : Date, format: "2016-01-01" "2016-01-01" "2016-01-01" ...
$ O3 : num 0.0141 0.0149 0.0102 0.0159 0.0186 ...
$ district : Factor w/ 10 levels "bc","bh","dl",..: 1 8 7 8 2 6 4 4 10 2 ...
$ wd : Factor w/ 16 levels "E","ENE","ESE",..: 13 13 13 13 13 2 9 9 11 13 ...
$ X : num 0.389 0.365 1 0.44 0.892 ...
$ Y : num 0.311 0.204 0.426 0.223 0.162 ...
I am stuck on an
error in R: 'names' attribute [1] must be the same length as the vector [0].
I try to find where the problem is by delete the term of s(date, bs = "cc", k = 100)
from the fomular and it could work well. It seems like there is something wrong with date field.
I'm not exactly sure how to fix this problem. Any advice would be greatly appreciated!
回答1:
The date
variable won't be automatically converted to a numeric variable; you need to do this yourself. I normally process such information as follows
mydata <- transform(mydata, ndate = as.numeric(date),
nyear = as.numeric(format(date, '%Y')),
nmonth = as.numeric(format(date, '%m')),
doy = as.numeric(format(date, '%j')))
Then I can choose to model the time component in a number of ways:
- trend based on
ndate
ofnyear
with a non-cyclic spline, or - cyclic pattern based on
nmonth
ordoy
(for day of year), or - a combination of trend and cyclic pattern
It's unclear from your question if your data are restricted to a single year. If the data span multiple years then you can't just use the cyclic spline on the ndate
variable. You will need either a very complex standard spline (option 1) or include two splines, one for the between year part and one for the within year part (option 3.)
If your data is over multiple years then I would set the model up as
O3 ~ s(X, Y, bs = "tp", k = 10) + wd + s(doy, bs = 'cc', k = 20) +
s(ndate, bs = "tp", k = 50) + district
or perhaps s(nyear, .... )
will be sufficient instead of s(ndate, .... )
.
This kind of decomposition of the time component is useful as you can often do a better job of fitting the series via two simple, well-estimated smooths than a single more complex smooth. It also allows you to test for within and between year effects.
If you need the seasonal cycle to vary with the trend, then a tensor product is helpful:
O3 ~ s(X, Y, bs = "tp", k = 10) + wd +
te(doy, ndate, bs = c('cc','tp'), k = c(20,50)) + district
For cyclic splines you may also want to set the knots
argument, especially if your data don't quite span the full range of days of year etc. For doy
I would use knots = list(doy = c(0.5, 366.5))
as this allows Dec 31st and Jan 1st to have slightly different estimated values. For nmonth
this is more important as otherwise Dec and Jan would get the same fitted value. I use: knots = list(nmonth = c(0.5, 12.5))
.
The idea here is that 1
and 12
reflect the middle of the respective month and 0.5
and 12.5
the beginning and end of the first and last months, which we might expect to be the same.
来源:https://stackoverflow.com/questions/54033437/error-in-gam-function-in-namesx-value-names-attribute-must-be-the-same-l