predict() and newdata - How does this work?

Posted by 大城市里の小女人 on 2019-12-25 07:59:31

Question


Someone recently posted a question on this paper here: https://static.googleusercontent.com/media/www.google.com/en//googleblogs/pdfs/google_predicting_the_present.pdf

The R code can be found at the very end of the paper. Essentially, the paper investigates one-month-ahead predictions of sales using search queries. I think I understand the model and the method, but one detail puzzles me. It's this part:

##### (1) Divide the data into two parts: model fitting and prediction
dat1 = mdat[1:(nrow(mdat)-1), ]
dat2 = mdat[nrow(mdat), ]

##### (2) Fit the model
fit = lm(log(sales) ~ log(s1) + log(s12) + trends1, data=dat1);
summary(fit)

and:

##### (3) Prediction for the next month
predict.fit = predict(fit, newdata=dat2, se.fit=TRUE);

I understand that dat2 in (1) is only the last row of mdat, and that (2) fits the regression model to everything but the last row of the dataset.

But why is newdata=dat2 passed to predict() in (3), and what does it mean? Why only the last row?


Answer 1:


Here is what each line of the code does:

dat1 = mdat[1:(nrow(mdat)-1), ] 

Creates a subset of the whole dataset which contains all but the last row.

dat2 = mdat[nrow(mdat), ]

Creates a subset of the whole dataset which contains only the last row.

fit = lm(log(sales) ~ log(s1) + log(s12) + trends1, data=dat1)

Only the first subset, dat1, is used to fit the model, that is, the data without the last row.

predict.fit = predict(fit, newdata=dat2, se.fit=TRUE)

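predict takes the fitted model and computes what it would predict for the "unseen" data in dat2. Assuming mdat is ordered by date, dat2 is the most recent month, which was held out of the fit, so this is the one-month-ahead prediction the paper is about. se.fit=TRUE additionally returns the standard error of the predicted value.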
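To see what that call returns, here is a minimal sketch on made-up data. The paper's mdat is not reproduced here, so the data frame below is a synthetic stand-in that only borrows the column names from the formula (sales, s1, s12, trends1); the numbers are invented for illustration. Note that because the model is fitted to log(sales), the prediction comes back on the log scale.

```r
# Synthetic stand-in for mdat (hypothetical values, same column names as the formula).
set.seed(1)
n <- 30
mdat <- data.frame(
  s1      = runif(n, 50, 100),
  s12     = runif(n, 50, 100),
  trends1 = rnorm(n)
)
mdat$sales <- exp(0.5 * log(mdat$s1) + 0.4 * log(mdat$s12) +
                  0.1 * mdat$trends1 + rnorm(n, sd = 0.05))

dat1 <- mdat[1:(nrow(mdat) - 1), ]  # rows used for fitting
dat2 <- mdat[nrow(mdat), ]          # held-out last row

fit <- lm(log(sales) ~ log(s1) + log(s12) + trends1, data = dat1)

# With se.fit = TRUE, predict() returns a list instead of a bare number.
predict.fit <- predict(fit, newdata = dat2, se.fit = TRUE)

names(predict.fit)     # "fit" "se.fit" "df" "residual.scale"
predict.fit$fit        # predicted log(sales) for the held-out row
predict.fit$se.fit     # standard error of that prediction
exp(predict.fit$fit)   # back-transformed to the sales scale
```

predict() only reads the predictor columns (s1, s12, trends1) from dat2; the sales value in that row is not used, which is why it can later be compared with the prediction.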

In the simplest case, with only one independent variable, we would fit a line to dat1 and then read off which Y value that line predicts for the X value in dat2.
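That one-variable case can be sketched as follows. The data frame toy and its columns x and y are hypothetical, made up only to illustrate the idea; the example checks that predict() gives the same number as plugging the held-out X value into the fitted line by hand.

```r
# Toy data (hypothetical): 10 rows, one predictor x and one response y.
toy <- data.frame(x = 1:10)
toy$y <- 2 * toy$x + 1 + c(0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1)

train <- toy[1:(nrow(toy) - 1), ]  # all rows but the last
test  <- toy[nrow(toy), ]          # the last row only

toy_fit <- lm(y ~ x, data = train)

# predict() reads x from `test` and evaluates the fitted line there.
pred <- predict(toy_fit, newdata = test)

# The same value computed by hand: intercept + slope * x.
by_hand <- coef(toy_fit)[["(Intercept)"]] + coef(toy_fit)[["x"]] * test$x

all.equal(unname(pred), by_hand)
```

If newdata is omitted, predict() instead returns the fitted values for the rows the model was trained on, so passing newdata is what makes this an out-of-sample prediction.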



Source: https://stackoverflow.com/questions/38036874/predict-and-newdata-how-does-this-work
