Predicting via Lowess in R (OR reconciling Loess & Lowess)

假装没事ソ 提交于 2019-12-08 04:11:18

问题


I'm trying to interpolate/locally extrapolate some salary data to fill out a data set.

Here's the data set and a plot of the available data:

    experience   salary
 1:          1 21878.67
 2:          2 23401.33
 3:          3 23705.00
 4:          4 24260.00
 5:          5 25758.60
 6:          6 26763.40
 7:          7 27920.00
 8:          8 28600.00
 9:          9 28820.00
10:         10 32600.00
11:         12 30650.00
12:         14 32600.00
13:         15 32600.00
14:         16 37700.00
15:         17 33380.00
16:         20 36784.33
17:         23 35600.00
18:         25 33590.00
19:         30 32600.00
20:         31 33920.00
21:         35 32600.00

Given the clear nonlinearity, I'm hoping to interpolate & extrapolate (I want to fill in experience for years 0 through 40) via a local linear estimator, so I defaulted to lowess, which gives this:

This is nice on the plot, but the raw data is missing -- R's plotting device has filled in the blanks for us. I haven't been able to find a predict method for this function, as it seems R is moving towards using loess, which as I understand is a generalization.

However, when I use loess (setting surface="direct" to be able to extrapolate, as mentioned in ?loess), which has a standard predict method, the fit is less satisfactory:

(There are strong theoretical reasons to say that salary should be non-decreasing--there is some noise/possible mis-measurement driving the U shape here)

And I can't seem to be able to fiddle around with any of the parameters to get back the non-decreasing fit given by lowess.

Any suggestions for what to do?


回答1:


I don't know how to "reconcile" those two functions but I have used the cobs package (COnstrained B-Splines Nonparametric Regression Quantiles ) with some success for similar tasks. The default quantile is the (local) median or 0.5 quantile. In this dataset the default choices for span or kernel width seem very appropriate.

require(cobs)
Loading required package: cobs
Package cobs (1.3-0) attached.  To cite, see citation("cobs")

 Rbs <- cobs(x=dat$experience,y=dat$salary, constraint= "increase")
qbsks2():
# Performing general knot selection ...
#
# Deleting unnecessary knots ...
 Rbs
#COBS regression spline (degree = 2) from call:
#    cobs(x = dat$experience, y = dat$salary, constraint = "increase")
#{tau=0.5}-quantile;  dimensionality of fit: 5 from {5}
#x$knots[1:4]:  0.999966,  5.000000, 15.000000, 35.000034
plot(Rbs, lwd = 2.5)

It does have a predict method although you will need to use idiosyncratic arguments since it doesn't support the usual data= formalism:

 help(predict.cobs)
 predict(Rbs, z=seq(0,40,by=5))
       z      fit
 [1,]  0 21519.83
 [2,]  5 25488.71
 [3,] 10 30653.44
 [4,] 15 32773.21
 [5,] 20 33295.84
 [6,] 25 33669.14
 [7,] 30 33893.12
 [8,] 35 33967.78
 [9,] 40 33893.12


来源:https://stackoverflow.com/questions/29264059/predicting-via-lowess-in-r-or-reconciling-loess-lowess

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!