Python SciPy UnivariateSpline vs R smooth.spline

拟墨画扇 submitted on 2020-01-03 04:14:04

Question


I am porting a script written in R over to Python. In R I am using smooth.spline and in Python I am using SciPy UnivariateSpline. They don't produce the same results (even though they are both based on a cubic spline method). Is there a way, or an alternative to UnivariateSpline, to make the Python spline return the same spline as R?

I'm a mathematician. I understand the general idea of splines, but not the fine details of their implementation in Python or R.

Here is the code in R and then Python. The input data is the same for both.

Here is the input data:

x =  0.0,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1.0
y =   -1,    1,    1,   -1,    1,    0,   .5,   .5,   .4,   .5,   -1

Here is the R code

x = seq(0,1, by = .1); 
y = c(-1,1,1, -1,1,0, .5,.5,.4,  .5, -1);
spline_xy = smooth.spline(x,y)
predict(spline_xy,x)

which outputs:

$x
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

$y
 [1]  0.120614583  0.170800975  0.210954680  0.238032338  0.253672155
 [6]  0.253684815  0.236432643  0.200264536  0.145403302  0.074993797
[11] -0.004853825

Here is the Python Code

import numpy as np
from scipy.interpolate import UnivariateSpline
x = np.linspace(0, 1, num = 11, endpoint=True)    
y = np.array([-1,1,1, -1,1,0, .5,.5,.4,  .5, -1]) 
spline_xy = UnivariateSpline(x,y)
print('x =', x)
print('ysplined =',spline_xy(x))

which outputs:

x = [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]

ysplined = 
[-0.26433566 -0.02587413  0.18857809 0.36585082  0.49277389  
  0.55617716 0.54289044  0.43974359  0.23356643 -0.08881119 
 -0.54055944]

I hoped the outputs (R's $y and Python's ysplined) would be identical, but they aren't.

Any help (for example, how to set the parameters) or explanation would be appreciated! Thank you in advance.


Answer 1:


Those appear to me to be different smoothing methods.

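smooth.spline in R fits a "smoothing spline": an overparametrized natural spline (knots at every data point, which is the default for small data sets like this one; cubic between the boundary knots and linear beyond them), with the coefficients chosen by penalized least squares, i.e. by minimizing the residual sum of squares plus lambda times the integral of the squared second derivative. By default lambda is chosen by generalized cross-validation (GCV). The help page (?smooth.spline) gives the details of how the penalty is scaled.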
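To make the penalized criterion concrete, here is a minimal NumPy sketch of a natural cubic smoothing spline in the textbook Reinsch form. It is not R's actual code (R works in a B-spline basis, parametrizes lambda through spar, and picks it by GCV), and the function name smoothing_spline_fit and the lam values are illustrative assumptions, so the numbers will not match R's output in general. It only shows how lam trades fidelity against roughness.

```python
import numpy as np


def smoothing_spline_fit(x, y, lam):
    """Fitted values g of the natural cubic smoothing spline that minimizes
    sum((y - g)**2) + lam * integral(g''(t)**2 dt), via the Reinsch form
    g = (I + lam * K)^(-1) y.  x must be strictly increasing, len(x) >= 3."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    h = np.diff(x)                       # spacings h[i] = x[i+1] - x[i]
    Q = np.zeros((n, n - 2))             # second-difference matrix
    R = np.zeros((n - 2, n - 2))         # tridiagonal Gram matrix
    for j in range(1, n - 1):            # one column per interior knot
        c = j - 1
        Q[j - 1, c] = 1.0 / h[j - 1]
        Q[j, c] = -1.0 / h[j - 1] - 1.0 / h[j]
        Q[j + 1, c] = 1.0 / h[j]
        R[c, c] = (h[j - 1] + h[j]) / 3.0
        if c + 1 < n - 2:
            R[c, c + 1] = R[c + 1, c] = h[j] / 6.0
    # Penalty matrix: g @ K @ g equals the integral of g''^2 for the
    # natural cubic spline interpolating the points (x, g).
    K = Q @ np.linalg.solve(R, Q.T)
    return np.linalg.solve(np.eye(n) + lam * K, y)


x = np.linspace(0, 1, num=11)
y = np.array([-1, 1, 1, -1, 1, 0, .5, .5, .4, .5, -1])

print(smoothing_spline_fit(x, y, lam=0.0))   # lam = 0: interpolates y
print(smoothing_spline_fit(x, y, lam=1e-3))  # some smoothing
print(smoothing_spline_fit(x, y, lam=1e6))   # nearly the least-squares line
```

Because linear functions have zero penalty, every fit preserves the sum of y and of x*y, and a very large lam collapses to the ordinary least-squares line.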

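On the other hand, Python's UnivariateSpline appears, from the documentation at https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.UnivariateSpline.html, to be a regression spline fit by least squares, without the integrated-squared-second-derivative penalty that R uses. It chooses the number of knots adaptively: knots are added until the residual sum of squares is at most the smoothing factor s, which defaults to the number of data points when no weights are given.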
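A quick check of this reading against the numbers printed in the question, using only NumPy: with the default s (11 here), no interior knots appear to have been needed, so UnivariateSpline returned the ordinary least-squares cubic polynomial. The sketch below reproduces the printed ysplined values with np.polyfit, up to print rounding, and confirms that this cubic's residual sum of squares is already below 11.

```python
import numpy as np

# Data from the question
x = np.linspace(0, 1, num=11)
y = np.array([-1, 1, 1, -1, 1, 0, .5, .5, .4, .5, -1])

# Values printed by UnivariateSpline(x, y)(x) in the question
ysplined = np.array([-0.26433566, -0.02587413, 0.18857809, 0.36585082,
                     0.49277389, 0.55617716, 0.54289044, 0.43974359,
                     0.23356643, -0.08881119, -0.54055944])

# Ordinary least-squares cubic polynomial: no interior knots, no penalty
ycubic = np.polyval(np.polyfit(x, y, deg=3), x)

# Residual sum of squares, compared with the default smoothing factor s = len(x)
rss = np.sum((y - ycubic) ** 2)

print('max |cubic - ysplined| =', np.max(np.abs(ycubic - ysplined)))
print('rss =', rss, '<= s =', len(x))
```

With the default settings, the Python call never enters the smoothing regime. Lowering s adds knots, and s=0 makes the spline interpolate the data.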

These are completely different algorithms, and I wouldn't expect them to give equal results. I don't know if there's an R package that uses the same adaptive choice of knots as Python does. This answer: https://stackoverflow.com/a/55481248/2554330 claims to reference a natural smoothing spline implementation in Python, but I don't know if it matches R's implementation.
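For readers on a newer SciPy: version 1.10 added scipy.interpolate.make_smoothing_spline, which, per its documentation, fits a penalized cubic smoothing spline of the same kind as smooth.spline and picks lam by generalized cross-validation when lam is not given. The minimal sketch below has not been checked against R's output. R's parametrization of lambda (via spar) and its GCV details may differ, so identical numbers are not guaranteed. The lam=1e-3 value is an arbitrary illustration.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline  # requires SciPy >= 1.10

x = np.linspace(0, 1, num=11)
y = np.array([-1, 1, 1, -1, 1, 0, .5, .5, .4, .5, -1])

# lam=None (the default): the smoothing parameter is chosen by GCV
spl_gcv = make_smoothing_spline(x, y)
print('ysplined (GCV) =', spl_gcv(x))

# A fixed smoothing parameter can also be passed explicitly
spl_fixed = make_smoothing_spline(x, y, lam=1e-3)
print('ysplined (lam=1e-3) =', spl_fixed(x))
```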




Answer 2:


You can use R functions in Python with rpy2:

import numpy as np
import rpy2.robjects as robjects
x = np.linspace(0, 1, num = 11, endpoint=True)    
y = np.array([-1,1,1, -1,1,0, .5,.5,.4,  .5, -1])

r_x = robjects.FloatVector(x)
r_y = robjects.FloatVector(y)
r_smooth_spline = robjects.r['smooth.spline']  # extract the R function
spline_xy = r_smooth_spline(x=r_x, y=r_y)
r_predict = robjects.r['predict']              # R's generic predict()
ysplined = np.array(r_predict(spline_xy, robjects.FloatVector(x)).rx2('y'))
print('x =', x)
print('ysplined =', ysplined)

which outputs:

x = [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
ysplined = [ 0.12061458  0.17080098  0.21095468  0.23803234  0.25367215  0.25368481
0.23643264  0.20026454  0.1454033   0.0749938  -0.00485382]

This matches the R output exactly.

If you want to set lambda directly, spline_xy = r_smooth_spline(x=r_x, y=r_y, lambda=42) doesn't work: lambda is a reserved keyword in Python, so the call is a syntax error. There is a workaround, though: see the question "How to use the lambda argument of smooth.spline in RPy WITHOUT Python interprating it as lambda".
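The workaround usually given for this is Python's dictionary unpacking. A keyword name passed through **{...} is just a string, so the parser never sees the reserved word. The sketch below demonstrates the mechanism with a hypothetical stand-in function (show_kwargs), because the real call needs R and rpy2. The commented rpy2 line assumes, untested here, that rpy2 forwards the keyword to R as a named argument.

```python
def show_kwargs(**kwargs):
    """Hypothetical stand-in for an rpy2 function: returns its named args."""
    return kwargs


# Writing show_kwargs(lambda=42) would be a SyntaxError, but unpacking a
# dict passes a keyword literally named 'lambda'.
args = show_kwargs(x=[0.0, 0.1], **{'lambda': 42})
print(args)  # {'x': [0.0, 0.1], 'lambda': 42}

# With rpy2 the same pattern would read (not executed here; needs R + rpy2):
# spline_xy = r_smooth_spline(x=r_x, y=r_y, **{'lambda': 42})
```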



Source: https://stackoverflow.com/questions/56667657/python-scipy-univariatespline-vs-r-smooth-spline
