Parallelization of Piecewise Polynomial Evaluation

问题

I am trying to evaluate points in a large piecewise polynomial, which is obtained from a cubic-spline. This takes a long time to do and I would like to speed it up.

As such, I would like to evaluate a points on a piecewise polynomial with parallel processes, rather than sequentially.

Code:

z = zeros(1e6, 1) ;    % preallocate some memory for speed
Y = rand(11220,161) ;  %some data, rand for generating a working example
X = 0 : 0.0125 : 2 ;   % vector of data sites
pp = spline(X, Y) ;    % get the piecewise polynomial form of the cubic spline.

The resulting structure is large.

for t = 1 : 1e6  % big number
    hcurrent = ppval(pp,t); %evaluate the piecewise polynomial at t
    z(t) = sum(x(t:t+M-1).*hcurrent,1) ; % do some operation of the interpolated value. Most likely not relevant to this question.
end

Unfortunately, with matrix form and using:

hcurrent = flipud(ppval(pp, 1: 1e6 ))

requires too much memory to process, so cannot be done. Is there a way that I can batch process this code to speed it up?

回答1:

For scalar second arguments, as in your example, you're dealing with two issues. First, there's a good amount of function call overhead and redundant computation (e.g., unmkpp(pp) is called every loop iteration). Second, ppval is written to be general so it's not fully vectorized and does a lot of things that aren't necessary in your case.

Below is vectorized code code that take advantage of some of the structure of your problem (e.g., t is an integer greater than 0), avoids function call overhead, move some calculations outside of your main for loop (at the cost of a bit of extra memory), and gets rid of a for loop inside of ppval:

n = 1e6;
z = zeros(n,1);
X = 0:0.0125:2;
Y = rand(11220,numel(X));
pp = spline(X,Y);

[b,c,l,k,dd] = unmkpp(pp);
T = 1:n;
idx = discretize(T,[-Inf b(2:l) Inf]); % Or: [~,idx] = histc(T,[-Inf b(2:l) Inf]);
x = bsxfun(@power,T-b(idx),(k-1:-1:0).').';

idx = dd*idx;
d = 1-dd:0;
for t = T
    hcurrent = sum(bsxfun(@times,c(idx(t)+d,:),x(t,:)),2);
    z(t) = ...;
end

The resultant code takes ~34% of the time of your example for n=1e6. Note that because of the vectorization, calculations are performed in a different order. This will result in slight differences between outputs from ppval and my optimized version due to the nature of floating point math. Any differences should be on the order of a few times eps(hcurrent). You can still try using parfor to further speed up the calculation (with four already running workers, my system took just 12% of your code's original time).

I consider the above a proof of concept. I may have over-optmized the code above if your example doesn't correspond well to your actual code and data. In that case, I suggest creating your own optimized version. You can start by looking at the code for ppval by typing edit ppval in your Command Window. You may be able to implement further optimizations by looking at the structure of your problem and what you ultimately want in your z vector.

Internally, ppval still uses histc, which has been deprecated. My code above uses discretize to perform the same task, as suggested by the documentation.

回答2:

Use parfor command for parallel loops. see here, also precompute z vector as z(j) = x(j:j+M-1) and hcurrent in parfor for speed up.

回答3:

The Spline Parameters estimation can be written in Matrix form.

Once you write it in Matrix form and solve it you can use the Model Matrix to evaluate the Spline on all data point using Matrix Multiplication which is probably the most tuned operation in MATLAB.

来源：https://stackoverflow.com/questions/42498490/parallelization-of-piecewise-polynomial-evaluation

标签

multithreading

matlab

parallel-processing

vectorization

spline