Bad version or endian-key in MATLAB parfor?

Posted by a 夏天 on 2020-01-11 04:43:06

Question


I am doing parallel computations with MATLAB parfor. The code structure looks roughly like this:

%%% assess fitness %%%
% save communication overheads
bitmaps = pop(1, new_indi_idices);
porosities = pop(2, new_indi_idices);
mid_fitnesses = zeros(1, numel(new_indi_idices));
right_fitnesses = zeros(1, numel(new_indi_idices));
% parallelization starts
parfor idx = 1:numel(new_indi_idices) % only assess the necessary
    bitmap = bitmaps{idx};
    if porosities{idx}>POROSITY_MIN && porosities{idx}<POROSITY_MAX
        [mid_dsp, right_dsp] = compute_displacement(bitmap, ['1/' num2str(PIX_NO_PER_SIDE)]);
        mid_fitness = 100+mid_dsp;
        right_fitness = 100+right_dsp;
    else % porosity not even qualified
        mid_fitness = 0;
        right_fitness = 0;
    end
    mid_fitnesses(idx) = mid_fitness;
    right_fitnesses(idx) = right_fitness;
    fprintf('Done.\n');
    pause(0.01); % for break
end

I encountered the following weird error.

Error using parallel.internal.pool.deserialize (line 9)
Bad version or endian-key

Error in distcomp.remoteparfor/getCompleteIntervals (line 141)
                        origErr =
                        parallel.internal.pool.deserialize(intervalError);

Error in nsga2 (line 57)
    parfor idx = 1:numel(new_indi_idices) % only assess the necessary

How should I fix it? A quick Google search returns no solution.

Update 1

What is even weirder is that the following snippet works perfectly under exactly the same settings on the same HPC. I think there must be some subtle difference between the two that causes one to work and the other to fail. The working snippet:

%%% assess fitness %%%
% save communication overheads
bitmaps = pop(1, new_indi_idices);
porosities = pop(2, new_indi_idices);
fitnesses = zeros(1, numel(new_indi_idices));
% parallelization starts
parfor idx = 1:numel(new_indi_idices) % only assess the necessary
    bitmap = bitmaps{idx};
    if porosities{idx}>POROSITY_MIN && porosities{idx}<POROSITY_MAX
        displacement = compute_displacement(bitmap, ['1/' num2str(PIX_NO_PER_SIDE)]);
        fitness = 100+displacement;
    else % porosity not even qualified
        fitness = 0;
    end
    fitnesses(idx) = fitness;
    %fprintf('Done.\n', gen, idx);
    pause(0.01); % for break
end
pop(3, new_indi_idices) = num2cell(fitnesses);

Update 2

Suspecting that [mid_dsp, right_dsp] = compute_displacement(bitmap, ['1/' num2str(PIX_NO_PER_SIDE)]); was causing the trouble, I replaced it with

mid_dsp = rand();
right_dsp = rand();

Then it works! This shows that the error is indeed caused by this particular line. However, I have tested the function, and it returns two numbers correctly. Since the function returns values just as rand() does, I can't see any difference. This confuses me even more.


Answer 1:


I had the same issue, and it turned out that MATLAB 2015 reserves all the memory it needs for each of the loops in the parfor, which results in a memory shortage. The error message is misleading. After fine-tuning the code in the loop and providing 120 GB of virtual memory on the SSD through the Windows 10 pagefile settings, the parfor executed beautifully.
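
Since the message itself says little, one way to see what a worker actually ran into is to catch the error inside the loop body and return its text as plain char data. The following is only a sketch under assumptions: fake_displacement is a hypothetical stand-in for compute_displacement, and the third test input is deliberately invalid so that one iteration fails. If a worker runs out of memory so badly that it crashes outright, this will not help.

```matlab
% Hypothetical stand-in for compute_displacement (not the asker's function).
fake_displacement = @(bitmap) deal(sum(bitmap(:)), mean(bitmap(:)));

% Third entry is a cell on purpose: sum() fails on it, simulating a worker error.
bitmaps = {ones(4), zeros(4), {1, 2}};

n = numel(bitmaps);
mid_dsp = nan(1, n);
right_dsp = nan(1, n);
err_msgs = cell(1, n);

parfor idx = 1:n
    msg = '';
    m = NaN;
    r = NaN;
    try
        % feval avoids the call being parsed as indexing into a variable
        [m, r] = feval(fake_displacement, bitmaps{idx});
    catch err
        % Keep only text, so nothing exotic has to be sent back to the client
        msg = getReport(err, 'extended', 'hyperlinks', 'off');
    end
    mid_dsp(idx) = m;
    right_dsp(idx) = r;
    err_msgs{idx} = msg;
end

% Report the iterations that failed, with the worker-side error text
failed = find(~cellfun(@isempty, err_msgs));
for k = failed
    fprintf('Iteration %d failed:\n%s\n', k, err_msgs{k});
end
```

Iterations that fail keep NaN in the result arrays, so they are easy to tell apart from valid results afterwards.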




Answer 2:


After working a while on my own similar code block, I've decided that this is actually a memory issue.

I'm using a 6-core 4 GHz CPU with 8 GB of RAM, and I saw this issue (on MATLAB 2014b) when I set the worker count high. I did not have any problems with low worker counts.

When I use 6 or more workers (which is not ideal, I know), memory consumption is high and this error message pops up sporadically. I have also seen various out-of-memory errors in my tests.

I haven't seen the error so far when using 5 or fewer workers, and I'm pretty sure some memory limit (possibly inside a Java code block) is causing this issue by compromising the integrity (or existence) of some of the results.

Hope you can resolve this issue by reducing the worker count.
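
A minimal sketch of capping the worker count, assuming the Parallel Computing Toolbox with a default profile that allows at least max_workers workers. The value 2 and the loop body are placeholders of mine, not values from the question:

```matlab
% Assumed cap; pick a value that fits the memory available per worker.
max_workers = 2;

% Reuse the current pool only if it is already small enough.
pool = gcp('nocreate');
if ~isempty(pool) && pool.NumWorkers > max_workers
    delete(pool);
    pool = [];
end
if isempty(pool)
    pool = parpool(max_workers);
end

% The optional second argument to parfor is an upper bound on the
% number of workers used for this particular loop.
results = zeros(1, 8);
parfor (idx = 1:8, max_workers)
    results(idx) = idx^2;
end
```

Restarting the pool limits the number of worker processes that exist at all, while the parfor argument only limits how many of them this loop uses, so the first is the one that affects total memory consumption.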



Source: https://stackoverflow.com/questions/24592602/bad-version-or-endian-key-in-matlab-parfor
