Why does writing to an unrelated file cause the load function to be so slow?

為{幸葍}努か 提交于 2019-12-22 08:27:51

问题


I've just spent a while debugging some particularly slow code and have been completely thrown off by the MATLAB profiler. This looks to me like a massive bug, so I was wondering if anyone could cast any light on to what is going on here.

Here is some code that will cause the problem:

function profiler_test

  %%% Create 20 files with random data

  count = 20;

  for i = 1 : count
    x = rand(3);
    save(sprintf('temp_file_%06d', i), 'x'); 
  end

  %%% Load them in a for loop

  xs = cell(1, count);

  tic;
  for i = 1 : count
    x = load(sprintf('temp_file_%06d', i), 'x');
    xs{i} = x.x;
  end
  toc

  %%% Load them in a for loop, but writing a small log file on the way

  tic;
  for i = 1 : count
    x = load(sprintf('temp_file_%06d', i), 'x');
    xs{i} = x.x;

    file = fopen(sprintf('temp_logfile_%d', i), 'w');
    fprintf(file, 'Success\n');
    fclose(file);
  end
  toc


end

The first for loop takes 0.239739 seconds, the second takes 4.411179.

Now, I should make it clear that I am aware of the sloppy idea I had, shown in the second for loop example, of creating a log file for each result - it was because I was running on a cluster where I couldn't see the output, I wanted a cheap indication of the function's progress, and this turned out to be the bottleneck. I'm fine with that.

My problem however is that I've spent a day trying to optimise the wrong line, because the MATLAB profiler says this:

         1   24   tic; 
         1   25   for i = 1 : count 
4.41    20   26     x = load(sprintf('temp_file_%06d', i), 'x'); 
        20   27     xs{i} = x.x; 
             28     
        20   29     file = fopen(sprintf('temp_logfile_%d', i), 'w'); 
        20   30     fprintf(file, 'Success\n'); 
        20   31     fclose(file); 
        20   32   end 
         1   33   toc

It's placed the entire time taken to execute the final three lines on the line for load. In my actual program, the load was not so close to the other bit so it didn't occur to me until I decided to distrust the profiler. My question is: what is going on here? Why has this happened and should I be watching out for any more bizarre behaviour like this?

I'm using MATLAB 2011a. Many thanks.

EDIT: I seem to be causing some confusion, apologies. Here is the situation:

  • The two for loops shown above are identical, except that the second one has three lines at the bottom which write to a temporary file each iteration.
  • The second loop takes substantially longer to run: the conclusion is that those last three lines are to blame for the speed increase. When they are removed, the code is fast again.
  • However, the profiler does not attribute any of the time for the second loop to those final three statements. Instead, it tells me that my load function call - exactly the same call as the first loop, which was faster - is now taking 4 seconds instead of 0.2. So either the presence of the last three lines causes the load to be slow (I had disregarded this; is that even a possibility?), OR the MATLAB profiler is incorrectly reporting that load is taking 4 seconds when it is clearly not.

Either way it seems to me that something very strange is happening.

EDIT: Seem to have answered it myself, see below. Changed the title as it was misleading


回答1:


I do not see any evidence of a bug in your post.

You mention that the entire loop takes about 4.111 and the profiler shows that line 26 takes about 4.11.

This means that all other lines together take less than 0.01 and therefore each line takes a rounded amount of zero seconds.

My guess is that zeroes are just not printed and that you interpreted this as the other lines not being timed.

I may be missing something but so far the output provided by MATLAB seems to be consistent.




回答2:


Actually, I think I've solved it. I was wrong to jump to the conclusion that the additional processing time was occurring on the new lines, so my question is now a little misleading - the profiler is correct. However, I still didn't understand why writing to a temporary file would cause load to slow down. I had a thought, which was to try this:

file = fopen(sprintf('../temp_logfile_%d', i), 'w');

That is, write to a file in the parent directory instead of the current working directory. This removed the problem, and was very fast. The reason, I am guessing, is that the current directory is in my MATLAB search path, as are a bunch of other directories. I presume that every time MATLAB uses a function which looks though the whole search path, as load does, it checks to see if any directories have been modified, and if so re-parses the whole lot to see what files are available. Writing a new file to the working directory certainly would have caused this. This may have been worse in my case since I also have a whole tree of subdirectories in the working directory which are part of the search path.

Anyway, thanks to those who had a look and sorry that the answer turned out to be something quite different from the question. Be aware when using functions which rely on the entire search path!




回答3:


I get the following report generated by the profiler of MATLAB 2012b, I dont see a bug.



来源:https://stackoverflow.com/questions/15386917/why-does-writing-to-an-unrelated-file-cause-the-load-function-to-be-so-slow

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!