Is there a way in Matlab to determine the number of lines in a file without looping through each line?

無奈伤痛 2020-12-05 00:57

Obviously one could loop through a file using fgetl or a similar function and increment a counter, but is there a way to determine the number of lines in a file without looping through each line?

5 answers
  • 2020-12-05 00:59

    I like to use the following code for exactly this task:

    fid = fopen('someTextFile.txt', 'rb');
    assert(fid ~= -1, 'Could not open file.');
    %# Get file size.
    fseek(fid, 0, 'eof');
    fileSize = ftell(fid);
    frewind(fid);
    %# Read the whole file as bytes.
    data = fread(fid, fileSize, 'uint8');
    fclose(fid);
    %# Count line feeds (byte 10); add one only if the last line has no trailing line feed.
    numLines = sum(data == 10) + (~isempty(data) && data(end) ~= 10);
    

    It is pretty fast if you have enough memory to read the whole file at once. It works for both Windows- and Linux-style line endings, because both end each line with a line feed (byte 10).
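    The line-ending claim can be checked on in-memory byte vectors without touching a file. The following is a minimal sketch, not part of the original answer: countLinesInBytes is a hypothetical helper name, and it assumes that a final line without a trailing line feed should still count as a line.

```matlab
function n = countLinesInBytes(bytes)
%COUNTLINESINBYTES Hypothetical helper: count lines in a vector of byte values.
%   A line ends at each line feed (byte 10). A final line that lacks a
%   trailing line feed is still counted as one line (an assumption).
bytes = bytes(:);              % force a column vector, also for empty input
n = sum(bytes == 10);          % CRLF and LF endings both contain one byte 10
if ~isempty(bytes) && bytes(end) ~= 10
    n = n + 1;                 % unterminated last line
end
end
```

    For example, countLinesInBytes(uint8(sprintf('a\nb\nc\n'))) and countLinesInBytes(uint8(sprintf('a\r\nb\r\nc\r\n'))) both return 3, since the carriage returns (byte 13) are simply ignored.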

    Edit: I measured the performance of the answers provided so far. Here are the results for determining the number of lines of a text file containing 1 million double values (one value per line), averaged over 10 tries.

     Author           Mean time +- standard deviation (s)
    ------------------------------------------------------
     Rody Oldenhuis      0.3189 +- 0.0314
     Edric (2)           0.3282 +- 0.0248
     Mehrwolf            0.4075 +- 0.0178
     Jonas               1.0813 +- 0.0665
     Edric (1)          26.8825 +- 0.6790
    

    So the fastest approaches are the one using Perl and those reading the file as binary data. I would not be surprised if Perl internally also reads large blocks of the file at once instead of looping through it line by line (just a guess; I do not know anything about Perl).

    A simple fgetl() loop is slower than the other approaches by a factor of roughly 25 to 85 (26.9 s versus 0.32 to 1.08 s in the table above).

    Edit 2: Included Edric's 2nd approach, which is much faster and on par with the Perl solution, I'd say.
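    Timings like the ones in the table could be collected with a small harness along these lines. This is only a sketch under stated assumptions: timeLineCounter is a hypothetical helper name, and it is not the script that produced the numbers above. It takes any function handle that accepts a file name, runs it a number of times (10 by default) and reports the mean and standard deviation of the elapsed time.

```matlab
function [meanTime, stdTime] = timeLineCounter(counterFcn, fname, numTries)
%TIMELINECOUNTER Hypothetical helper: time a line-counting function handle.
%   counterFcn  function handle taking a file name, e.g. @countLines
%   fname       path of the text file to count
%   numTries    number of repetitions (defaults to 10)
if nargin < 3
    numTries = 10;
end
times = zeros(numTries, 1);
for k = 1:numTries
    t0 = tic;
    counterFcn(fname);         % the count itself is discarded here
    times(k) = toc(t0);        % elapsed wall-clock time in seconds
end
meanTime = mean(times);
stdTime  = std(times);
end
```

    A call such as [m, s] = timeLineCounter(@countLines, 'someTextFile.txt') would then time any function named countLines that is on the path. Wall-clock timings of this kind depend on disk caching, so the first run is often slower than the rest.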

  • 2020-12-05 00:59

    I would recommend using an external tool for this, for example cloc, which is free to download (http://cloc.sourceforge.net).

    On Linux you then simply type cloc <directory_path> and get:

    YourPC$ cloc <directory_path>
          87 text files.
          81 unique files.                              
          23 files ignored.
    
    http://cloc.sourceforge.net v 1.60  T=0.19 s (311.7 files/s, 51946.9 lines/s)
    -------------------------------------------------------------------------------
    Language                     files          blank        comment           code
    -------------------------------------------------------------------------------
    MATLAB                          59           1009           1074           4993
    HTML                             1              0              0             23
    -------------------------------------------------------------------------------
    SUM:                            60           1009           1074           5016
    -------------------------------------------------------------------------------
    

    They also claim it should work on Windows.

  • 2020-12-05 01:06

    I found a nice trick here:

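    if (isunix) %# Linux, Mac
        %# Redirect the file to stdin so that wc prints only the count, not the file name.
        [status, result] = system( ['wc -l < ', 'your_file'] );
        assert(status == 0, 'wc failed: %s', result);
        numlines = str2double(strtrim(result));
    
    elseif (ispc) %# Windows
        numlines = str2double( strtrim(perl('countlines.pl', 'your_file')) );
    
    else
        error('...');
    
    end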
    

    where 'countlines.pl' is a Perl script containing:

    while (<>) {};
    print $.,"\n";
    
  • 2020-12-05 01:10

    I think a loop is in fact the best option: all other options suggested so far either rely on external programs (which need error checking and str2num, and are harder to debug and to run cross-platform) or read the whole file in one go. Loops aren't so bad. Here's my variant:

    function count = countLines(fname)
      fh = fopen(fname, 'rt');
      assert(fh ~= -1, 'Could not read: %s', fname);
      x = onCleanup(@() fclose(fh));
      count = 0;
      while ischar(fgetl(fh))
        count = count + 1;
      end
    end
    

    EDIT: Jonas rightly points out that the above loop is really slow. Here's a faster version.

    function count = countLines(fname)
    fh = fopen(fname, 'rt');
    assert(fh ~= -1, 'Could not read: %s', fname);
    x = onCleanup(@() fclose(fh));
    count = 0;
    while ~feof(fh)
        count = count + sum( fread( fh, 16384, 'char' ) == char(10) );
    end
    end
    

    It's still not as fast as wc -l, but it's not a disaster either.

  • 2020-12-05 01:16

    You can read the entire file at once, and then count how many lines you've read.

    fid = fopen('yourFile.ext');
    %# Split the file at newlines into a cell array of strings.
    allText = textscan(fid,'%s','delimiter','\n');
    fclose(fid);
    numberOfLines = length(allText{1});
    