Matlab: How to read in numbers with a comma as decimal separator?

礼貌的吻别 2020-12-03 23:47

I have a very large number (hundreds of thousands) of rather large (>0.5 MB) files containing numerical data with a comma as the decimal separator. It's impractical for me to use a

4 Answers
  • 2020-12-04 00:20

    In a test script, txt2mat reading comma decimals took less than 1.5 times as long as importing the same data with dot decimals. My code looks like this:

    tmco = {'NumHeaderLines', 1      , ...
            'NumColumns'    , 5      , ...
            'ConvString'    , '%f'   , ...
            'InfoLevel'     , 0      , ...
            'ReadMode'      , 'block', ...
            'ReplaceChar'   , {',.'} } ;
    
    A = txt2mat(filename, tmco{:});
    

    Note that 'ReplaceChar' is passed as a cell array ({',.'}) here, and that 'ReadMode' is set to 'block'.

    I get the following average read times (in seconds, measured with tic/toc over 20 runs) for a ~5 MB file on my (not too new) machine:

    • txt2mat test comma avg. time: 0.63231
    • txt2mat test dot avg. time: 0.45715
    • textscan test dot avg. time: 0.4787
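
    As a quick check, the slowdown factors implied by these timings can be computed directly. The variable names below are made up for illustration; the numbers are the averages listed above.

    ```matlab
    % Average read times in seconds, copied from the list above.
    tTxt2matComma = 0.63231;
    tTxt2matDot   = 0.45715;
    tTextscanDot  = 0.4787;

    % Slowdown of the comma import relative to each dot import.
    factorVsTxt2mat  = tTxt2matComma / tTxt2matDot;
    factorVsTextscan = tTxt2matComma / tTextscanDot;

    fprintf('vs. txt2mat dot:  %.2f\n', factorVsTxt2mat);
    fprintf('vs. textscan dot: %.2f\n', factorVsTextscan);
    ```

    Both factors come out below 1.5 (roughly 1.38 and 1.32).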

    The full code of my test script:

    %% generate sample files
    
    fdot = 'C:\temp\cDot.txt';
    fcom = 'C:\temp\cCom.txt';
    
    c = 5;       % # columns
    r = 100000;  % # rows
    test = round(1e8*rand(r,c))/1e6;
    tdot = sprintf([repmat('%f ', 1,c), '\r\n'], test.');
    tdot = ['a header line', char([13,10]), tdot];
    
    tcom = strrep(tdot,'.',',');
    
    % write dot file
    fid = fopen(fdot,'w');
    fprintf(fid, '%s', tdot);
    fclose(fid);
    % write comma file
    fid = fopen(fcom,'w');
    fprintf(fid, '%s', tcom);
    fclose(fid);
    
    disp('-----')
    
    %% read back sample files with txt2mat and textscan
    
    % txt2mat-options with comma decimal sep.
    tmco = {'NumHeaderLines', 1      , ...
            'NumColumns'    , 5      , ...
            'ConvString'    , '%f'   , ...
            'InfoLevel'     , 0      , ...
            'ReadMode'      , 'block', ...
            'ReplaceChar'   , {',.'} } ;
    
    % txt2mat-options with dot decimal sep.
    tmdo = {'NumHeaderLines', 1      , ...
            'NumColumns'    , 5      , ...
            'ConvString'    , '%f'   , ...
            'InfoLevel'     , 0      , ...
            'ReadMode'      , 'block'} ;
    
    % textscan-options
    tsco = {'HeaderLines'   , 1      , ...
            'CollectOutput' , true   } ;
    
    
    A = txt2mat(fcom, tmco{:});
    B = txt2mat(fdot, tmdo{:});
    
    fid = fopen(fdot);
    C = textscan(fid, repmat('%f',1,c) , tsco{:} );
    fclose(fid);
    C = C{1};
    
    disp(['txt2mat  test comma (1=Ok): ' num2str(isequal(A,test)) ])
    disp(['txt2mat  test dot   (1=Ok): ' num2str(isequal(B,test)) ])
    disp(['textscan test dot   (1=Ok): ' num2str(isequal(C,test)) ])
    disp('-----')
    
    %% speed test
    
    numTest = 20;
    
    % A) txt2mat with comma
    tic
    for k = 1:numTest
        A = txt2mat(fcom, tmco{:});
        clear A
    end
    ttmc = toc;
    disp(['txt2mat  test comma avg. time: ' num2str(ttmc/numTest) ])
    
    % B) txt2mat with dot
    tic
    for k = 1:numTest
        B = txt2mat(fdot, tmdo{:});
        clear B
    end
    ttmd = toc;
    disp(['txt2mat  test dot   avg. time: ' num2str(ttmd/numTest) ])
    
    % C) textscan with dot
    tic
    for k = 1:numTest
        fid = fopen(fdot);
        C = textscan(fid, repmat('%f',1,c) , tsco{:} );
        fclose(fid);
        C = C{1};
        clear C
    end
    ttsc = toc;
    disp(['textscan test dot   avg. time: ' num2str(ttsc/numTest) ])
    disp('-----')
    
  • 2020-12-04 00:25

    My solution (assumes commas are used only as decimal separators and that white space delimits columns):

    fid = fopen('FILENAME');
    indat = fread(fid, '*char').';   % fread returns a column vector; transpose to a row
    fclose(fid);
    indat = strrep(indat, ',', '.');
    [colA, colB] = strread(indat, '%f %f');
    

    If you need to remove a single header line, as I did, then this should work:

    fid = fopen('FILENAME');                    % Open file
    indat = fread(fid, '*char').';              % Read the entire file as a row vector of characters
    fclose(fid);                                % Close file
    indat = strrep(indat, ',', '.');            % Replace commas with periods
    endheader = find(indat == char(10), 1);     % Find the first line feed (end of the header line)
    indat = indat(endheader+1:end);             % Keep all characters after the header line
    [colA, colB] = strread(indat, '%f %f');     % Convert text to numerical data
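
    Since strread is no longer recommended in current MATLAB releases, the same replace-then-parse idea can also be sketched with fileread and sscanf. This is a minimal sketch, not the answer author's code: the sample file contents, the variable names, and the assumption that the number of columns is known in advance are all made up for illustration.

    ```matlab
    % Write a small sample file (hypothetical data) so the sketch is self-contained.
    fname = [tempname '.txt'];
    fid = fopen(fname, 'w');
    fprintf(fid, 'a header line\r\n1,5 2,25\r\n3,0 4,75\r\n');
    fclose(fid);

    numCols = 2;                                % assumption: column count is known
    txt = fileread(fname);                      % whole file as one row vector of characters
    txt = strrep(txt, ',', '.');                % decimal commas -> decimal points
    firstLF = find(txt == char(10), 1);         % end of the single header line
    body = txt(firstLF+1:end);                  % everything after the header
    A = sscanf(body, '%f', [numCols, Inf]).';   % sscanf fills column-wise, so transpose

    delete(fname);                              % clean up the sample file
    disp(A)
    ```

    Like the code above, this only works if commas never appear as column delimiters, because every comma in the file is replaced.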
    
  • 2020-12-04 00:30

    You can try to speed up txt2mat by also passing the number of header lines and, if possible, the number of columns as inputs, which bypasses its file analysis. With those inputs, the import should no longer be 25 times slower than a textscan import of dot-separated decimals. (You can also contact me via the author page on the MathWorks site.) Please let us know if you find a more efficient way to handle comma-separated decimals in MATLAB.

  • 2020-12-04 00:31

    You can use txt2mat, available on the MathWorks site.

    A = txt2mat('data.txt');
    

    It should handle the data automatically, but you can also specify the character replacement explicitly:

    A = txt2mat('data.txt','ReplaceChar',',.');
    

    P.S. It may not be efficient, but if you only need it for your specific data format, you can copy the relevant part of its source code.
