Matlab: How to read in numbers with a comma as decimal separator?


I have a very large number (hundreds of thousands) of rather large (>0.5 MB) files containing numerical data that use a comma as the decimal separator. It's impractical for me to use a

4 answers

天涯浪人

2020-12-04 00:20

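With a test script I've found that reading comma-decimal files with txt2mat is slower than reading dot-decimal files with textscan by a factor of less than 1.5. My code would look like: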

tmco = {'NumHeaderLines', 1      , ...
        'NumColumns'    , 5      , ...
        'ConvString'    , '%f'   , ...
        'InfoLevel'     , 0      , ...
        'ReadMode'      , 'block', ...
        'ReplaceChar'   , {',.'} } ;

A = txt2mat(filename, tmco{:});

Note the 'ReplaceChar' value, given here as the cell array {',.'}, and the 'ReadMode' set to 'block'.

I get the following results (average time per read, in seconds) for a ~5 MB file on my (not too new) machine:

txt2mat test comma avg. time: 0.63231
txt2mat test dot avg. time: 0.45715
textscan test dot avg. time: 0.4787
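
As a quick check of the "factor of less than 1.5" claim, the ratios can be worked out directly from the three timings listed above. This is only arithmetic on those reported numbers; the variable names are made up for illustration.

```matlab
% Average read times in seconds, copied from the results above
tTxt2matComma = 0.63231;
tTxt2matDot   = 0.45715;
tTextscanDot  = 0.4787;

% Slowdown of the comma-decimal read relative to each dot-decimal read
factorVsTextscan = tTxt2matComma / tTextscanDot;
factorVsTxt2mat  = tTxt2matComma / tTxt2matDot;

fprintf('vs textscan (dot): %.2f\n', factorVsTextscan);
fprintf('vs txt2mat  (dot): %.2f\n', factorVsTxt2mat);
```

On these numbers the comma-decimal read is roughly 1.32 times slower than textscan and roughly 1.38 times slower than txt2mat on the dot-decimal file.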

The full code of my test script:

%% generate sample files

fdot = 'C:\temp\cDot.txt';
fcom = 'C:\temp\cCom.txt';

c = 5;       % # columns
r = 100000;  % # rows
test = round(1e8*rand(r,c))/1e6;
tdot = sprintf([repmat('%f ', 1,c), '\r\n'], test.'); % one row of test per line
tdot = ['a header line', char([13,10]), tdot];

tcom = strrep(tdot,'.',',');

% write dot file
fid = fopen(fdot,'w');
fprintf(fid, '%s', tdot);
fclose(fid);
% write comma file
fid = fopen(fcom,'w');
fprintf(fid, '%s', tcom);
fclose(fid);

disp('-----')

%% read back sample files with txt2mat and textscan

% txt2mat-options with comma decimal sep.
tmco = {'NumHeaderLines', 1      , ...
        'NumColumns'    , 5      , ...
        'ConvString'    , '%f'   , ...
        'InfoLevel'     , 0      , ...
        'ReadMode'      , 'block', ...
        'ReplaceChar'   , {',.'} } ;

% txt2mat-options with dot decimal sep.
tmdo = {'NumHeaderLines', 1      , ...
        'NumColumns'    , 5      , ...
        'ConvString'    , '%f'   , ...
        'InfoLevel'     , 0      , ...
        'ReadMode'      , 'block'} ;

% textscan-options
tsco = {'HeaderLines'   , 1      , ...
        'CollectOutput' , true   } ;


A = txt2mat(fcom, tmco{:});
B = txt2mat(fdot, tmdo{:});

fid = fopen(fdot);
C = textscan(fid, repmat('%f',1,c) , tsco{:} );
fclose(fid);
C = C{1};

disp(['txt2mat  test comma (1=Ok): ' num2str(isequal(A,test)) ])
disp(['txt2mat  test dot   (1=Ok): ' num2str(isequal(B,test)) ])
disp(['textscan test dot   (1=Ok): ' num2str(isequal(C,test)) ])
disp('-----')

%% speed test

numTest = 20;

% A) txt2mat with comma
tic
for k = 1:numTest
    A = txt2mat(fcom, tmco{:});
    clear A
end
ttmc = toc;
disp(['txt2mat  test comma avg. time: ' num2str(ttmc/numTest) ])

% B) txt2mat with dot
tic
for k = 1:numTest
    B = txt2mat(fdot, tmdo{:});
    clear B
end
ttmd = toc;
disp(['txt2mat  test dot   avg. time: ' num2str(ttmd/numTest) ])

% C) textscan with dot
tic
for k = 1:numTest
    fid = fopen(fdot);
    C = textscan(fid, repmat('%f',1,c) , tsco{:} );
    fclose(fid);
    C = C{1};
    clear C
end
ttsc = toc;
disp(['textscan test dot   avg. time: ' num2str(ttsc/numTest) ])
disp('-----')


夕颜

2020-12-04 00:25

My solution (assumes commas are used only as decimal separators and that white space delimits the columns):

fid = fopen('FILENAME');
indat = fread(fid, '*char').';   % fread returns a column; transpose to a row vector
fclose(fid);
indat = strrep(indat, ',', '.');
[colA, colB] = strread(indat, '%f %f');

If you should happen to need to remove a single header line, as I did, then this should work:

fid = fopen('FILENAME');                  % Open file
indat = fread(fid, '*char').';            % Read the entire file as a row of characters
fclose(fid);                              % Close file
indat = strrep(indat, ',', '.');          % Replace commas with periods
endheader = find(indat == 10, 1);         % Find the first line feed (end of the header line)
indat = indat(endheader+1:end);           % Keep all characters after the header line
[colA, colB] = strread(indat, '%f %f');   % Convert string to numerical data
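
MathWorks lists strread as not recommended and points to textscan instead, and textscan also accepts text directly rather than a file identifier. The following is a minimal sketch of the same replace-then-parse idea using textscan. It is not part of the original answer, and the sample text and variable names are made up for illustration.

```matlab
% Hypothetical sample: two whitespace-separated columns with comma decimals
raw = sprintf('1,5 2,25\n3,75 4,125\n');

% Replace every comma with a period so that %f can parse the numbers
txt = strrep(raw, ',', '.');

% Parse the text; 'CollectOutput' merges both columns into one matrix
C = textscan(txt, '%f %f', 'CollectOutput', true);
M = C{1};

disp(M)
```

As with the strread version, this assumes commas appear only as decimal separators.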


礼貌的吻别

2020-12-04 00:30

You may try to speed up txt2mat by also passing the number of header lines and, if possible, the number of columns as inputs, which bypasses its file analysis. There shouldn't be a factor of 25 compared to a textscan import of dot-separated decimals then. (You may also contact me via the author page on the MathWorks site.) Please let us know if you find a more efficient way to handle comma decimal separators in MATLAB.

囚心锁ツ

2020-12-04 00:31
You may use txt2mat (available from the MathWorks File Exchange).
```
A = txt2mat('data.txt');
```
It should handle the data automatically, but you can also specify the character replacement explicitly:
```
A = txt2mat('data.txt','ReplaceChar',',.');
```
P.S. It may not be efficient, but if you only need to handle your specific data format, you can copy the relevant part of its source code.
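
For a fixed, known format, a hand-rolled reader can be very small. The sketch below is not taken from txt2mat's source. It is a plain illustration that assumes one header line, a known column count, and commas used only as decimal separators. The file name, its contents, and all variable names are hypothetical.

```matlab
% Write a small hypothetical sample file to a temporary location
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'a header line\r\n1,5 2,25\r\n3,75 4,125\r\n');
fclose(fid);

nCols = 2;                            % assumed known in advance

txt = fileread(fname);                % whole file as one row of characters
txt = strrep(txt, ',', '.');          % comma decimals -> dot decimals
lf  = find(txt == 10, 1);             % first line feed ends the header line
vals = sscanf(txt(lf+1:end), '%f');   % all numbers as one column vector
M = reshape(vals, nCols, []).';       % one matrix row per line of the file

delete(fname);                        % remove the temporary sample file
disp(M)
```

Reading the whole file into memory keeps this simple and should be fine for files of a few megabytes.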