FCM Clustering numeric data and csv/excel file

你的背包 2020-12-21 19:01

Hi, I asked a previous question, "Fuzzy c-means tcp dump clustering in matlab", which got a reasonable answer, and I thought I was back on track. The problem is the preprocessing.

1 answer
  • 2020-12-21 19:36

    Here is an example of how I would read the data into MATLAB. You need two things: the data itself, which is in comma-separated format, and the list of features along with their types (numeric or nominal).

    %# read the list of features
    fid = fopen('kddcup.names','rt');
    C = textscan(fid, '%s %s', 'Delimiter',':', 'HeaderLines',1);
    fclose(fid);
    
    %# determine type of features
    C{2} = regexprep(C{2}, '.$','');              %# remove "." at the end
    attribNom = [ismember(C{2},'symbolic');true]; %# nominal features, plus the class attribute
    
    %# build format string used to read/parse the actual data
    frmt = cell(1,numel(C{1}));
    frmt( ismember(C{2},'continuous') ) = {'%f'}; %# numeric features: read as number
    frmt( ismember(C{2},'symbolic') ) = {'%s'};   %# nominal features: read as string
    frmt = [frmt{:}];
    frmt = [frmt '%s'];                           %# add the class attribute
    
    %# read dataset
    fid = fopen('kddcup.data','rt');
    C = textscan(fid, frmt, 'Delimiter',',');
    fclose(fid);
    
    %# convert nominal attributes to numeric
    ind = find(attribNom);
    G = cell(numel(ind),1);
    for i=1:numel(ind)
        [C{ind(i)},G{i}] = grp2idx( C{ind(i)} );
    end
    
    %# all numeric dataset
    M = cell2mat(C);
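
    Since the question is about feeding this data to fuzzy c-means, here is a minimal sketch of one possible next step. It is not part of the reading code above: the small matrix below is a made-up stand-in for M, the min-max scaling and the choice of 2 clusters are assumptions for illustration, and fcm requires the Fuzzy Logic Toolbox.

    ```matlab
    %# hypothetical stand-in for the all-numeric matrix M built above;
    %# the last column plays the role of the class index
    M = [0 1 181 1; 0 1 239 1; 2 3 0 2; 5 2 1032 2; 0 1 235 1; 3 3 12 2];

    %# drop the class column so it does not influence the clustering
    X = M(:,1:end-1);

    %# min-max scale each column to [0,1]; constant columns map to 0
    mn   = min(X, [], 1);
    span = max(X, [], 1) - mn;
    span(span == 0) = 1;                      %# avoid division by zero
    X = bsxfun(@rdivide, bsxfun(@minus, X, mn), span);

    %# fuzzy c-means (Fuzzy Logic Toolbox); numClusters is an assumed value
    numClusters = 2;
    [centers, U] = fcm(X, numClusters);       %# U is numClusters-by-numRows

    %# hard assignment: cluster with the highest membership for each row
    [~, label] = max(U, [], 1);
    ```

    Note that the integer codes returned by grp2idx impose an arbitrary order on the nominal features, so distances computed on those columns should be interpreted with care.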
    

    You could also look into the DATASET class from the Statistics Toolbox.
