MATLAB: 10-fold cross-validation without using existing functions

难免孤独 2020-12-30 17:30

I have a data structure (I guess in MATLAB you call it a struct):

  data: [150x4 double]
labels: [150x1 double]

here is out my ma

2 Answers
  •  太阳男子
    2020-12-30 17:51

    Hahaha sorry, no solution. Don't have MATLAB on me right now so can't check code for errors. But here's the general idea:

    1. Generate k (in your case 10) subsamples
      1. Start two counters at 1 and preallocate the new matrix: index = 1; subsample = 1; newmat = zeros(150,6). Here 150 is the number of samples and 6 = 4 data columns + 1 label column + 1 column for the fold number, filled in below.
      2. Loop while you still have data: while ( length(labels) > 0 )
      3. Generate a random row number within the data that is left: randNum = randi(length(labels)). randi(n) returns an integer from 1 to n inclusive, so it is never 0 and is always a valid row index.
      4. Add that row, with its label, to the new data set: newmat(index,:) = [data(randNum,:) labels(randNum) subsample]. The last column is the subsample (fold) number from 1 to 10.
      5. Delete the row from data and labels: data(randNum,:) = []; labels(randNum) = []; Note that this physically removes a row from the matrices, which is why we use a while loop that checks for length > 0 rather than a for loop with simple indices. Work on copies if you need the originals afterwards.
      6. Increment the counters: index = index + 1; subsample = subsample + 1;
      7. If subsample == 11, reset it to 1.

    At the end of this, you should have a large data matrix that looks almost exactly like your original, but has randomly assigned "fold labels".

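    2. Loop over all of this and your executing code k (10) times. (For strict k-fold cross-validation, assign the fold labels once, before the loop, and hold out fold k on pass k, so that every sample is used for validation exactly once.)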

    EDIT: code placed in more accessible manner. NOTE it's still pseudo-y code and is not complete! Also, you should note that this is NOT AT ALL the most efficient way, but shouldn't be too bad if you can't use matlab functions.

    for k = 1:10

        % Work on copies so the original data and labels survive for the next pass
        remData = data; remLabels = labels;
        index = 1; subsample = 1; newmat = zeros(150, 6);
        while ( ~isempty(remLabels) )
            randNum = randi(length(remLabels));   % random row among those left
            newmat(index,:) = [remData(randNum,:) remLabels(randNum) subsample];
            remData(randNum,:) = []; remLabels(randNum) = [];   % remove the chosen row
            index = index + 1; subsample = subsample + 1;
            if ( subsample == 11 )
                subsample = 1;
            end
        end

        % newmat is complete, now run code here using the sampled data
        % (i.e. pick a fold from 1:10 as your validation fold and use the rest for training)

    end
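
    As a sketch of the "run code here" step, assuming newmat has the layout built above (columns 1-4 data, column 5 the label, column 6 the fold number), one fold can be held out with logical indexing. The stand-in data and the names valFold, isVal, trainX, trainY, valX and valY are illustrative only and are not part of the code above.

    ```matlab
    % Stand-in for newmat: 150 rows, 4 data columns, 1 label column,
    % 1 fold column (illustrative data only).
    newmat = [rand(150, 4), ones(150, 1), repmat((1:10)', 15, 1)];

    valFold = 3;                        % fold to hold out (hypothetical choice)
    isVal   = newmat(:, 6) == valFold;  % logical mask: rows in the held-out fold

    trainX = newmat(~isVal, 1:4);       % training data
    trainY = newmat(~isVal, 5);         % training labels
    valX   = newmat(isVal, 1:4);        % validation data
    valY   = newmat(isVal, 5);          % validation labels
    ```

    With 150 samples and 10 folds, this should leave 135 rows for training and 15 for validation.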
    

    EDIT FOR ANSWER #2:

    OK, another way is to create a vector that is as long as your data set:

    foldLabels = zeros(150,1);
    

    Then, looping for that long (150), assign labels to random indices!

    foldL = 1;
    numAssigned = 0;
    while ( numAssigned < 150 )
        idx = randi(150);
        % no need to reassign a given label, so check that it is still 0
        if ( foldLabels(idx) == 0 )
            foldLabels(idx) = foldL;
            numAssigned = numAssigned + 1;   % MATLAB has no ++ operator
            foldL = foldL + 1;
            if ( foldL > 10 )
                foldL = 1;
            end
        end
    end
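
    A quick sanity check may be worth running after a loop like the one above: with 150 samples dealt out over 10 folds, every fold should end up with exactly 15 samples and no entry should still be 0. The sketch below uses a stand-in foldLabels vector so that it runs on its own; counts and f are hypothetical names.

    ```matlab
    % Stand-in fold labels; in practice use the foldLabels vector built above.
    foldLabels = repmat((1:10)', 15, 1);

    counts = zeros(10, 1);
    for i = 1:length(foldLabels)
        f = foldLabels(i);
        counts(f) = counts(f) + 1;   % tally one sample for fold f
    end

    disp(counts');                   % one count per fold
    assert(all(foldLabels >= 1));    % no sample left unassigned (label 0)
    assert(all(counts == 15));       % folds are balanced
    ```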
    

    EDIT FOR ANSWER #2.5

    foldLabels = zeros(150,1);
    notChosenLabels = 1:150;   % indices that have not been given a fold yet
    foldL = 1;
    while ( ~isempty(notChosenLabels) )
        labIdx = randi(length(notChosenLabels));
        idx = notChosenLabels(labIdx);
        foldLabels(idx) = foldL;
        foldL = foldL + 1;           % MATLAB has no ++ operator
        if ( foldL > 10 )
            foldL = 1;
        end
        notChosenLabels(labIdx) = [];   % remove the index so it cannot be drawn again
    end
    

    EDIT FOR RANDPERM

    Generate the indices with randperm:

    idxs = randperm(150);
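
    If randperm itself counts as an "existing function" you are not allowed to use, one possible stand-in is a hand-written Fisher-Yates shuffle. This is a swapped-in technique rather than anything from the question, and it still relies on randi, which the earlier code already uses; n, j and tmp are hypothetical names.

    ```matlab
    n = 150;
    idxs = 1:n;                  % start from the identity ordering
    for i = n:-1:2
        j = randi(i);            % random position in 1..i
        tmp = idxs(i);           % swap entries i and j
        idxs(i) = idxs(j);
        idxs(j) = tmp;
    end
    ```

    After the loop, idxs holds each of 1..150 exactly once in random order, which is the same kind of result randperm(150) gives.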
    

    Now just assign:

    foldLabels = zeros(150,1);
    sampleLabel = 1;   % current fold number, cycles through 1..10
    for i = 1:150
        foldLabels(idxs(i)) = sampleLabel;
        sampleLabel = sampleLabel + 1;
        if ( sampleLabel > 10 )
           sampleLabel = 1;
        end
    end
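
    To close the loop, here is a minimal sketch of how the fold labels might drive the 10 validation passes. The data and labels are random stand-ins with the shapes from the question, and the "classifier" (always predict the most common training label) is only a placeholder for your own model; acc, isVal, trainX, trainY, valX, valY and pred are hypothetical names.

    ```matlab
    % Stand-in data with the question's shapes (illustrative only).
    data   = rand(150, 4);
    labels = [ones(50, 1); 2 * ones(50, 1); 3 * ones(50, 1)];

    % Fold labels: a random permutation dealt out over folds 1..10.
    idxs = randperm(150);
    foldLabels = zeros(150, 1);
    for i = 1:150
        foldLabels(idxs(i)) = mod(i - 1, 10) + 1;
    end

    acc = zeros(10, 1);
    for k = 1:10
        isVal  = (foldLabels == k);            % fold k is held out
        trainX = data(~isVal, :);  trainY = labels(~isVal);
        valX   = data(isVal, :);   valY   = labels(isVal);

        % Placeholder "classifier": always predict the most common training label.
        % Replace this with your own training and prediction code on trainX / valX.
        pred   = mode(trainY) * ones(size(valY));
        acc(k) = sum(pred == valY) / length(valY);
    end
    fprintf('Mean accuracy over 10 folds: %.3f\n', sum(acc) / 10);
    ```

    Because the fold labels are assigned once, before the loop, every sample lands in the validation set exactly once across the 10 passes.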
    
