How to split an image datastore for cross-validation in MATLAB?

Posted by 佐手、 on 2019-12-22 13:52:47

Question


In MATLAB, the splitEachLabel method of an imageDatastore object splits an image datastore into proportions per category label. How can one split an image datastore for training with cross-validation using the trainImageCategoryClassifier function?

I.e. it's easy to split it in N partitions, but then some sort of _mergeEachLabel_ functionality is needed to be able to train a classifier using cross-validation.

Or is there another way of achieving that?

Regards, Elena


Answer 1:


I stumbled on the same issue recently. Not sure if there is anyone still looking for a possible solution to this.

I ended up creating a function to combine multiple imds into one (similar to your _mergeEachLabel_ suggestion).

According to the MATLAB documentation, an imageDatastore is an object with four main properties:

  1. Files: a cell array with the paths to the images
  2. Labels: an array with the label of each image (categorical when the labels come from folder names)
  3. ReadSize: an integer specifying the number of images to read in each call of the reader
  4. ReadFcn: a function that reads image data

So this function simply creates a new IMDS that concatenates the first and second fields of N different imds into this new one.
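
To make the idea concrete, here is a minimal sketch that inspects those four properties and concatenates Files and Labels from two datastores. The folder names, image sizes and variable names are made up for illustration, and it builds the merged datastore with the constructor's 'Labels' option rather than the splitEachLabel trick used in my function below.

```matlab
% Write a few tiny synthetic images to a temporary folder (hypothetical data).
root = tempname;
classes = {'cats', 'dogs'};
for c = 1:numel(classes)
    mkdir(fullfile(root, classes{c}));
    for j = 1:2
        img = uint8(randi(255, 8, 8));
        imwrite(img, fullfile(root, classes{c}, sprintf('img%d.png', j)));
    end
end

% One datastore per class folder, labeled by folder name.
imdsA = imageDatastore(fullfile(root, 'cats'), 'LabelSource', 'foldernames');
imdsB = imageDatastore(fullfile(root, 'dogs'), 'LabelSource', 'foldernames');

% The four properties described above.
disp(imdsA.Files);     % cell array of file paths
disp(imdsA.Labels);    % one label per file
disp(imdsA.ReadSize);  % number of images returned per read call
disp(imdsA.ReadFcn);   % function handle used to read each image

% Concatenate Files and Labels into a new datastore.
merged = imageDatastore([imdsA.Files; imdsB.Files], ...
    'Labels', [imdsA.Labels; imdsB.Labels]);
disp(countEachLabel(merged));
```

Note that a datastore built this way uses the default ReadFcn and ReadSize, so copy those over if your folds use custom values.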

Then you could use this function to run a cross-validation. If you have 5 folds (5 different imds), you could run a loop that combines 4 folds into a training set, calls trainImageCategoryClassifier on it, and runs evaluate on the remaining imds.
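
A rough sketch of such a loop follows. It assumes the sample image sets that ship with the Computer Vision Toolbox are installed (swap imageFolder for your own labeled folder), uses 3 folds because that sample set is small, and inlines the merge instead of calling combineimds so that the snippet runs on its own. The names folds, trainParts and accuracy are just illustrative.

```matlab
% Assumption: sample images installed with the Computer Vision Toolbox.
imageFolder = fullfile(toolboxdir('vision'), 'visiondata', 'imageSets');
imds = imageDatastore(imageFolder, 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');

k = 3;                          % number of folds (hypothetical choice)
p = repmat({1/k}, 1, k-1);      % k-1 proportions yield k datastores
folds = cell(1, k);
[folds{1:k}] = splitEachLabel(imds, p{:}, 'randomized');

accuracy = zeros(1, k);
for i = 1:k
    % Merge every fold except fold i into the training set.
    trainParts = folds(setdiff(1:k, i));
    files  = cellfun(@(d) d.Files,  trainParts, 'UniformOutput', false);
    labels = cellfun(@(d) d.Labels, trainParts, 'UniformOutput', false);
    trainSet = imageDatastore(vertcat(files{:}), ...
        'Labels', vertcat(labels{:}));

    % The bag of features is rebuilt from scratch in every iteration.
    bag = bagOfFeatures(trainSet, 'Verbose', false);
    classifier = trainImageCategoryClassifier(trainSet, bag, ...
        'Verbose', false);

    % Average the diagonal of the confusion matrix as the fold accuracy.
    confMat = evaluate(classifier, folds{i});
    accuracy(i) = mean(diag(confMat));
end
fprintf('Mean accuracy over %d folds: %.3f\n', k, mean(accuracy));
```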

One caveat: after using this I realized that it is very inefficient to work this way, because you re-encode the images into your bag of features in every iteration of the CV loop. It would be more efficient to encode your whole IMDS once into a feature matrix X and then use fitcsvm (or fitcecoc for more than two classes) directly, both of which have CV functionality built in.
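
Something along these lines is what I have in mind. Treat it as a sketch: it assumes the Computer Vision Toolbox sample images and the Statistics and Machine Learning Toolbox, and it uses fitcecoc (the multiclass counterpart of fitcsvm) with 3 folds as an arbitrary choice. Be aware that the visual vocabulary here is learned from all images, including the ones later held out, so the estimate is slightly optimistic.

```matlab
% Assumption: sample images installed with the Computer Vision Toolbox.
imageFolder = fullfile(toolboxdir('vision'), 'visiondata', 'imageSets');
imds = imageDatastore(imageFolder, 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');

% Build the vocabulary and encode every image exactly once.
bag = bagOfFeatures(imds, 'Verbose', false);
X = double(encode(bag, imds, 'Verbose', false));  % one row per image
Y = imds.Labels;

% Cross-validate the SVM-based classifier on the encoded features.
cvModel = fitcecoc(X, Y, 'KFold', 3);
cvLoss = kfoldLoss(cvModel);                      % misclassification rate
fprintf('Cross-validated error: %.3f\n', cvLoss);
```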

Anyway, if anyone is still interested in this question, here is my function:

function [newimds] = combineimds(cell_imds)
% COMBINEIMDS  Merges a set of IMDS together and returns the combined IMDS
%     CELL_IMDS is a 1xn cell array where each cell is a different IMDS object
%%
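n = numel(cell_imds);        % number of datastores to merge

%%
% Copy the first fold into a new imds. splitEachLabel(imds, 1) puts one file
% per label into newimds and the remaining files into rest, so the two are
% joined back together below to recover the whole first fold.
[newimds, rest] = splitEachLabel(cell_imds{1}, 1);
a = [newimds.Files; rest.Files];
b = [newimds.Labels; rest.Labels];
newimds.Files = a;
newimds.Labels = b;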
%%
% concatenate cells in the new imds
for i = 2:n
  a = [newimds.Files; cell_imds{i}.Files];
  b = [newimds.Labels; cell_imds{i}.Labels];
  newimds.Files = a;
  newimds.Labels = b;
end

end

Hope it helps.




Answer 2:


The following code should work for basic cross-validation. Of course, you will need to change the value of k and the datastore options appropriately.

k = 5; % number of folds
% named imds so that it does not shadow the built-in datastore function
imds = imageDatastore(fullfile('.'), 'IncludeSubfolders', true, 'LabelSource', 'foldernames');

% collect the file list of each of the k partitions
partStores = cell(1, k);
for i = 1:k
   temp = partition(imds, k, i);
   partStores{i} = temp.Files;
end

% this will give us some randomization in which partition is tested when,
% though it is still advisable to randomize the data beforehand, because
% partition splits the files in their stored order
% (crossvalind needs the Bioinformatics Toolbox; randperm(k) does the same here)
idx = crossvalind('Kfold', k, k);

for i = 1:k
    test_idx = (idx == i);
    train_idx = ~test_idx;

    % labels are re-derived from the folder names of the listed files
    test_Store = imageDatastore(partStores{test_idx}, 'IncludeSubfolders', true, 'LabelSource', 'foldernames');
    train_Store = imageDatastore(cat(1, partStores{train_idx}), 'IncludeSubfolders', true, 'LabelSource', 'foldernames');

    % do your training and predictions here, maybe pre-allocate them before the loop, too
    %net{i} = trainNetwork(train_Store, layers, options);
    %pred{i} = classify(net{i}, test_Store);
end
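
If the Statistics and Machine Learning Toolbox is available, a stratified alternative is to let cvpartition pick the folds from the labels, which also takes care of the randomization. This is a sketch and not part of the code above: the synthetic images, folder names and variable names are made up so that it runs on its own.

```matlab
% Write 10 tiny synthetic images per class to a temporary folder.
root = tempname;
classes = {'classA', 'classB'};
for c = 1:numel(classes)
    mkdir(fullfile(root, classes{c}));
    for j = 1:10
        imwrite(uint8(randi(255, 8, 8)), ...
            fullfile(root, classes{c}, sprintf('img%02d.png', j)));
    end
end
imds = imageDatastore(root, 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');

k = 5;
cvp = cvpartition(imds.Labels, 'KFold', k);   % folds stratified by label

for i = 1:k
    trainMask = training(cvp, i);             % logical index of training files
    testMask  = test(cvp, i);                 % logical index of test files

    % Build each datastore from the selected files and their labels.
    trainStore = imageDatastore(imds.Files(trainMask), ...
        'Labels', imds.Labels(trainMask));
    testStore  = imageDatastore(imds.Files(testMask), ...
        'Labels', imds.Labels(testMask));

    fprintf('Fold %d: %d training, %d test images\n', i, ...
        numel(trainStore.Files), numel(testStore.Files));
    % train on trainStore and evaluate on testStore here
end
```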


Source: https://stackoverflow.com/questions/42156963/how-to-split-an-image-datastore-for-cross-validation-in-matlab
