Split the dataset into two subsets in matlab/octave [closed]

Question

Split the dataset into two subsets, say, "train" and "test", with the train set containing 80% of the data and the test set containing the remaining 20%.

Splitting means generating a logical index whose length equals the number of observations in the dataset, with 1 for a training sample and 0 for a test sample.

N=length(data.x)

Output: logical arrays called idxTrain and idxTest.

Answer 1:

This should do the trick:

% Generate sample data...
data = rand(32000,1);

% Calculate the number of training entries...
train_off = round(numel(data) * 0.8);

% Split data into training and test vectors...
train = data(1:train_off);
test = data(train_off+1:end);

But if you really want to rely on logical indexing, you can proceed as follows:

% Generate sample data...
data = rand(32000,1);
data_len = numel(data);

% Calculate the number of training entries...
train_count = round(data_len * 0.8);

% Create the logical indexing...
is_training = [true(train_count,1); false(data_len-train_count,1)];

% Split data into training and test vectors...
train = data(is_training);
test = data(~is_training);

You can also use the randsample function to introduce some randomness into the split, but this won't guarantee an exact number of training and test elements every time you run the script:

% Generate sample data...
data = rand(32000,1);

% Generate a random true/false indexing with unequally weighted probabilities...
is_training = logical(randsample([0 1],numel(data),true,[0.2 0.8]));

% Split data into training and test vectors...
train = data(is_training);
test = data(~is_training);
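
If randsample is not available (in MATLAB it ships with the Statistics and Machine Learning Toolbox, in Octave with the statistics package), a similar weighted draw can be sketched with base functions only. This is an alternative to the snippet above rather than part of the original answer, and the variable train_share is introduced purely for illustration. Like the randsample version, it does not fix the exact size of either subset:

```matlab
% Generate sample data...
data = rand(32000,1);

% Mark each observation as training with probability 0.8 (base functions only)...
is_training = rand(numel(data),1) < 0.8;

% Split data into training and test vectors...
train = data(is_training);
test = data(~is_training);

% The share of training samples is close to, but rarely exactly, 0.8...
train_share = mean(is_training);
```

Every observation lands in exactly one of the two vectors, but the sizes of train and test change slightly from run to run.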

You can avoid this problem by generating the correct number of training and test indices and then shuffling them with randperm-based indexing:

% Generate sample data...
data = rand(32000,1);
data_len = numel(data);

% Calculate the number of training entries...
train_count = round(data_len * 0.8);

% Create the logical indexing...
is_training = [true(train_count,1); false(data_len-train_count,1)];

% Shuffle the logical indexing...
is_training = is_training(randperm(data_len));

% Split data into training and test vectors...
train = data(is_training);
test = data(~is_training);
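
To match the output the question asks for, the same shuffled approach can be adapted to produce idxTrain and idxTest directly. The following is a minimal sketch that assumes the dataset is a struct with a field x, as the question's N=length(data.x) suggests; the sample size of 1000 and the names x_train and x_test are made up for illustration:

```matlab
% Hypothetical sample data: a struct with a field x, as in the question...
data = struct('x', rand(1000,1));
N = length(data.x);

% Calculate the number of training entries...
train_count = round(N * 0.8);

% Create the logical index with the exact number of training entries...
idxTrain = [true(train_count,1); false(N-train_count,1)];

% Shuffle it so that the training samples are picked at random...
idxTrain = idxTrain(randperm(N));

% The test index is the complement of the training index...
idxTest = ~idxTrain;

% Use the indices to extract the two subsets...
x_train = data.x(idxTrain);
x_test = data.x(idxTest);
```

Because idxTest is defined as the complement of idxTrain, the two indices never overlap and together cover every observation.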

Source: https://stackoverflow.com/questions/49242812/split-the-dataset-into-two-subsets-in-matlab-octave

Tags

matlab

octave