Split the dataset into two subsets in matlab/octave [closed]

Submitted by 丶灬走出姿态 on 2021-02-17 07:19:04

Question


Split the dataset into two subsets, say, "train" and "test", with the train set containing 80% of the data and the test set containing the remaining 20%.

Splitting means generating a logical index whose length equals the number of observations in the dataset, with 1 for a training sample and 0 for a test sample.

N=length(data.x)

Output: logical arrays called idxTrain and idxTest.


Answer 1:


This should do the trick:

% Generate sample data...
data = rand(32000,1);

% Calculate the number of training entries...
train_off = round(numel(data) * 0.8);

% Split data into training and test vectors...
train = data(1:train_off);
test = data(train_off+1:end);

But if you really want to rely on logical indexing, you can proceed as follows:

% Generate sample data...
data = rand(32000,1);
data_len = numel(data);

% Calculate the number of training entries...
train_count = round(data_len * 0.8);

% Create the logical indexing...
is_training = [true(train_count,1); false(data_len-train_count,1)];

% Split data into training and test vectors...
train = data(is_training);
test = data(~is_training);

You can also use the randsample function (in MATLAB it ships with the Statistics and Machine Learning Toolbox) to introduce some randomness into the split. Because each element is drawn independently, however, the training and test sets will not contain exactly 80% and 20% of the elements on every run:

% Generate sample data...
data = rand(32000,1);

% Generate a random true/false indexing with unequally weighted probabilities...
is_training = logical(randsample([0 1],numel(data),true,[0.2 0.8]));

% Split data into training and test vectors...
train = data(is_training);
test = data(~is_training);
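
If randsample is not available, the same kind of per-element random assignment can be sketched with the built-in rand function. This is only an illustration of the idea above; the variable train_fraction is an assumption added here to show that the realized split is close to, but generally not exactly, 80/20:

```matlab
% Generate sample data...
data = rand(32000,1);
data_len = numel(data);

% Mark each element as training with probability 0.8, independently...
is_training = rand(data_len,1) < 0.8;

% Split data into training and test vectors...
train = data(is_training);
test = data(~is_training);

% Realized share of training elements (varies slightly from run to run)...
train_fraction = nnz(is_training) / data_len;
```

As with the randsample version, the size of train changes between runs, which is the limitation the next approach addresses.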

You can avoid this problem by building an index with the exact number of training and test entries and then shuffling it with a randperm based indexing:

% Generate sample data...
data = rand(32000,1);
data_len = numel(data);

% Calculate the number of training entries...
train_count = round(data_len * 0.8);

% Create the logical indexing...
is_training = [true(train_count,1); false(data_len-train_count,1)];

% Shuffle the logical indexing...
is_training = is_training(randperm(data_len));

% Split data into training and test vectors...
train = data(is_training);
test = data(~is_training);
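
To match the output names requested in the question, the shuffled approach could be adapted roughly as follows. This is a sketch under assumptions: the question only shows N=length(data.x), so data is taken here to be a struct with a vector field x filled with made-up values, and n_train, perm, x_train and x_test are illustrative names:

```matlab
% Hypothetical sample data: a struct with a vector field x...
data = struct('x', rand(1000,1));
N = length(data.x);

% Calculate the number of training entries (80% of N)...
n_train = round(N * 0.8);

% Pick n_train distinct positions at random and mark them as training...
perm = randperm(N);
idxTrain = false(N,1);
idxTrain(perm(1:n_train)) = true;

% The test index is the complement of the training index...
idxTest = ~idxTrain;

% Split the observations using the logical indices...
x_train = data.x(idxTrain);
x_test = data.x(idxTest);
```

Both idxTrain and idxTest are logical arrays of length N, every observation belongs to exactly one of them, and the training set always holds round(0.8*N) entries.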


Source: https://stackoverflow.com/questions/49242812/split-the-dataset-into-two-subsets-in-matlab-octave
