Data augmentation techniques for general datasets?

问题

I am working in a machine learning problem and want to build neural network based classifiers on it in matlab. One problem is that the data is given in the form of features and number of samples is considerably lower. I know about data augmentation techniques for images, by rotating, translating, affine translation, etc.

I would like to know whether there are data augmentation techniques available for general datasets ? Like is it possible to use randomness to generate more data ? I read the answer here but I did not understand it.

Kindly please provide answers with the working details if possible.

Any help will be appreciated.

回答1:

You need to look into autoencoders. Effectively you pass your data into a low level neural network, it applies a PCA-like analysis, and you can subsequently use it to generate more data.

Matlab has an autoencoder class as well as a function, that will do all of this for you. From the matlab help files

Generate the training data.

rng(0,'twister'); % For reproducibility
n = 1000;
r = linspace(-10,10,n)';
x = 1 + r*5e-2 + sin(r)./r + 0.2*randn(n,1);

Train autoencoder using the training data.

hiddenSize = 25;
autoenc = trainAutoencoder(x',hiddenSize,...
        'EncoderTransferFunction','satlin',...
        'DecoderTransferFunction','purelin',...
        'L2WeightRegularization',0.01,...
        'SparsityRegularization',4,...
        'SparsityProportion',0.10);

Generate the test data.

n = 1000;
r = sort(-10 + 20*rand(n,1));
xtest = 1 + r*5e-2 + sin(r)./r + 0.4*randn(n,1);

Predict the test data using the trained autoencoder, autoenc .

xReconstructed = predict(autoenc,xtest');

Plot the actual test data and the predictions.

figure;
plot(xtest,'r.');
hold on
plot(xReconstructed,'go');

You can see the green cicrles which represent additional data generated with the auto-encoder.

来源：https://stackoverflow.com/questions/39265746/data-augmentation-techniques-for-general-datasets

标签

matlab

deep-learning