Data augmentation techniques for general datasets?

Submitted by 左心房为你撑大大i on 2020-12-30 05:28:26

Question


I am working on a machine learning problem and want to build neural-network-based classifiers for it in MATLAB. One problem is that the data is given in the form of features and the number of samples is quite small. I know about data augmentation techniques for images, such as rotation, translation, and affine transformation.

I would like to know whether there are data augmentation techniques for general datasets. For example, is it possible to use randomness to generate more data? I read the answer here, but I did not understand it.

Please provide answers with working details if possible.

Any help will be appreciated.


Answer 1:


You need to look into autoencoders. In effect, you pass your data through a small neural network that learns a compressed, PCA-like representation of it, and you can then use that network to generate more data.
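To make the "PCA-like" comparison concrete, here is a minimal sketch of the linear analogue of an autoencoder, written with base MATLAB only. It is an illustration under stated assumptions, not what trainAutoencoder does internally: the toy data and the names nSamples, nFeatures, k, latent, mixing, W, Z and Xrec are all hypothetical.

```matlab
% Hypothetical toy data: 200 samples (rows) with 5 correlated features
% that are driven by only 2 hidden factors plus a little noise.
rng(0,'twister');                      % For reproducibility
nSamples = 200; nFeatures = 5; k = 2;
latent = randn(nSamples,k);            % hidden factors
mixing = randn(k,nFeatures);           % how the factors map to features
X = latent*mixing + 0.05*randn(nSamples,nFeatures);

% "Encoder": center the data and project onto the top-k principal directions.
mu = mean(X,1);
Xc = bsxfun(@minus,X,mu);              % subtract the column means
[~,~,V] = svd(Xc,'econ');
W = V(:,1:k);                          % nFeatures-by-k projection matrix
Z = Xc*W;                              % nSamples-by-k compressed codes

% "Decoder": map the codes back to feature space.
Xrec = bsxfun(@plus,Z*W',mu);
relErr = norm(X - Xrec,'fro')/norm(X,'fro');
fprintf('Relative reconstruction error: %.4f\n',relErr);
```

An autoencoder follows the same compress-then-reconstruct pattern, but it learns the encoder and decoder by training and can use nonlinear transfer functions, so it is not limited to linear projections.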

MATLAB has an Autoencoder class as well as a trainAutoencoder function that will do all of this for you. From the MATLAB help files:

Generate the training data.

rng(0,'twister'); % For reproducibility
n = 1000;
r = linspace(-10,10,n)';
x = 1 + r*5e-2 + sin(r)./r + 0.2*randn(n,1);

Train autoencoder using the training data.

hiddenSize = 25;
autoenc = trainAutoencoder(x',hiddenSize,...
        'EncoderTransferFunction','satlin',...
        'DecoderTransferFunction','purelin',...
        'L2WeightRegularization',0.01,...
        'SparsityRegularization',4,...
        'SparsityProportion',0.10);

Generate the test data.

n = 1000;
r = sort(-10 + 20*rand(n,1));
xtest = 1 + r*5e-2 + sin(r)./r + 0.4*randn(n,1);

Predict the test data using the trained autoencoder, autoenc.

xReconstructed = predict(autoenc,xtest');

Plot the actual test data and the predictions.

figure;
plot(xtest,'r.');
hold on
plot(xReconstructed,'go');

In the plot, the green circles are the autoencoder's reconstructions of the test data; these are the additional data points generated with the autoencoder.
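If you want new samples rather than reconstructions of existing ones, one possible approach (an assumption on my part, not something taken from the MATLAB help files) is to add a little random noise to the encoded data and decode the result. The sketch below retrains the same autoencoder so that it runs on its own; it needs the toolbox that provides trainAutoencoder, encode and decode. noiseScale, Znoisy and xSynthetic are hypothetical names, and the value 0.05 is an arbitrary starting point.

```matlab
% Same training data and autoencoder settings as in the example above.
rng(0,'twister');                      % For reproducibility
n = 1000;
r = linspace(-10,10,n)';
x = 1 + r*5e-2 + sin(r)./r + 0.2*randn(n,1);
hiddenSize = 25;
autoenc = trainAutoencoder(x',hiddenSize,...
        'EncoderTransferFunction','satlin',...
        'DecoderTransferFunction','purelin',...
        'L2WeightRegularization',0.01,...
        'SparsityRegularization',4,...
        'SparsityProportion',0.10);

% Encode the training samples; each column of Z is one encoded sample.
Z = encode(autoenc,x');                % hiddenSize-by-n

% Perturb each hidden unit with noise proportional to that unit's spread.
noiseScale = 0.05;                     % hypothetical value; tune for your data
unitStd = std(Z,0,2);                  % hiddenSize-by-1
Znoisy = Z + noiseScale*bsxfun(@times,unitStd,randn(size(Z)));

% Decode the perturbed codes to get synthetic samples.
xSynthetic = decode(autoenc,Znoisy);   % 1-by-n

figure;
plot(x,'r.');
hold on
plot(xSynthetic,'go');
```

For a classifier, each synthetic sample would inherit the label of the sample it was derived from. It is worth checking that the synthetic points still look plausible for their class before adding them to the training set.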



Source: https://stackoverflow.com/questions/39265746/data-augmentation-techniques-for-general-datasets
