Manipulate data to better fit a Gaussian distribution

Submitted by 旧街凉风 on 2019-12-01 17:59:13

Question


I have a question concerning the normal distribution (with mu = 0 and sigma = 1).

Let's say that I first call randn or normrnd this way:

x = normrnd(0,1,[4096,1]); % x = randn(4096,1)

Now, to assess how well the values of x fit the normal distribution, I call

[a,b] = normfit(x);

and, for a graphical check,

histfit(x)
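
A numerical check can complement the histogram. The sketch below is not part of the original question; it assumes the Statistics Toolbox is available and combines the confidence intervals returned by normfit with kstest, which by default tests against the standard normal distribution. The variable names and the fixed seed are illustrative only.

```matlab
rng(0);                                  % fixed seed, only for reproducibility
x = randn(4096, 1);

% point estimates and 95% confidence intervals for mu and sigma
[muHat, sigmaHat, muCI, sigmaCI] = normfit(x);

% one-sample Kolmogorov-Smirnov test against N(0,1);
% h = 0 means normality is not rejected at the 5% level
[h, p] = kstest(x);

fprintf('mu    = %.4f  [%.4f, %.4f]\n', muHat, muCI(1), muCI(2));
fprintf('sigma = %.4f  [%.4f, %.4f]\n', sigmaHat, sigmaCI(1), sigmaCI(2));
fprintf('kstest: h = %d, p = %.3f\n', h, p);
```

If 0 lies inside the interval for mu and 1 inside the interval for sigma, the deviations seen in the histogram are plausibly just sampling noise.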

Now to the core of the question: if I am not satisfied with how well x fits the given normal distribution, how can I adjust x so that it better fits the expected normal distribution with mean 0 and standard deviation 1? Sometimes, because of the small number of samples (4096 in this case), x fits the expected Gaussian poorly, so I want to manipulate x (linearly or not, it does not really matter at this stage) to get a better fit.

I should note that I have access to the Statistics Toolbox.

EDIT

  1. I used normrnd and randn in the example because my data are supposed and expected to follow a normal distribution. Within the question, though, those functions only serve to illustrate my concern.

  2. Would it be possible to apply a least-squares fit?

  3. Generally, the distribution I get is similar to the following:

[image not preserved]


Answer 1:


You could try normalizing your input data so that it has mean 0 and sigma 1, like this:

y=(x-mean(x))/std(x);
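
One caveat worth checking: standardization is a linear map, so it fixes the sample mean and standard deviation but leaves the shape of the distribution unchanged. The sketch below illustrates this under the assumption that the Statistics Toolbox functions skewness and kurtosis are available; the example data (mean 3, sigma 2) are made up for illustration.

```matlab
rng(1);                              % fixed seed, only for reproducibility
x = 3 + 2 * randn(4096, 1);          % illustrative data with mean 3 and sigma 2
y = (x - mean(x)) / std(x);          % standardize as in the line above

% sample mean and standard deviation are now 0 and 1 up to rounding
fprintf('mean(y) = %.2e, std(y) = %.4f\n', mean(y), std(y));

% skewness and kurtosis are invariant under this linear rescaling
fprintf('skewness: %.4f -> %.4f\n', skewness(x), skewness(y));
fprintf('kurtosis: %.4f -> %.4f\n', kurtosis(x), kurtosis(y));
```

If the histogram is skewed or heavy-tailed before standardizing, it will look the same afterwards, only shifted and rescaled.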



Answer 2:


If you are looking for a nonlinear transformation that would make your distribution look normal, you can first estimate the cumulative distribution function (CDF) of your data and then compose it with the inverse of the standard normal CDF. This way you can transform almost any distribution into a normal one through an invertible transformation. Take a look at the example code below.

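x = randn(1000, 1) + 4 * (rand(1000, 1) < 0.5); % some funky bimodal distribution
xr = linspace(-5, 9, 2000); % evaluation grid; it should cover the range of x
% estimate the CDF by accumulating a kernel density estimate on the grid
cdf = cumsum(ksdensity(x, xr, 'width', 0.5)); cdf = cdf / cdf(end); % you may want to use a better smoother
c = interp1(xr, cdf, x); % function composition step 1: estimated CDF at each sample (NaN outside the grid)
y = norminv(c); % function composition step 2: inverse of the standard normal CDF
% take a look at the result
figure;
subplot(2,1,1); hist(x, 100); % original data
subplot(2,1,2); hist(y, 100); % transformed data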
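
As a variation on the same idea, and not the answerer's code, the CDF can be estimated from the ranks of the samples instead of from a smoothed density. This is a minimal sketch assuming the Statistics Toolbox function tiedrank is available; the offset of 0.5 is one common convention that keeps the CDF values strictly inside (0, 1), so norminv never returns Inf.

```matlab
rng(2);                                            % fixed seed, only for reproducibility
x = randn(1000, 1) + 4 * (rand(1000, 1) < 0.5);    % same kind of bimodal sample

n = numel(x);
r = tiedrank(x);          % ranks 1..n (tied values get averaged ranks)
c = (r - 0.5) / n;        % empirical CDF values strictly inside (0, 1)
y = norminv(c);           % map to standard normal quantiles

fprintf('mean(y) = %.4f, std(y) = %.4f\n', mean(y), std(y));
```

The mapping is monotone, so the ordering of the samples is preserved, but it is defined only at the observed points; transforming new data would require interpolating between them.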


Source: https://stackoverflow.com/questions/15496804/manipulate-data-to-better-fit-a-gaussian-distribution
