Transform data to fit normal distribution

♀尐吖头ヾ submitted on 2019-12-06 03:23:54

Maybe what you are interested in is a rank-based inverse normal transformation. Basically, you rank the data first and then convert the ranks to quantiles of the normal distribution:

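r = tiedrank( data );          %# ranks; tied values get their average rank (named r to avoid shadowing the built-in rank)
p = r / ( length(r) + 1 );     %# +1 to avoid Inf for the max point
newdata = norminv( p, 0, 1 );  %# map the probabilities to standard normal quantiles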
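
As a quick check of the snippet above, here is a minimal sketch that applies it to deliberately skewed data. The sample `data` (squared normal draws) and its size are made up for illustration, and `tiedrank`, `norminv`, and `skewness` are assumed to be available from Statistics Toolbox.

```matlab
rng(0);                        %# fixed seed so the run is repeatable
data = randn(1000, 1).^2;      %# hypothetical skewed sample (illustration only)

r = tiedrank(data);            %# ranks; tied values get their average rank
p = r / (length(r) + 1);       %# probabilities strictly inside (0,1)
newdata = norminv(p, 0, 1);    %# standard normal quantiles

%# Skewness should be clearly positive before and close to zero after.
fprintf('skewness before: %.3f\n', skewness(data));
fprintf('skewness after:  %.3f\n', skewness(newdata));
```

Note that the transform keeps only the ordering of the values, so the original spacing between data points is discarded.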

What you are trying to do seems to match the problem of measuring how random (how Gaussian) a set of data is. Supergaussian pdfs are those which have a greater probability around zero (or the mean, whatever it may be) than the Gaussian distribution, and are consequently more "sharply peaked" - much like your example. An example of this type of distribution is the Laplace distribution. Subgaussian pdfs are the opposite: flatter around the mean than the Gaussian, as with the uniform distribution.

A measure of a dataset's closeness to the Gaussian distribution can be given in many ways. Often this is done by using either kurtosis, which is based on the fourth-order moment (http://en.wikipedia.org/wiki/Kurtosis - MATLAB function kurtosis in Statistics Toolbox), or an information-theoretic measure such as negentropy (http://en.wikipedia.org/wiki/Negentropy ). Kurtosis is sensitive to outliers because each deviation from the mean gets raised to the power of 4, so negentropy is often the better choice when you have lots of them.
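
To make the kurtosis comparison concrete, here is a minimal sketch on three synthetic samples. The sample size and the variable names `g`, `l`, and `u` are made up for illustration; it assumes `kurtosis` from Statistics Toolbox, which returns the plain (not excess) kurtosis, so Gaussian data should come out near 3.

```matlab
rng(0);                                 %# fixed seed so the run is repeatable
n = 1e5;

g = randn(n, 1);                        %# Gaussian reference
l = log(rand(n, 1)) - log(rand(n, 1));  %# Laplace: difference of two exponential draws (supergaussian)
u = rand(n, 1);                         %# uniform (subgaussian)

%# Values above the Gaussian one suggest a supergaussian shape, values below a subgaussian one.
fprintf('Gaussian: %.2f\n', kurtosis(g));
fprintf('Laplace:  %.2f\n', kurtosis(l));
fprintf('Uniform:  %.2f\n', kurtosis(u));
```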

If the term "fourth-order moment" is unfamiliar: it is the average of the fourth power of the deviations from the mean, and any statistics textbook covers it.

A comparison of these, and several other, measures of randomness (Gaussianity) is given in many texts on independent component analysis (ICA), as it is a core concept there. A good resource on this is the book Independent Component Analysis, by Hyvärinen, Karhunen and Oja - http://books.google.co.uk/books/about/Independent_Component_Analysis.html?id=96D0ypDwAkkC .

I haven't been able to really understand what this question, or your other recent similar ones, have been asking exactly.

Perhaps you have data that is normally distributed, and you want to rescale it so that it has mean 0 and standard deviation 1?

If so, then subtract mu from your data and divide it by sigma, where mu is the mean of the data and sigma is its standard deviation. If your original data is normally distributed, then the result should be data that is normally distributed with mean 0 and standard deviation 1.

There's a function zscore in Statistics Toolbox to do exactly this for you.
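
A minimal sketch of both routes, assuming `zscore` from Statistics Toolbox is available. The sample `data` (mean 5, standard deviation 2) and the names `z_manual` and `z_toolbox` are made up for illustration.

```matlab
rng(0);                               %# fixed seed so the run is repeatable
data = 5 + 2 * randn(1000, 1);        %# hypothetical normal sample (illustration only)

mu = mean(data);                      %# sample mean
sigma = std(data);                    %# sample standard deviation
z_manual = (data - mu) / sigma;       %# subtract mu, divide by sigma

z_toolbox = zscore(data);             %# the same standardization in one call

fprintf('max abs difference: %g\n', max(abs(z_manual - z_toolbox)));
```

Standardizing only shifts and rescales the data. It does not change the shape of the distribution, so data that is not normal to begin with stays non-normal.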

But perhaps you meant something else?
