Determining “how good” a correlation is in MATLAB?

Posted by ☆樱花仙子☆ on 2019-12-08 03:27:50

Question


I'm working with a set of data and I've obtained certain correlations (using Pearson's correlation coefficient). I've been asked to determine the "quality of the correlation." By that, my supervisor means he wants to see what the correlations would be if I permuted all the y values of my ordered pairs, and then to compare the resulting correlation coefficients with the original one. Does anyone know a nice way of doing this? Is there a MATLAB function that determines how good a correlation is compared to the correlations obtained from random permutations of the data?


Answer 1:


First, you have to check whether the correlation coefficient you got is significantly different from zero. The corr function can do this: its second output (pval) is the p-value for the null hypothesis of no correlation.
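
As a minimal sketch of this first step, the snippet below requests both outputs of corr and compares the p-value with a significance level. The data are made up for illustration, and the 0.05 threshold (alpha) is an assumption, not something the answer prescribes.

```matlab
% Illustrative data (made up for this sketch)
rng(0);                       % fix the seed so the run is repeatable
n = 20;
x = (1:n)';
y = x + randn(n,1)*3;

% r is Pearson's correlation coefficient; pval tests H0: no correlation
[r, pval] = corr(x, y);

alpha = 0.05;                 % assumed significance level
isSignificant = pval < alpha; % true if H0 is rejected at level alpha
fprintf('r = %.3f, p = %.3g, significant at %.2f: %d\n', ...
        r, pval, alpha, isSignificant);
```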

Second, if it is significantly different from zero, you need to decide whether the correlation also matters from a practical point of view. A common rule of thumb looks at the square of the correlation coefficient (the coefficient of determination): if it is larger than 0.5, the variation of one of the correlated variables "explains" at least 50% of the variation of the other.
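
To illustrate the "explained variation" reading, the sketch below squares the Pearson coefficient and cross-checks it against 1 - SS_res/SS_tot from a least-squares line, which gives the same number for a straight-line fit with an intercept. The data and the variable names (r2, r2_fit) are made up for this example.

```matlab
% Illustrative data (made up for this sketch)
rng(0);
n = 20;
x = (1:n)';
y = x + randn(n,1)*3;

% Coefficient of determination as the squared Pearson coefficient
r  = corr(x, y);
r2 = r^2;

% Cross-check: fraction of variance explained by a least-squares line
p      = polyfit(x, y, 1);          % slope and intercept
yhat   = polyval(p, x);             % fitted values
ss_res = sum((y - yhat).^2);        % residual sum of squares
ss_tot = sum((y - mean(y)).^2);     % total sum of squares
r2_fit = 1 - ss_res/ss_tot;

fprintf('r^2 = %.4f, 1 - SS_res/SS_tot = %.4f\n', r2, r2_fit);
```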

Third, there are cases where the coefficient of determination is close to one, but this is not enough to determine the "goodness of correlation". For example, if you measure the same variable using two different methods, you will usually get very similar values, so the correlation coefficient will be almost 1. In such cases you should apply Bland-Altman analysis, which is very easy to implement in MATLAB and has its own "goodness" parameters (the bias and the so-called limits of agreement).
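
A rough sketch of a Bland-Altman analysis follows. The two "methods" (m1, m2) are simulated here with a constant offset, purely as an assumption for illustration. The limits of agreement use the conventional bias ± 1.96 standard deviations of the differences.

```matlab
% Simulated measurements of the same quantity by two methods (made up)
rng(0);
n     = 30;
truth = 50 + 10*randn(n,1);      % hypothetical underlying quantity
m1    = truth + randn(n,1);      % method 1
m2    = truth + 2 + randn(n,1);  % method 2, with a constant offset

d    = m1 - m2;                  % pairwise differences
avg  = (m1 + m2)/2;              % pairwise means
bias = mean(d);                  % systematic difference between methods
sd   = std(d);
loa  = bias + 1.96*sd*[-1 1];    % lower and upper limits of agreement

% Bland-Altman plot: difference against mean, with bias and limits
plot(avg, d, 'o'); hold on
xl = xlim;
line(xl, [bias bias], 'Color', 'k');
line(xl, [loa(1) loa(1)], 'Color', 'r', 'LineStyle', '--');
line(xl, [loa(2) loa(2)], 'Color', 'r', 'LineStyle', '--');
hold off
xlabel('Mean of the two methods');
ylabel('Difference (m1 - m2)');
```

Note that m1 and m2 are almost perfectly correlated here, yet the bias reveals that the methods disagree by a roughly constant amount.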




Answer 2:


You can permute one vector's labels N times and calculate the correlation coefficient (cc) for each permutation. Then you can compare the distribution of those values with the real correlation.

Something like this:

%# random data
n = 20;
x = (1:n)';
y = x + randn(n,1)*3;

%# real correlation
cc = corr(x,y);

%# do permutations
n_iter = 100; %# number of permutations
cc_iter = zeros(n_iter,1); %# preallocate the vector
for k = 1:n_iter
    ind = randperm(n); %# vector of random permutations
    cc_iter(k) = corr(x,y(ind));
end

%# calculate statistics
cc_mean = mean(cc_iter);
cc_std = std(cc_iter);
zval = (cc - cc_mean) ./ cc_std; %# parentheses needed: standardize the difference
%# two-sided probability that the real cc belongs to the same distribution as cc from permuted data
%# (zval is already standardized, so use the standard normal CDF)
pv = 2 * normcdf(-abs(zval));

%# plot
hist(cc_iter,20)
line([cc cc],ylim,'color','r') %# real value
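
As an alternative that does not assume the permuted coefficients are normally distributed, a p-value can be read directly from the permutation distribution. The sketch below is one way to do that. The number of permutations (1000) and the "+1" correction in the numerator and denominator are assumptions, following a common convention that keeps the estimate from being exactly zero.

```matlab
% Same kind of illustrative data as above (made up)
rng(0);
n = 20;
x = (1:n)';
y = x + randn(n,1)*3;
cc = corr(x, y);                        % observed correlation

n_iter  = 1000;                         % assumed number of permutations
cc_iter = zeros(n_iter, 1);             % preallocate
for k = 1:n_iter
    cc_iter(k) = corr(x, y(randperm(n)));   % correlation after shuffling y
end

% Two-sided empirical p-value: share of permutations at least as extreme
n_extreme = sum(abs(cc_iter) >= abs(cc));
pv_emp = (n_extreme + 1) / (n_iter + 1);
fprintf('observed cc = %.3f, empirical p = %.4f\n', cc, pv_emp);
```

The smallest p-value this can report is 1/(n_iter + 1), so more permutations are needed to resolve very small p-values.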

In addition, if you compute the correlation with [cc pv] = corr(x,y), the second output is a p-value for the null hypothesis of no correlation. For Pearson's coefficient, this p-value is based on a Student's t distribution and is exact only when the data are normally distributed. If you instead calculate the nonparametric Spearman or Kendall correlation, corr derives the p-value from the exact permutation distribution for small samples (and from a large-sample approximation otherwise):

[cc pv] = corr(x,y,'type','Spearman')


Source: https://stackoverflow.com/questions/8416968/determining-how-good-a-correlation-is-in-matlab
