Is there a way to test correlation between Data X and Binary output Y?

Posted by 蓝咒 on 2019-12-22 05:31:30

Question


I'm trying to find a Python method or library for testing the correlation between an independent variable X and a binary output Y.

So, for example, let's say I have the following data and output:

X           Y
0.65       1
0.11       0
0.13       0
0.35       1
0.21       0
...

Let's say the output Y is 1 if X > 0.3 and 0 otherwise. If I don't know this relationship (the threshold value 0.3), is there a statistical method or test to find the degree of correlation between X and Y?

So for example, some method that returns

x = [0.65, 0.11, 0.13, 0.31, 0.21]
y = [1, 0, 0, 1, 0]
print(some_test(x, y))

==> returns "degree of correlation = 1.0"

Thanks


Answer 1:


You are looking for the point-biserial correlation, which is used when one of your variables is dichotomous.

from scipy import stats
# Returns the correlation coefficient and its two-sided p-value
r, p_value = stats.pointbiserialr(x, y)
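As a rough, self-contained sketch of that call on the sample data from the question (the variable names `r` and `p_value` are just illustrative): the point-biserial coefficient is Pearson's r with Y coded as 0/1, so it measures linear association. It will therefore come out below 1.0 here even though a threshold on X separates the two classes perfectly.

```python
from scipy import stats

# Sample data from the question
x = [0.65, 0.11, 0.13, 0.31, 0.21]
y = [1, 0, 0, 1, 0]

# Point-biserial correlation: equivalent to Pearson's r with y coded as 0/1
r, p_value = stats.pointbiserialr(x, y)

print(f"r = {r:.3f}, p = {p_value:.3f}")
```

With only five points the p-value is not very informative; the example is only meant to show the shape of the call and its result.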

If you simply want to know whether X differs between the two values of Y, you should use a t-test instead.
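A minimal sketch of the t-test route, assuming an independent two-sample test via `scipy.stats.ttest_ind`. The answer does not say which variant to use, so `equal_var=False` (Welch's test) is an assumption here, and the names `x_when_1` and `x_when_0` are made up for illustration.

```python
from scipy import stats

# Sample data from the question
x = [0.65, 0.11, 0.13, 0.31, 0.21]
y = [1, 0, 0, 1, 0]

# Split X into two groups according to the binary output Y
x_when_1 = [xi for xi, yi in zip(x, y) if yi == 1]
x_when_0 = [xi for xi, yi in zip(x, y) if yi == 0]

# Welch's t-test (assumption: unequal variances between the groups)
t_stat, p_value = stats.ttest_ind(x_when_1, x_when_0, equal_var=False)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

A positive `t_stat` here means the mean of X is higher in the Y = 1 group than in the Y = 0 group.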



Source: https://stackoverflow.com/questions/29021380/is-there-a-way-to-test-correlation-between-data-x-and-binary-output-y
