Question
I'm trying to find a Python method or library for testing the correlation between an independent variable X and a binary output Y.
For example, let's say I have the following data and output:
X Y
0.65 1
0.11 0
0.13 0
0.35 1
0.21 0
...
Let's say the output Y is 1 if X > 0.3 and 0 otherwise. If I don't know this relationship (the threshold value 0.3), is there a statistical method or test that measures the degree of correlation between X and Y?
For example, some method that returns
x = [0.65, 0.11, 0.13, 0.31, 0.21]
y = [1, 0, 0, 1, 0]
print(some_test(x, y))
# ==> returns "degree of correlation = 1.0"
Thanks
Answer 1:
You are looking for a point biserial correlation, which is used when one of your variables is dichotomous.
from scipy import stats
# Returns the correlation coefficient and its two-sided p-value
stats.pointbiserialr(x, y)
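As a minimal sketch, here is that call applied to the sample data from the question. The variable names `r`, `p_value`, and `pearson_r` are just illustrative. Note that the result is not the 1.0 the question hopes for: the point-biserial coefficient measures linear association, so even with a perfect threshold rule it comes out below 1.0 (roughly 0.82 for these five points).

```python
from scipy import stats

# Sample data from the question: y is 1 exactly when x > 0.3.
x = [0.65, 0.11, 0.13, 0.31, 0.21]
y = [1, 0, 0, 1, 0]

# pointbiserialr returns the correlation coefficient and a two-sided p-value.
r, p_value = stats.pointbiserialr(x, y)
print(f"point-biserial r = {r:.3f}, p = {p_value:.3f}")

# The point-biserial coefficient is mathematically the same as Pearson's r
# computed with one dichotomous variable, so the two values should agree.
pearson_r, _ = stats.pearsonr(x, y)
print(f"Pearson r        = {pearson_r:.3f}")
```

With only five observations the p-value is not very informative; the coefficient itself is the quantity of interest here.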
If you simply want to know whether the mean of X differs depending on the value of Y, you should instead use a t-test.
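A rough sketch of the t-test route, again using the question's sample data: split X into two groups according to Y and compare the group means with `scipy.stats.ttest_ind`. The answer does not say which variant of the t-test to use; choosing Welch's version (`equal_var=False`) is an assumption made here, and the names `x_when_1` and `x_when_0` are illustrative.

```python
from scipy import stats

# Sample data from the question.
x = [0.65, 0.11, 0.13, 0.31, 0.21]
y = [1, 0, 0, 1, 0]

# Split the X values into two groups according to the binary label.
x_when_1 = [xi for xi, yi in zip(x, y) if yi == 1]
x_when_0 = [xi for xi, yi in zip(x, y) if yi == 0]

# Assumption: Welch's t-test (equal_var=False), which does not require
# the two groups to have equal variances.
t_stat, p_value = stats.ttest_ind(x_when_1, x_when_0, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

A positive `t_stat` means the Y = 1 group has the larger mean. With groups of two and three points the test has very little power, so treat this as a demonstration of the call, not as evidence about the data.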
Source: https://stackoverflow.com/questions/29021380/is-there-a-way-to-test-correlation-between-data-x-and-binary-output-y