correlation

partial correlation coefficient in pandas dataframe python

二次信任 posted on 2019-12-22 07:59:26
Question: I have data in a pandas DataFrame like this:

df =
   X1  X2  X3       Y
0   1   2  10   5.077
1   2   2   9  32.330
2   3   3   5  65.140
3   4   4   4  47.270
4   5   2   9  80.570

I want to do a multiple regression analysis. Here Y is the dependent variable and X1, X2 and X3 are the independent variables. The correlation between each independent variable and the dependent variable is given by df.corr():

          X1        X2        X3         Y
X1  1.000000  0.353553 -0.409644  0.896626
X2  0.353553  1.000000 -0.951747  0.204882
X3 -0.409644 -0.951747  1.000000 -0.389641
Y   0.896626  0.204882 -0
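For reference, a first-order partial correlation can be computed directly from pairwise Pearson correlations. The sketch below is a minimal, standard-library-only illustration that uses the five rows shown in the question. The helper names `pearson` and `partial_corr` are hypothetical, and the sketch controls for a single variable only; controlling for several variables at once is usually done by correlating regression residuals instead.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys with the linear effect of zs removed."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Columns copied from the DataFrame in the question.
X1 = [1, 2, 3, 4, 5]
X2 = [2, 2, 3, 4, 2]
Y = [5.077, 32.330, 65.140, 47.270, 80.570]

# Partial correlation of X1 and Y, controlling for X2.
r_x1y_given_x2 = partial_corr(X1, Y, X2)
print(round(r_x1y_given_x2, 4))
```

With only five observations, any such coefficient is a very rough estimate.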

Is there a way to test correlation between Data X and Binary output Y?

蓝咒 posted on 2019-12-22 05:31:30
Question: I'm trying to find a Python method or library for testing the correlation between the independent variables X and the binary output Y. For example, let's say I have the following data and output:

X     Y
0.65  1
0.11  0
0.13  0
0.35  1
0.21  0
...

Let's say the output Y is 1 if X > 0.3 and 0 otherwise. If I don't know this relationship (the threshold value 0.3), is there a statistical method or test to find the degree of correlation between X and Y? For example, some method that returns x = [0.65, 0
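One standard measure for a continuous variable against a binary one is the point-biserial correlation, which is numerically the same as Pearson's r with Y coded as 0/1. The sketch below is a plain-Python illustration on the five rows visible in the question; the function name `point_biserial` is hypothetical. SciPy offers `scipy.stats.pointbiserialr`, which also reports a p-value.

```python
import math

def point_biserial(xs, ys):
    """Point-biserial correlation between numeric xs and 0/1 labels ys."""
    n = len(xs)
    ones = [x for x, y in zip(xs, ys) if y == 1]
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    mean_all = sum(xs) / n
    # Population standard deviation of xs (divide by n, not n - 1).
    s_n = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n)
    m1, m0 = sum(ones) / len(ones), sum(zeros) / len(zeros)
    return (m1 - m0) / s_n * math.sqrt(len(ones) * len(zeros) / n**2)

# Only the five rows shown in the question; the rest is elided there.
x = [0.65, 0.11, 0.13, 0.35, 0.21]
y = [1, 0, 0, 1, 0]

r_pb = point_biserial(x, y)
print(round(r_pb, 3))
```

This measures linear association only. It does not recover the threshold itself.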

How is NaN handled in Pearson correlation user-user similarity matrix in a recommender system?

╄→гoц情女王★ posted on 2019-12-22 05:29:50
Question: I am generating a user-user similarity matrix from user-rating data (specifically the MovieLens100K data). Computing the correlation leads to some NaN values. I have tested on a smaller dataset:

User-Item rating matrix

    I1  I2  I3  I4
U1   4   0   5   5
U2   4   2   1   0
U3   3   0   2   4
U4   4   4   0   0

User-User Pearson correlation similarity matrix

          U1         U2        U3    U4         U5
U1         1         -1         0  -nan   0.755929
U2        -1          1         1  -nan  -0.327327
U3         0          1         1  -nan   0.654654
U4      -nan       -nan      -nan  -nan       -nan
U5  0.755929  -0.327327  0.654654  -nan          1

For computing the pearson
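Pearson correlation is undefined (0/0) whenever one of the two vectors has zero variance over the items being compared, which is a common source of NaN in similarity matrices. The sketch below is one possible reading of the small example, not something the truncated excerpt confirms: it assumes a rating of 0 means "not rated" and compares only co-rated items. The name `pearson_corated` and the `missing` parameter are hypothetical.

```python
import math

def pearson_corated(u, v, missing=0):
    """Pearson correlation over items both users rated; NaN if undefined."""
    pairs = [(a, b) for a, b in zip(u, v) if a != missing and b != missing]
    if len(pairs) < 2:
        return math.nan  # fewer than two co-rated items
    n = len(pairs)
    mu = sum(a for a, _ in pairs) / n
    mv = sum(b for _, b in pairs) / n
    suv = sum((a - mu) * (b - mv) for a, b in pairs)
    suu = sum((a - mu) ** 2 for a, _ in pairs)
    svv = sum((b - mv) ** 2 for _, b in pairs)
    if suu == 0 or svv == 0:
        return math.nan  # zero variance: 0/0 in the Pearson formula
    return suv / math.sqrt(suu * svv)

# The four users listed in the question's rating matrix.
ratings = {
    "U1": [4, 0, 5, 5],
    "U2": [4, 2, 1, 0],
    "U3": [3, 0, 2, 4],
    "U4": [4, 4, 0, 0],
}

sim_u1_u2 = pearson_corated(ratings["U1"], ratings["U2"])
sim_u2_u4 = pearson_corated(ratings["U2"], ratings["U4"])
print(sim_u1_u2, sim_u2_u4)
```

Under that assumption, U4's only rated items both carry the same rating, so every pair involving U4 comes out as NaN, in line with the matrix in the question. A recommender typically has to decide how to treat such pairs, for example by assigning similarity 0 or by excluding them from the neighbourhood.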

Similarity between two data sets or arrays

不打扰是莪最后的温柔 posted on 2019-12-22 05:23:11
Question: Let's say I have a dataset that looks like this:

{A:1, B:3, C:6, D:6}

I also have a list of other sets to compare against my specific set:

{A:1, B:3, C:6, D:6},
{A:2, B:3, C:6, D:6},
{A:99, B:3, C:6, D:6},
{A:5, B:1, C:6, D:9},
{A:4, B:2, C:2, D:6}

My entries could be visualized as a table (with four columns: A, B, C, and D). How can I find the set with the most similarity? For this example, row 1 is a perfect match and row 2 is a close second, while row 3 is quite far away. I am thinking of

Pearson's Coefficient and Covariance calculation in Matlab

允我心安 posted on 2019-12-22 05:02:19
Question: I want to calculate Pearson's correlation coefficient in Matlab (without using Matlab's corr function). Simply put, I have two vectors A and B (each of them is 1x100) and I am trying to calculate Pearson's coefficient like this:

P = cov(x, y)/std(x, 1)std(y,1)

I am using Matlab's cov and std functions. What I don't get is that the cov function returns a square matrix like this:

corrAB =
    0.8000    0.2000
    0.2000    4.8000

But I expect a single number as the covariance so I can come up with a single P
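For two vectors, MATLAB's cov returns the 2x2 covariance matrix: the diagonal holds the two variances and the off-diagonal entry is the covariance between the vectors. The sketch below, written in Python for illustration only, reads Pearson's coefficient straight off such a matrix. It assumes the matrix printed in the question is cov(A, B); the name `pearson_from_cov` is hypothetical.

```python
import math

# 2x2 covariance matrix as printed in the question:
# [[var(A), cov(A, B)], [cov(A, B), var(B)]]
cov_matrix = [[0.8, 0.2],
              [0.2, 4.8]]

def pearson_from_cov(c):
    """Pearson coefficient from a 2x2 covariance matrix."""
    return c[0][1] / math.sqrt(c[0][0] * c[1][1])

p = pearson_from_cov(cov_matrix)
print(round(p, 4))
```

Taking all three quantities from the same matrix also keeps the normalization consistent. By default cov divides by N-1, whereas std(x, 1) divides by N, so mixing the two as in the formula above scales the result by N/(N-1).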

Calculate special correlation distance matrix faster

爱⌒轻易说出口 posted on 2019-12-21 22:45:28
Question: I would like to build a distance matrix using Pearson correlation distance. I first tried scipy.spatial.distance.pdist(df,'correlation'), which is very fast for my dataset of 5000 rows * 20 features. Since I want to build a recommender, I wanted to change the distance slightly so that it only considers features that are not NaN for both users. Indeed, scipy.spatial.distance.pdist(df,'correlation') outputs NaN when it meets any feature whose value is float('nan'). Here is my code, df being my

Bootstrapped p-value for a correlation coefficient on R

陌路散爱 posted on 2019-12-21 22:39:25
Question: In R, I used the bootstrap method to get a correlation coefficient estimate and its confidence intervals. To get the p-value, I thought I could calculate the proportion of the confidence intervals that do not contain zero, but this is not the solution. How can I get the p-value in this case? I am using cor.test to get the coefficient estimate. cor.test also gives me the p-value from every test. But how can I get the bootstrapped p-value? Thank you very much! Below is an example: n=30

Generating two correlated random vectors

风格不统一 posted on 2019-12-21 22:00:19
Question: I want to generate two random vectors with a specified correlation. Each element of the second vector must be correlated with the corresponding element of the first vector and independent of the others. How could I do this in MATLAB? By the way, the elements of the first vector don't have the same distribution; I mean each element of the first vector should have a different variance. (The vector is made of 7 variables with different variances.)

Answer 1: As described in this Mathworks article, you can do
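One common construction, which is not necessarily the one in the article the answer refers to, builds each element of the second vector as a weighted mix of the standardized first element and fresh independent noise. The sketch below is written in Python for illustration and assumes Gaussian elements; `correlated_pair`, `sigmas` and the chosen standard deviations are hypothetical.

```python
import math
import random

def correlated_pair(sigmas, rho, rng):
    """Draw x and y so that corr(x[i], y[i]) is rho for each i.

    x[i] ~ N(0, sigmas[i]^2); y[i] has unit variance and depends only
    on x[i] plus its own independent noise term.
    """
    x = [rng.gauss(0.0, s) for s in sigmas]
    y = [rho * (xi / s) + math.sqrt(1 - rho**2) * rng.gauss(0.0, 1.0)
         for xi, s in zip(x, sigmas)]
    return x, y

rng = random.Random(0)            # fixed seed for reproducibility
sigmas = [1, 2, 3, 4, 5, 6, 7]    # hypothetical per-element standard deviations
x, y = correlated_pair(sigmas, rho=0.8, rng=rng)
print(len(x), len(y))
```

Because each y[i] uses only x[i] and its own noise draw, it is independent of every other element of x. Rescaling y[i] by a constant changes its variance without changing the correlation.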

Objective C - Cross-correlation for audio delay estimation

两盒软妹~` posted on 2019-12-21 11:02:24
Question: I would like to know if anyone knows how to perform a cross-correlation between two audio signals on iOS. I would like to align the FFT windows that I get at the receiver (I am receiving the signal from the mic) with the ones at the transmitter (which is playing the audio track), i.e. make sure that the first sample of each window (besides a "sync" period) at the transmitter will also be the first window at the receiver. I injected in every chunk of the transmitted audio a known waveform (in

How to find correlation between two integer arrays in java

…衆ロ難τιáo~ posted on 2019-12-21 07:57:23
Question: I have searched a lot but have not found exactly what I need so far. I have two integer arrays, int[] x and int[] y. I want to find the simple linear correlation between these two integer arrays, and the result should be returned as a double. Do you know of any Java library function that provides this, or any code snippet?

Answer 1: Correlation is quite easy to compute manually: http://en.wikipedia.org/wiki/Correlation_and_dependence

public static double Correlation(int[] xs, int[] ys) { //TODO: check here
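The answer's Java code is cut off in this excerpt. As a rough illustration of the manual computation it describes, here is a Python sketch of the single-pass "sum" form of Pearson's r for two integer sequences. It is not a reconstruction of the truncated Java method; the function name `correlation` and the error handling are assumptions.

```python
import math

def correlation(xs, ys):
    """Pearson correlation of two equal-length integer sequences, as a float."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Computational form: covariance over the product of standard
    # deviations, with numerator and denominator both scaled by n^2.
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    if den == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return num / den

print(correlation([1, 2, 3], [2, 4, 6]))
```

In a Java port the running sums would need a wide type such as long or double, since sums of squares of int values can overflow int.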