correlation | 易学教程

pandas correlation matrix between each pair groupby item

Question: I have a CSV file like this:
    date,sym,close
    2014.01.01,A,10
    2014.01.02,A,11
    2014.01.03,A,12
    2014.01.04,A,13
    2014.01.01,B,20
    2014.01.02,B,22
    2014.01.03,B,23
    2014.01.01,C,33
    2014.01.02,C,32
    2014.01.03,C,31
Then I get a DataFrame named df via the read_csv function:
    import numpy as np
    import pandas as pd
    df=pd.read_csv('daily.csv',index_col=[0])
    groups=df.groupby('sym')[['close']].apply(lambda x:func(x['close'].values))
The groups look like this:
    sym
    A [nan,1.00,2.00,...]
    B [nan,1.00,2.00,...]
    C [nan
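A minimal sketch of one way to get a correlation matrix between each pair of sym groups, assuming the goal is to correlate the close series of the symbols on the dates they share. The truncated question does not confirm this, and the asker's func is not shown, so the raw close values are used here. The sample rows are inlined with io.StringIO in place of daily.csv.

```python
import io

import pandas as pd

# Sample rows copied from the question; inlined here instead of reading daily.csv.
CSV = """date,sym,close
2014.01.01,A,10
2014.01.02,A,11
2014.01.03,A,12
2014.01.04,A,13
2014.01.01,B,20
2014.01.02,B,22
2014.01.03,B,23
2014.01.01,C,33
2014.01.02,C,32
2014.01.03,C,31
"""

df = pd.read_csv(io.StringIO(CSV))

# One column per symbol, one row per date; dates a symbol lacks become NaN.
wide = df.pivot(index="date", columns="sym", values="close")

# Pairwise Pearson correlation; each pair uses only the dates both symbols have.
corr = wide.corr()
print(corr.round(3))
```

Reshaping to one column per symbol lets DataFrame.corr() handle the pairing and the unequal group lengths, instead of looping over groupby results.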

How to efficiently get the correlation matrix (with p-values) of a data frame with NaN values?

Question: I am trying to compute a correlation matrix and filter the correlations by their p-values to find the highly correlated pairs. To explain what I mean, say I have a data frame like this:
    df
         A    B    C    D
    0    2  NaN    2   -2
    1  NaN    1    1  1.1
    2    1  NaN  NaN  3.2
    3   -4  NaN    2    2
    4  NaN    1  2.1  NaN
    5  NaN    3    1    1
    6    3  NaN    0  NaN
For the correlation coefficient, I used df.corr(). This method can process a data frame with NaN values and, more importantly, it tolerates pairs of columns with zero overlap (col A and col B):
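A minimal sketch of computing both coefficients and p-values pair by pair, keeping only the rows where both columns are present. The helper name corr_with_pvalues and the min_overlap threshold are assumptions made for illustration, not part of the question. scipy.stats.pearsonr supplies the p-value, and pairs with too little overlap stay NaN, as with columns A and B above.

```python
import numpy as np
import pandas as pd
from scipy import stats


def corr_with_pvalues(df, min_overlap=3):
    """Pairwise Pearson r and p-value using rows where both columns are non-NaN.

    min_overlap is a hypothetical threshold chosen for this sketch.
    """
    cols = list(df.columns)
    r = pd.DataFrame(np.nan, index=cols, columns=cols)
    p = pd.DataFrame(np.nan, index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i:]:
            both = df[a].notna() & df[b].notna()
            x = df.loc[both, a].to_numpy(dtype=float)
            y = df.loc[both, b].to_numpy(dtype=float)
            # Leave NaN when the overlap is too small or one side is constant.
            if len(x) < min_overlap or x.std() == 0 or y.std() == 0:
                continue
            stat, pval = stats.pearsonr(x, y)
            r.loc[a, b] = r.loc[b, a] = stat
            p.loc[a, b] = p.loc[b, a] = pval
    return r, p


nan = np.nan
df = pd.DataFrame({
    "A": [2, nan, 1, -4, nan, nan, 3],
    "B": [nan, 1, nan, nan, 1, 3, nan],
    "C": [2, 1, nan, 2, 2.1, 1, 0],
    "D": [-2, 1.1, 3.2, 2, nan, 1, nan],
})
r, p = corr_with_pvalues(df)
print(r.round(3))
print(p.round(3))
```

The double loop is easy to follow but not fast; for wide frames it would likely need vectorizing. Filtering is then a mask such as r.where(p < 0.05).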

R getting the minimum value for each row in a matrix, and returning the row and column name

Question: I have a matrix like so, only in reality it has hundreds or thousands of values. What I need to do is return the minimum value for each row, along with the row/column name. So for row 1 in the example, "BAC", the minimum is 0.92 for BAC/CSCO, so I need to return something like:
    BAC/CSCO 0.92
And then repeat this for each row in the matrix. Assistance is greatly appreciated. I think apply is the trick, but I can't quite get the right combination.
Answer 1:
    X <- matrix(runif(20), nrow=4)
    rownames(X) <-

Python - generate array of specific autocorrelation

Question: I am interested in generating an array (or pandas Series) of length N that exhibits a specific autocorrelation at lag 1. Ideally, I want to specify the mean and variance as well, and have the data drawn from a (multivariate) normal distribution. But most importantly, I want to specify the autocorrelation. How do I do this with numpy or scikit-learn? Just to be explicit and precise, this is the autocorrelation I want to control:
    numpy.corrcoef(x[0:len(x) - 1], x[1:])[0][1]
Answer 1: If you are interested
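A minimal sketch of one standard way to do this: a stationary Gaussian AR(1) process, whose lag-1 autocorrelation equals its coefficient. This is not necessarily what the truncated answer goes on to propose, and the function name ar1_series and its parameters are assumptions for illustration. The sample autocorrelation only approaches the target as N grows.

```python
import numpy as np


def ar1_series(n, rho, mean=0.0, var=1.0, seed=None):
    """Gaussian AR(1) series with target lag-1 autocorrelation rho.

    The innovation variance var * (1 - rho**2) keeps the marginal
    variance of the process equal to var.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    rng = np.random.default_rng(seed)
    sd = np.sqrt(var)
    eps = rng.normal(0.0, sd * np.sqrt(1.0 - rho ** 2), size=n)
    x = np.empty(n)
    x[0] = rng.normal(0.0, sd)  # start from the stationary distribution
    for t in range(1, n):
        x[t] = rho * x[t - 1] + eps[t]
    return x + mean


x = ar1_series(50_000, rho=0.7, mean=5.0, var=4.0, seed=0)
# The exact quantity the question wants to control:
print(np.corrcoef(x[0:len(x) - 1], x[1:])[0][1])
print(x.mean(), x.var())
```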

Complete.obs of cor() function

Question: I am establishing a correlation matrix for my data, which looks like this:
    df <- structure(list(V1 = c(56, 123, 546, 26, 62, 6, NA, NA, NA, 15),
                         V2 = c(21, 231, 5, 5, 32, NA, 1, 231, 5, 200),
                         V3 = c(NA, NA, 24, 51, 53, 231, NA, 153, 6, 700),
                         V4 = c(2, 10, NA, 20, 56, 1, 1, 53, 40, 5000)),
                    .Names = c("V1", "V2", "V3", "V4"),
                    row.names = c(NA, 10L), class = "data.frame")
This gives the following data frame:
       V1  V2  V3  V4
    1  56  21  NA   2
    2 123 231  NA  10
    3 546   5  24  NA
    4  26   5  51  20
    5  62  32  53  56
    6   6  NA

Normalize scipy.ndimage.filters.correlate

Question: Does anybody have an idea how to normalize the scipy.ndimage.filters.correlate function to get:
    XCM = 1/N(xc(a-mu_a,b-mu_b)/(sig_a*sig_b))
What is N for the correlation? It is usually the number of data points (pixels, for images). Which value should I choose for scipy.ndimage.filters.correlate? My images differ in size. I guess the scipy correlate function pads the smaller image with zeros? Is N the size of the final matrix, N = XCM.sizeX() * XCM.sizeY()? Thanks, El
Answer 1: It looks to me like you're trying to compute the normalized cross-correlation of two images (I suspect you're probably trying to do template
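A minimal sketch of the formula in the question for the simplest case, assuming both images have the same shape (the question's images differ in size, which this does not cover). Here N is the number of pixels and the standard deviations are population values. The helper names ncc_zero_lag and ncc_map are hypothetical. In ncc_map the fixed N is exact only at the centre of the output, where the two images overlap fully; towards the edges the zero padding reduces the true overlap. The call uses scipy.ndimage.correlate, the current location of the function the question reaches through scipy.ndimage.filters.

```python
import numpy as np
from scipy import ndimage


def ncc_zero_lag(a, b):
    """Normalized cross-correlation of two same-shape arrays at zero shift."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal shapes")
    n = a.size  # N: number of pixels
    return ((a - a.mean()) * (b - b.mean())).sum() / (n * a.std() * b.std())


def ncc_map(a, b):
    """Correlation over all shifts with zero padding, scaled by the same fixed N.

    Only the centre element (full overlap) is exactly normalized.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    xc = ndimage.correlate(a - a.mean(), b - b.mean(), mode="constant", cval=0.0)
    return xc / (a.size * a.std() * b.std())


rng = np.random.default_rng(0)
img = rng.normal(size=(6, 5))
other = 2.0 * img + 3.0  # a linear rescaling of img, so the coefficient is 1
centre = tuple(s // 2 for s in img.shape)
print(ncc_zero_lag(img, other), ncc_map(img, other)[centre])
```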

Calculate special correlation distance matrix faster

Question: I would like to build a distance matrix using Pearson correlation distance. I first tried scipy.spatial.distance.pdist(df,'correlation'), which is very fast for my dataset of 5000 rows by 20 features. Since I want to build a recommender, I wanted to change the distance slightly so that it only considers features that are not NaN for both users. Indeed, scipy.spatial.distance.pdist(df,'correlation') outputs NaN when it meets any feature whose value is float('nan'). Here is my code, df being my 5000*20 pandas DataFrame:
    dist_mat = []
    d = df.shape[1]
    for i,row_i in enumerate(df.itertuples()):
        for
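A minimal sketch of one way to avoid the explicit double loop, assuming that "correlation distance" means 1 minus the Pearson correlation computed over the features both rows have. DataFrame.corr() already works pairwise on non-NaN values, so transposing the frame makes each user a column. The function name nan_correlation_distance is hypothetical, and whether this is fast enough for 5000 rows has not been measured here.

```python
import numpy as np
import pandas as pd


def nan_correlation_distance(df, min_periods=2):
    """1 - Pearson r between every pair of rows, using only shared non-NaN features.

    Pairs with fewer than min_periods shared features come out as NaN.
    """
    corr = df.T.corr(method="pearson", min_periods=min_periods)
    return 1.0 - corr


rng = np.random.default_rng(0)
data = rng.normal(size=(6, 8))
data[0, 1] = np.nan  # row 0 is missing feature 1
data[2, 5] = np.nan  # row 2 is missing feature 5
dist = nan_correlation_distance(pd.DataFrame(data))
print(dist.round(3))
```

The result is a square DataFrame; scipy.spatial.distance.squareform can convert it to the condensed form that pdist returns, provided it contains no NaN and the diagonal is exactly zero.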

Generating two correlated random vectors

Question: I want to generate two random vectors with a specified correlation. Each element of the second vector must be correlated with the corresponding element of the first vector and independent of the others. How could I do this in MATLAB? By the way, the elements of the first vector don't have the same distribution; each element should have a different variance (the vector is made of 7 variables with different variances).
Answer 1: As described in this Mathworks article, you can do the following: Generate two random vectors (i.e., a random matrix with two columns). Let's say that you want
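A minimal sketch of the idea, written in NumPy rather than MATLAB and assuming zero-mean normal variables. Each element pair is built from two independent standard normals, which is the 2x2 Cholesky factor of the correlation matrix written out by hand. The function name correlated_pair and its parameters are assumptions for illustration. Correlation is a property across repeated draws, so the sketch generates many samples of the 7-element vectors.

```python
import numpy as np


def correlated_pair(sigma_x, sigma_y, rho, n_samples, seed=None):
    """Draw x and y so that corr(x[:, i], y[:, i]) = rho, with other pairs independent.

    sigma_x and sigma_y hold the per-element standard deviations.
    """
    sigma_x = np.asarray(sigma_x, dtype=float)
    sigma_y = np.asarray(sigma_y, dtype=float)
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal((n_samples, sigma_x.size))
    z2 = rng.standard_normal((n_samples, sigma_x.size))
    x = sigma_x * z1
    # Mix the shared component z1 with fresh noise z2 to reach correlation rho.
    y = sigma_y * (rho * z1 + np.sqrt(1.0 - rho ** 2) * z2)
    return x, y


sigmas = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
x, y = correlated_pair(sigmas, sigmas, rho=0.6, n_samples=100_000, seed=1)
cross = np.corrcoef(x, y, rowvar=False)[:7, 7:]
print(np.diag(cross).round(3))  # each entry should be near 0.6
```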

Correlation among 2 images

Question: I am trying to find the following correlation between two images f1 and f2, where each image is of size P x P. I have written a for-loop program for this, but I think a built-in function would be faster. Which function in MATLAB can help me compute this? Also, if both images are of size M x N, can someone tell me how this formula will change, or whether the function will be able to handle it? EDIT: Is there any function faster than xcorr2 that can help me? It takes too much time when I only need the correlation value for the unshifted images. This is the

rcorr() function for correlations

Question: I'm building a correlation between two different matrices with the rcorr() function in R:
    res <- rcorr(as.matrix(table1), as.matrix(table2),type="pearson")
It seems to be working fine; however, I want to avoid within-table correlations. Any suggestions?
Answer 1: Consider using R's base cor() for distinct correlations between two sets, as Hmisc's rcorr() returns all possible combinations. Notice below that the upper-right quadrant of rcorr() (which is mirrored across the diagonal in the lower left) is the entire result of cor() (rounded to two decimal places).
    table1 <- matrix(rnorm(25),5)
    table2 <- matrix(rnorm(25),5)
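The same quadrant relationship can be sketched in NumPy rather than R, as an analogy only: np.corrcoef on the two tables together plays the role of the all-combinations matrix, and a direct cross-table computation plays the role of cor(table1, table2). The helper name cross_corr is hypothetical.

```python
import numpy as np


def cross_corr(t1, t2):
    """Pearson correlations between columns of t1 and columns of t2 only."""
    t1 = np.asarray(t1, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    # Standardize each column, then average the products of the z-scores.
    z1 = (t1 - t1.mean(axis=0)) / t1.std(axis=0)
    z2 = (t2 - t2.mean(axis=0)) / t2.std(axis=0)
    return z1.T @ z2 / t1.shape[0]


rng = np.random.default_rng(0)
table1 = rng.normal(size=(5, 5))
table2 = rng.normal(size=(5, 5))

full = np.corrcoef(table1, table2, rowvar=False)  # all 10 x 10 combinations
between = cross_corr(table1, table2)              # only table1 vs table2
print(np.allclose(full[:5, 5:], between))
```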