correlation

Correlation coefficients for a sparse matrix in Python?

ⅰ亾dé卋堺 submitted on 2019-12-03 11:36:23
Does anyone know how to compute a correlation matrix from a very large sparse matrix in Python? Basically, I am looking for something like numpy.corrcoef that will work on a scipy sparse matrix. You can compute the correlation coefficients fairly straightforwardly from the covariance matrix, like this:

```python
import numpy as np
from scipy import sparse

def sparse_corrcoef(A, B=None):
    if B is not None:
        A = sparse.vstack((A, B), format='csr')
    A = A.astype(np.float64)
    n = A.shape[1]

    # Compute the covariance matrix
    rowsum = A.sum(1)
    centering = rowsum.dot(rowsum.T.conjugate()) / n
    C = (A.dot(A.T.conjugate()) - centering) / (n - 1)
```
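As a sanity check, here is a compact, self-contained variant of the same covariance-based idea (the helper name, test data, and normalization step here are illustrative, not taken verbatim from the answer), verified against numpy.corrcoef on a small dense matrix:

```python
import numpy as np
from scipy import sparse

def sparse_corrcoef(A):
    """Row-wise correlation matrix of a sparse matrix, via the covariance."""
    A = sparse.csr_matrix(A, dtype=np.float64)
    n = A.shape[1]
    rowsum = np.asarray(A.sum(axis=1))        # (rows, 1) column of row sums
    centering = rowsum @ rowsum.T / n         # sum_x * sum_y / n for each row pair
    C = ((A @ A.T).toarray() - centering) / (n - 1)   # sample covariance matrix
    d = np.sqrt(np.diag(C))                   # per-row standard deviations
    return C / np.outer(d, d)                 # normalize covariances to correlations

rng = np.random.default_rng(0)
X = rng.random((4, 50))
print(np.allclose(sparse_corrcoef(X), np.corrcoef(X)))  # True
```

The key point is that A is never densified: only the small rows-by-rows Gram matrix A·Aᵀ is materialized.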

How to correlate an Ordinal Categorical column in pandas?

时间秒杀一切 submitted on 2019-12-03 11:31:18
Question: I have a DataFrame df with a non-numerical column CatColumn:

```
          A         B    CatColumn
0  381.1396  7.343921       Medium
1  481.3268  6.786945       Medium
2  263.3766  7.628746         High
3  177.2400  5.225647  Medium-High
```

I want to include CatColumn in the correlation analysis with the other columns in the DataFrame. I tried DataFrame.corr, but it does not include columns with nominal values in the correlation analysis.

Answer 1: I am going to strongly disagree with the other comments. They miss the main point of correlation: how much
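One common workaround is to map the ordinal levels to integer codes with pandas.Categorical and correlate the codes. This is a sketch; the ordering of the levels below is an assumption, since the question does not state it:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [381.1396, 481.3268, 263.3766, 177.2400],
    "B": [7.343921, 6.786945, 7.628746, 5.225647],
    "CatColumn": ["Medium", "Medium", "High", "Medium-High"],
})

# Assumed ordering of the ordinal levels (low to high).
order = ["Medium", "Medium-High", "High"]
df["CatCode"] = pd.Categorical(df["CatColumn"], categories=order, ordered=True).codes

# The integer codes now participate in the usual correlation matrix.
print(df[["A", "B", "CatCode"]].corr())
```

Note that the resulting Pearson coefficient treats the gaps between levels as equal; Spearman (`.corr(method='spearman')`) only uses the ordering and is often the safer choice for ordinals.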

How is using im2col operation in convolutional nets more efficient?

一曲冷凌霜 submitted on 2019-12-03 07:51:38
I am trying to implement a convolutional neural network, and I don't understand why using the im2col operation is more efficient. It basically stores the input patches to be multiplied by the filter in separate columns. But why shouldn't loops be used directly to compute the convolution, instead of first performing im2col? Well, you are thinking along the right lines. In AlexNet, almost 95% of the GPU time and 89% of the CPU time is spent in the convolutional layer and the fully connected layer. Both are implemented using GEMM, which stands for General Matrix-to-Matrix Multiplication
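To make the GEMM connection concrete, here is a minimal im2col sketch in NumPy (the function name, shapes, and stride-1/no-padding choices are illustrative): every k×k patch becomes a column, so the whole convolution collapses into a single matrix multiply, which BLAS executes far faster than an equivalent nest of interpreted loops.

```python
import numpy as np

def im2col(x, k):
    """Unroll all k x k patches of 2-D input x into columns (stride 1, no padding)."""
    H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((k * k, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[i:i + k, j:j + k].ravel()
    return cols

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((3, 3))                       # a simple 3x3 filter

# Convolution (cross-correlation) as one GEMM: filter row-vector times patch matrix.
y = (w.ravel() @ im2col(x, 3)).reshape(2, 2)

# Reference: the direct nested-loop computation.
ref = np.array([[np.sum(x[i:i+3, j:j+3] * w) for j in range(2)] for i in range(2)])
print(np.allclose(y, ref))  # True
```

The memory cost is the duplication of overlapping patches; the win is that one large GEMM uses cache-friendly, highly tuned kernels.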

How to interpret numpy.correlate and numpy.corrcoef values?

帅比萌擦擦* submitted on 2019-12-03 07:34:55
Question: I have two 1D arrays, and I want to see their inter-relationships. What procedure should I use in NumPy? I am using numpy.corrcoef(arrayA, arrayB) and numpy.correlate(arrayA, arrayB), and both give results that I am not able to comprehend or interpret. Can somebody please shed light on how to understand and interpret those numerical results (preferably using an example)? Thanks.

Answer 1: numpy.correlate simply returns the cross-correlation of two vectors. If you need to understand cross-correlation
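A tiny illustration of the difference (values chosen for clarity): numpy.corrcoef returns the normalized Pearson coefficient, while numpy.correlate returns raw, unnormalized sliding dot products.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 4.0, 6.0, 8.0])    # b = 2a, perfectly linearly related

# corrcoef: normalized and scale-free; entry [0, 1] is Pearson's r in [-1, 1].
r = np.corrcoef(a, b)[0, 1]
print(r)                              # ~1.0, since b is a linear function of a

# correlate: raw sliding dot product, NOT normalized; with equal-length inputs
# the default mode='valid' returns the single full-overlap dot product.
c = np.correlate(a, b)
print(c)                              # [60.] because 1*2 + 2*4 + 3*6 + 4*8 = 60
```

So corrcoef answers "how linearly related are these?", while correlate answers "how strongly do they overlap at each lag?" (useful for detecting shifts, as in signal processing).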

What to use to do multiple correlation?

血红的双手。 submitted on 2019-12-03 07:17:16
I am trying to use Python to compute multiple linear regression and multiple correlation between a response array and a set of predictor arrays. I saw the very simple example for computing multiple linear regression, which is easy. But how do I compute multiple correlation with statsmodels? Or with anything else, as an alternative. I guess I could use rpy and R, but I'd prefer to stay in Python if possible. Edit [clarification]: Considering a situation like the one described here: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704-EP713_MultivariableMethods/ I would also like to compute
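Even without statsmodels, the multiple correlation coefficient can be sketched directly with NumPy least squares (synthetic data and coefficients below are illustrative): it is the Pearson correlation between the response and its fitted values, and its square equals the usual R².

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((100, 3))                        # three predictors
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.1, 100)

# Multiple correlation R = correlation between y and its least-squares fit.
A = np.column_stack([np.ones(len(y)), X])       # design matrix with intercept
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta

R = np.corrcoef(y, y_hat)[0, 1]                 # multiple correlation coefficient
R2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(np.isclose(R ** 2, R2))  # True: R squared equals the coefficient of determination
```

With statsmodels, the equivalent is `sm.OLS(y, A).fit().rsquared`, and R is its square root.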

Computing Autocorrelation with FFT Using JTransforms Library

烂漫一生 submitted on 2019-12-03 06:20:56
I'm trying to calculate the autocorrelation of sample windows in a time series using the code below. I apply an FFT to the window, compute the squared magnitudes from the real and imaginary parts while setting the imaginary parts to zero, and finally take the inverse transform to obtain the autocorrelation:

```java
DoubleFFT_1D fft = new DoubleFFT_1D(magCnt);
fft.realForward(magFFT);
magFFT[0] = magFFT[0] * magFFT[0];
for (int i = 1; i < (magCnt - (magCnt % 2)) / 2; i++) {
    magFFT[2*i] = magFFT[2*i] * magFFT[2*i] + magFFT[2*i + 1] * magFFT[2*i + 1];
    magFFT[2*i + 1] = 0.0;
}
if (magCnt % 2 == 0) {
    magFFT[1] = magFFT[1] * magFFT[1];
}
```
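The same FFT round-trip is easier to see in NumPy (a sketch of the identical Wiener-Khinchin idea, not of JTransforms): note the zero-padding to 2n, which is needed if you want linear rather than circular autocorrelation; transforming the window at its own length, as the Java snippet does, yields the circular variant.

```python
import numpy as np

def autocorr_fft(x):
    """Autocorrelation via the Wiener-Khinchin theorem: IFFT of the power spectrum."""
    n = len(x)
    x = x - np.mean(x)
    # Zero-pad to 2n so the circular correlation equals the linear correlation.
    f = np.fft.rfft(x, 2 * n)
    acf = np.fft.irfft(f * np.conj(f))[:n]     # power spectrum -> autocorrelation
    return acf / acf[0]                        # normalize so lag 0 equals 1

x = np.sin(np.linspace(0, 8 * np.pi, 400))

# Reference: direct time-domain autocorrelation for lags 0..n-1.
ref = np.correlate(x - x.mean(), x - x.mean(), mode='full')[len(x) - 1:]
print(np.allclose(autocorr_fft(x), ref / ref[0]))  # True
```

Multiplying the spectrum by its conjugate is exactly the "square the real and imaginary magnitudes, zero the imaginary part" step in the Java code.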

Remove outliers from correlation coefficient calculation

◇◆丶佛笑我妖孽 submitted on 2019-12-03 04:33:10
Question: Assume we have two numeric vectors x and y. The Pearson correlation coefficient between x and y is given by cor(x, y). How can I automatically consider only a subset of x and y in the calculation (say 90%) so as to maximize the correlation coefficient?

Answer 1: If you really want to do this (remove the largest absolute residuals), then we can employ the linear model to estimate the least-squares solution and the associated residuals, and then select the middle n% of the data. Here is an example:
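A NumPy sketch of that residual-trimming idea (the data is synthetic; the 90% keep-fraction follows the question): fit a line, drop the 10% of points with the largest absolute residuals, and recompute Pearson's r.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.2, size=100)
y[:5] += 10                                     # plant a few gross outliers

# Fit a line, rank points by absolute residual, keep the best 90%.
slope, intercept = np.polyfit(x, y, 1)
resid = np.abs(y - (slope * x + intercept))
keep = np.argsort(resid)[: int(0.9 * len(x))]

r_all = np.corrcoef(x, y)[0, 1]
r_trim = np.corrcoef(x[keep], y[keep])[0, 1]
print(r_trim > r_all)   # trimming the largest residuals raises the correlation
```

Note the statistical caveat that applies equally to the R answer: a correlation maximized by discarding inconvenient points is a biased estimate and should be reported as such.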

Estimating small time shift between two time series

↘锁芯ラ submitted on 2019-12-03 04:18:02
Question: I have two time series, and I suspect that there is a time shift between them; I want to estimate this time shift. This question has been asked before, in "Find phase difference between two (inharmonic) waves" and "find time shift between two similar waveforms", but in my case the time shift is smaller than the resolution of the data. For example, the data is available at hourly resolution, and the time shift is only a few minutes (see image). The cause of this is that the datalogger used to
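One standard trick for shifts below the sampling resolution (a sketch on a synthetic pulse; the three-point parabolic-interpolation refinement is a common technique, not something stated in the question) is to locate the integer-lag cross-correlation peak and refine it by fitting a parabola through the peak and its two neighbours:

```python
import numpy as np

true_shift = 0.3                       # delay as a fraction of one sample
t = np.arange(100, dtype=float)
a = np.exp(-((t - 50) ** 2) / (2 * 3.0 ** 2))               # Gaussian pulse
b = np.exp(-((t - 50 - true_shift) ** 2) / (2 * 3.0 ** 2))  # same pulse, delayed

# Integer-lag cross-correlation, then a parabola through the peak and its
# two neighbours to estimate the lag below the sampling resolution.
xc = np.correlate(a, b, mode='full')
k = np.argmax(xc)
y0, y1, y2 = xc[k - 1], xc[k], xc[k + 1]
frac = 0.5 * (y0 - y2) / (y0 - 2 * y1 + y2)   # parabola vertex offset in (-0.5, 0.5)
delay = -((k - (len(a) - 1)) + frac)          # estimated delay of b behind a
print(round(delay, 2))                        # close to the true 0.3 samples
```

This works well when the correlation peak is smooth and single-lobed, as here; for noisy or oscillatory data, upsampling the cross-correlation or fitting the known peak shape is more robust.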

Fast correlation in R using C and parallelization

泄露秘密 submitted on 2019-12-03 03:40:19
My project for today was to write a fast correlation routine in R using the basic skill set I have. I have to find the correlation between almost 400 variables, each having almost a million observations (i.e. a matrix with p = 1MM rows and n = 400 columns). R's native correlation function takes almost 2 minutes for 1MM rows and 200 columns. I have not run it for 400 columns, but my guess is that it will take almost 8 minutes. I have less than 30 seconds to finish it. Hence, I want to do two things: 1 - write a simple correlation function in C and apply it in blocks in parallel (see below
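Before dropping to C, it is worth knowing the standard vectorized trick, shown here as a NumPy sketch (in R, `crossprod` on a scaled matrix is the same idea): standardize each column once, and the entire correlation matrix becomes a single matrix multiply, which the underlying BLAS already parallelizes.

```python
import numpy as np

def fast_cor(X):
    """Column-wise correlation matrix as a single matrix multiply (one GEMM)."""
    Z = X - X.mean(axis=0)             # center each column
    Z /= np.linalg.norm(Z, axis=0)     # columns now zero-mean and unit-norm
    return Z.T @ Z                     # dot products of standardized columns = r

rng = np.random.default_rng(3)
X = rng.random((10_000, 40))
print(np.allclose(fast_cor(X), np.corrcoef(X, rowvar=False)))  # True
```

The cost is one p×n centering pass plus an n×n GEMM over p rows, so going from 200 to 400 columns roughly quadruples the work, matching the 2-minute-to-8-minute guess above.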

Fast cross correlation method in Python

回眸只為那壹抹淺笑 submitted on 2019-12-03 03:07:34
I have recently been trying to find a fast and efficient way to perform a cross-correlation check between two arrays in Python. After some reading, I found these two options: the numpy.correlate() method, which is too slow when it comes to large arrays, and the cv.MatchTemplate() method, which seems to be much faster. For obvious reasons, I chose the second option. I tried to execute the following code:

```python
import scipy
import cv

image = cv.fromarray(scipy.float32(scipy.asarray([1, 2, 2, 1])), allowND=True)
template = cv.fromarray(scipy.float32(scipy.asarray([2, 2])), allowND=True)
result = cv
```
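A third option worth considering (a sketch, not from the question; it avoids the legacy `cv` API entirely): for 1-D arrays, scipy.signal.fftconvolve with a reversed template computes the same cross-correlation as numpy.correlate, but in O(n log n), which is usually what makes template matching fast on large inputs.

```python
import numpy as np
from scipy.signal import fftconvolve

a = np.array([1.0, 2.0, 2.0, 1.0])
template = np.array([2.0, 2.0])

# Cross-correlation = convolution with the reversed template; fftconvolve
# does the work via FFTs, so large arrays scale as O(n log n).
xcorr = fftconvolve(a, template[::-1], mode='valid')
print(np.allclose(xcorr, np.correlate(a, template, mode='valid')))  # True
```

For small arrays the direct method wins on constant factors; the FFT route pays off once the inputs reach thousands of samples.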