correlation

How to loop over a subset of lists in R?

你离开我真会死。 Submitted on 2019-12-08 05:48:31

Question: I have a list of 9 lists, and I want to loop over only three of them, p, r, and t, for the Pearson, Spearman, and Kendall correlations respectively, instead of all 9 lists. The current pseudocode is the following, where the test function is corrplot(M.cor, ...); see below for the complete pseudocode.

    for (i in p.mat.all) { ... }

Code with mtcars test data:

    library("psych")
    library("corrplot")

    M <- mtcars
    M.cor <- cor(M)
    p.mat.all <- psych::corr.test(M.cor, method = c("pearson", "kendall

Power Spectrum and Autocorrelation of Data in Numpy

那年仲夏 Submitted on 2019-12-08 03:58:15

Question: I am interested in computing the power spectrum of a system of particles (~100,000) in 3D space with Python. What I have found so far is a group of functions in NumPy (fft, fftn, ...) which compute the discrete Fourier transform, the squared absolute value of which is the power spectrum. My question is a matter of how my data are being represented, and truthfully may be fairly simple to answer. The data structure I have is an array with a shape of (n, 2), n being the number of
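
The question is cut off in the source, but binning particle positions onto a regular grid and taking the FFT of the resulting density field is a common route to a power spectrum. Below is a minimal sketch along those lines; it assumes 3D positions in a periodic cubic box, and every name (box_size, n_grid, the bin count) is illustrative rather than taken from the post.

    import numpy as np

    # Hypothetical setup: particle positions in a periodic cubic box.
    rng = np.random.default_rng(0)
    n_particles = 100_000
    box_size = 1.0
    positions = rng.random((n_particles, 3)) * box_size

    # Bin particles onto a regular grid to obtain a density field.
    n_grid = 64
    density, _ = np.histogramdd(positions, bins=n_grid, range=[(0, box_size)] * 3)

    # Use the density contrast so the k = 0 mode carries no signal.
    delta = density / density.mean() - 1.0

    # Power spectrum = squared magnitude of the discrete Fourier transform.
    power_3d = np.abs(np.fft.fftn(delta)) ** 2

    # Spherically average in bins of |k| to get a 1-D P(k).
    k = 2 * np.pi * np.fft.fftfreq(n_grid, d=box_size / n_grid)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k_mag = np.sqrt(kx**2 + ky**2 + kz**2).ravel()

    k_edges = np.linspace(0, k_mag.max(), 30)
    bin_index = np.digitize(k_mag, k_edges)
    p_k = [power_3d.ravel()[bin_index == i].mean() for i in range(1, len(k_edges))]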

determining “how good” a correlation is in matlab?

☆樱花仙子☆ Submitted on 2019-12-08 03:27:50

Question: I'm working with a set of data and I've obtained certain correlations (using Pearson's correlation coefficient). I've been asked to determine the "quality of the correlation", and by that my supervisor means he wants to see what the correlations would be if I permuted all the y values of my ordered pairs and compared the obtained correlation coefficients. Does anyone know a nice way of doing this? Is there a MATLAB function that would determine how good a correlation is when
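
The MATLAB side of the question is not answered here, but the permutation idea the supervisor describes is easy to sketch. The following is an illustrative Python version (the data and variable names are hypothetical): shuffle y many times, recompute the coefficient each time, and see how extreme the observed value is relative to the shuffled ones.

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(0)

    # Hypothetical paired data standing in for the ordered pairs in the question.
    x = rng.normal(size=200)
    y = 0.6 * x + rng.normal(scale=0.8, size=200)

    observed_r, _ = pearsonr(x, y)

    # Permutation null: shuffle y, recompute r, and count how often the
    # shuffled |r| is at least as large as the observed |r|.
    n_perm = 10_000
    perm_r = np.empty(n_perm)
    for i in range(n_perm):
        perm_r[i] = pearsonr(x, rng.permutation(y))[0]

    p_value = np.mean(np.abs(perm_r) >= abs(observed_r))
    print(observed_r, p_value)

A small p_value here means the observed correlation sits far out in the tail of what shuffled data produce, which is one reasonable reading of "how good" the correlation is.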

Remove strongly correlated columns from DataFrame

六月ゝ 毕业季﹏ Submitted on 2019-12-08 00:23:30

Question: I have a DataFrame like this:

    dict_ = {'Date': ['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04', '2018-01-05'],
             'Col1': [1, 2, 3, 4, 5],
             'Col2': [1.1, 1.2, 1.3, 1.4, 1.5],
             'Col3': [0.33, 0.98, 1.54, 0.01, 0.99]}
    df = pd.DataFrame(dict_, columns=dict_.keys())

I then calculate the Pearson correlation between the columns and filter out columns that are correlated above my threshold of 0.95:

    def trimm_correlated(df_in, threshold):
        df_corr = df_in.corr(method='pearson', min_periods=1)
        df_not_correlated = ~(df_corr
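
The original function is cut off above. A common pattern for this task (a sketch, not necessarily the poster's exact approach) is to take the upper triangle of the absolute correlation matrix and drop one column from each pair that exceeds the threshold:

    import numpy as np
    import pandas as pd

    def drop_correlated(df_in, threshold=0.95):
        # Absolute correlations between the numeric columns.
        corr = df_in.corr(method='pearson', min_periods=1).abs()
        # Keep only the upper triangle so each pair is considered once.
        upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
        # Drop any column whose correlation with an earlier column exceeds the threshold.
        to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
        return df_in.drop(columns=to_drop)

With the example frame above, Col1 and Col2 are perfectly linearly related, so Col2 would be dropped, while Col3 (only weakly correlated with the others) would be kept.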

generating correlated numbers in numpy / pandas

泄露秘密 Submitted on 2019-12-07 23:43:29

Question: I'm trying to generate simulated student grades in 4 subjects, where a student record is a single row of data. The code shown here will generate normally distributed random numbers with a mean of 60 and a standard deviation of 15.

    df = pd.DataFrame(15 * np.random.randn(5, 4) + 60,
                      columns=['Math', 'Science', 'History', 'Art'])

What I can't figure out is how to make it so that a student's Science mark is highly correlated to their Math mark, and that their History and Art marks are less so,
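
One standard way to do this (a sketch, not the thread's accepted answer) is to draw all four marks from a multivariate normal distribution with an explicit correlation matrix, then scale to the desired mean and standard deviation; the correlation values below are illustrative.

    import numpy as np
    import pandas as pd

    subjects = ['Math', 'Science', 'History', 'Art']

    # Target correlation structure: Math/Science strongly related,
    # History and Art only loosely related to them.
    corr = np.array([[1.0, 0.9, 0.3, 0.2],
                     [0.9, 1.0, 0.3, 0.2],
                     [0.3, 0.3, 1.0, 0.4],
                     [0.2, 0.2, 0.4, 1.0]])

    mean, std = 60.0, 15.0
    cov = corr * std ** 2          # same standard deviation for every subject

    rng = np.random.default_rng(42)
    marks = rng.multivariate_normal([mean] * 4, cov, size=1000)
    df = pd.DataFrame(marks, columns=subjects)

    print(df.corr().round(2))      # should sit close to the target matrix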

Can I break down a large-scale correlation matrix?

点点圈 Submitted on 2019-12-07 21:50:53

Question: The correlation matrix is so large (50000 by 50000) that it is not efficient to calculate what I want. What I want to do is break it down into groups and treat each as a separate correlation matrix. However, how do I deal with the dependence between those smaller correlation matrices? I have been researching online all day but nothing comes up. There should be some algorithm out there related to the approximation of large correlation matrices like this, right?

Answer 1: Even a 4 x 4
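
The answer is cut short above. Purely as an illustration of the block idea (not the answerer's method), correlations can be computed one block of columns at a time from standardized data, including the cross-blocks that capture the dependence between groups, so the full matrix never has to be held in memory at once; the group sizes and data below are hypothetical.

    import numpy as np

    def block_correlation(X, cols_a, cols_b):
        """Correlation block between two column groups of X (n_samples x n_features)."""
        A = X[:, cols_a]
        B = X[:, cols_b]
        A = (A - A.mean(axis=0)) / A.std(axis=0)
        B = (B - B.mean(axis=0)) / B.std(axis=0)
        return A.T @ B / X.shape[0]   # shape: len(cols_a) x len(cols_b)

    # Hypothetical data: 1000 samples of 10 variables, split into two groups.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 10))
    within_first = block_correlation(X, range(0, 5), range(0, 5))
    between_groups = block_correlation(X, range(0, 5), range(5, 10))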

Time series - correlation and lag time

瘦欲@ Submitted on 2019-12-07 17:25:17

Question: I am studying the correlation between a set of input variables and a response variable, price. These are all time series. 1) Is it necessary that I smooth out the curve where the input variable is cyclical (autoregressive)? If so, how? 2) Once a correlation is established, I would like to quantify exactly how the input variable affects the response variable. E.g., "Once X increases >10%, there is a 2% increase in y 6 months later." Which Python libraries should I be looking at to
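
The post is cut off before any answers, but the lag part of question 2 can be illustrated with a small pandas sketch: shift the input series over a range of lags and record its correlation with the response at each lag. The column names, the 6-month delay, and the data are placeholders, not taken from the post.

    import numpy as np
    import pandas as pd

    # Hypothetical monthly data: price responds to x with a 6-month delay.
    rng = np.random.default_rng(0)
    n = 120
    x = pd.Series(rng.normal(size=n))
    price = 0.8 * x.shift(6) + rng.normal(scale=0.5, size=n)
    df = pd.DataFrame({'x': x, 'price': price})

    # Correlation between price and x lagged by 0..12 months.
    lag_corr = {lag: df['price'].corr(df['x'].shift(lag)) for lag in range(13)}
    best_lag = max(lag_corr, key=lambda lag: abs(lag_corr[lag]))
    print(best_lag, lag_corr[best_lag])   # the peak should show up near lag 6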

Bayesian Correlation with PyMC3

断了今生、忘了曾经 Submitted on 2019-12-07 14:05:10

Question: I'm trying to convert this example of Bayesian correlation for PyMC2 to PyMC3, but I get completely different results. Most importantly, the mean of the multivariate normal distribution quickly goes to zero, whereas it should be around 400 (as it is for PyMC2). Consequently, the estimated correlation quickly goes towards 1, which is wrong as well. The full code is available in this notebook for PyMC2 and in this notebook for PyMC3. The relevant code for PyMC2 is:

    def analyze(data):
        # priors
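
Neither notebook is reproduced here, so the following is not the poster's code; it is only a sketch of one common PyMC3 parameterization of a bivariate normal with an explicit correlation term, with illustrative data, priors, and names. Keeping the prior on mu wide and centered near the scale of the data, rather than tightly around zero, is one of the usual things to check when a posterior mean collapses toward zero.

    import numpy as np
    import pymc3 as pm
    import theano.tensor as tt

    # Hypothetical data with means far from zero, as in the question.
    rng = np.random.default_rng(0)
    data = rng.multivariate_normal([400, 400], [[25, 15], [15, 25]], size=200)

    with pm.Model() as model:
        # Wide priors centered on the observed means.
        mu = pm.Normal('mu', mu=data.mean(axis=0), sd=100, shape=2)
        sigma = pm.HalfNormal('sigma', sd=100, shape=2)
        rho = pm.Uniform('rho', lower=-1, upper=1)

        # Build the 2x2 covariance matrix from sigma and the correlation rho.
        cov = tt.stacklists([[sigma[0] ** 2, rho * sigma[0] * sigma[1]],
                             [rho * sigma[0] * sigma[1], sigma[1] ** 2]])

        obs = pm.MvNormal('obs', mu=mu, cov=cov, observed=data)
        trace = pm.sample(2000, tune=1000)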

Pairwise correlation of Pandas DataFrame columns with custom function

南笙酒味 Submitted on 2019-12-07 06:02:37

Question: Pandas pairwise correlation on a DataFrame comes in handy in many cases. However, in my specific case I would like to use a method not provided by Pandas (something other than pearson, kendall, or spearman) to correlate two columns. Is it possible to explicitly define the correlation function to use in this case? The syntax I would like looks like this:

    def my_method(x, y):
        return something

    frame.corr(method=my_method)

Answer 1: You would need to do this in Cython for any kind of perf (with a
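
The answer is cut off above. As an illustration only (not the answerer's Cython suggestion), a custom pairwise correlation can be written as an explicit loop over column pairs; newer pandas releases (0.24 and later) also accept a callable for the method argument of DataFrame.corr, so the syntax asked for in the question works there directly. The metric and frame below are hypothetical.

    import numpy as np
    import pandas as pd

    def my_method(x, y):
        # Toy custom measure: correlation of ranks computed by hand
        # (a stand-in for whatever metric the question has in mind).
        x, y = np.asarray(x), np.asarray(y)
        return np.corrcoef(x.argsort().argsort(), y.argsort().argsort())[0, 1]

    def pairwise_corr(frame, func):
        cols = frame.columns
        out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
        for i, a in enumerate(cols):
            for b in cols[i + 1:]:
                out.loc[a, b] = out.loc[b, a] = func(frame[a].values, frame[b].values)
        return out

    frame = pd.DataFrame(np.random.rand(50, 3), columns=list('abc'))
    print(pairwise_corr(frame, my_method))
    print(frame.corr(method=my_method))   # pandas >= 0.24 equivalent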

Python SciPy Spearman correlation for a matrix does not match two-array correlation nor pandas.DataFrame.corr()

£可爱£侵袭症+ Submitted on 2019-12-07 05:49:00

Question: I was computing Spearman correlations for a matrix. I found that the matrix input and two-array input gave different results when using scipy.stats.spearmanr. The results are also different from pandas.DataFrame.corr.

    from scipy.stats import spearmanr  # scipy 1.0.1
    import pandas as pd  # 0.22.0
    import numpy as np

    # Data
    X = pd.DataFrame({"A": [-0.4, 1, 12, 78, 84, 26, 0, 0],
                      "B": [-0.4, 3.3, 54, 87, 25, np.nan, 0, 1.2],
                      "C": [np.nan, 56, 78, 0, np.nan, 143, 11, np.nan],
                      "D": [0, -9.3, 23, 72, np.nan, -2, -0.3, -0.4],
                      "E": [78, np
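
The frame in the question is truncated above, so the comparison below rebuilds it from the columns that survive (A through D) just to line the computations up side by side. One point that is easy to verify: pandas DataFrame.corr correlates each pair of columns over the rows where both values are present (pairwise deletion), so a single whole-matrix spearmanr call that effectively works on a different subset of rows can legitimately give different numbers.

    import numpy as np
    import pandas as pd
    from scipy.stats import spearmanr

    X = pd.DataFrame({"A": [-0.4, 1, 12, 78, 84, 26, 0, 0],
                      "B": [-0.4, 3.3, 54, 87, 25, np.nan, 0, 1.2],
                      "C": [np.nan, 56, 78, 0, np.nan, 143, 11, np.nan],
                      "D": [0, -9.3, 23, 72, np.nan, -2, -0.3, -0.4]})

    # Pandas: pairwise deletion, so each pair may use a different subset of rows.
    print(X.corr(method="spearman"))

    # SciPy on the whole matrix after dropping every row with any NaN
    # (listwise deletion): all pairs now share the same rows.
    rho, pval = spearmanr(X.dropna().values)
    print(pd.DataFrame(rho, index=X.columns, columns=X.columns))

    # SciPy on a single pair, dropping NaNs for that pair only; this matches
    # the corresponding entry of the pandas result.
    pair = X[["A", "B"]].dropna()
    print(spearmanr(pair["A"], pair["B"]))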