correlation

Generate correlated data in Python (3.3)

岁酱吖の submitted on 2019-11-30 02:28:53
In R there is a function (cm.rnorm.cor, from package CreditMetrics) that takes the number of samples, the number of variables, and a correlation matrix in order to create correlated data. Is there an equivalent in Python?

numpy.random.multivariate_normal is the function that you want. Example:

    import numpy as np
    import matplotlib.pyplot as plt

    num_samples = 400

    # The desired mean values of the sample.
    mu = np.array([5.0, 0.0, 10.0])

    # The desired covariance matrix.
    r = np.array([
            [ 3.40, -2.75, -2.00],
            [-2.75,  5.50,  1.50],
            [-2.00,  1.50,  1.25]
        ])

    # Generate the random samples.
    y = np.random.multivariate_normal(mu, r, size=num_samples)
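
A quick way to sanity-check the generated samples (a minimal sketch; y, mu and r refer to the variables in the answer above) is to compare the empirical covariance and correlation against the targets:

    # Each row of y is one 3-dimensional sample.
    print(y.shape)                        # (400, 3)
    print(np.cov(y, rowvar=False))        # should be close to r
    print(np.corrcoef(y, rowvar=False))   # correlations implied by r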

Dealing with missing values for correlations calculation

眉间皱痕 submitted on 2019-11-30 01:29:27
I have a huge matrix with a lot of missing values and I want to get the correlations between variables.

1. Is the solution cor(na.omit(matrix)) better than cor(matrix, use = "pairwise.complete.obs")? I have already kept only the variables with more than 20% missing values.

2. Which method makes the most sense?

I would vote for the second option. It sounds like you have a fair amount of missing data, so you would be looking for a sensible multiple imputation strategy to fill in the gaps. See Harrell's text "Regression Modeling Strategies" for a wealth of guidance on how to do this.
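
A minimal R sketch contrasting the two options (assuming m is a numeric matrix with NAs): na.omit() drops every row containing any NA, while "pairwise.complete.obs" uses, for each pair of columns, all rows where both are observed.

    set.seed(42)
    m <- matrix(rnorm(100), nrow = 20, ncol = 5)
    m[sample(length(m), 15)] <- NA           # sprinkle in some missing values

    cor(na.omit(m))                          # complete-case: rows with any NA removed
    cor(m, use = "pairwise.complete.obs")    # pairwise deletion, per pair of columns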

Significance level added to matrix correlation heatmap using ggplot2

喜夏-厌秋 submitted on 2019-11-29 19:45:32
I wonder how one can add another layer of useful complexity to a matrix correlation heatmap, for example the p value (in the manner of significance-level stars) in addition to the correlation coefficient (-1 to 1). The intent of this question is NOT to print significance stars or p values as text on each square of the matrix, but rather to show the significance level on each square through some graphical, out-of-the-box representation. I think it will take some genuinely innovative thinking to come up with this kind of solution.
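
One hedged sketch of such a graphical encoding (using the built-in mtcars data; the column choice and the idea of overlaying a point sized by -log10(p) are illustrative assumptions, not the asker's setup): compute the correlation and p value for every pair of variables, then draw tiles filled by the correlation with a point on top whose size reflects significance.

    library(ggplot2)

    d <- mtcars[, c("mpg", "disp", "hp", "drat", "wt")]
    vars <- names(d)

    # All pairs of variables (off-diagonal) with their correlation and p value.
    grid <- subset(expand.grid(x = vars, y = vars, stringsAsFactors = FALSE), x != y)
    grid$r <- mapply(function(a, b) cor(d[[a]], d[[b]]), grid$x, grid$y)
    grid$p <- mapply(function(a, b) cor.test(d[[a]], d[[b]])$p.value, grid$x, grid$y)

    ggplot(grid, aes(x, y, fill = r)) +
      geom_tile() +
      geom_point(aes(size = -log10(p)), shape = 21, colour = "black") +
      scale_fill_gradient2(limits = c(-1, 1)) +
      labs(size = "-log10(p)")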

Weighted correlation coefficient with pandas

孤街醉人 submitted on 2019-11-29 18:40:01
Question: Is there any way to compute a weighted correlation coefficient with pandas? I saw that R has such a method. I would also like to get the p value of the correlation; I did not find that in R either. Link to Wikipedia for an explanation of weighted correlation: https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient#Weighted_correlation_coefficient

Answer 1: I don't know of any Python packages that implement this, but it should be fairly straightforward to roll your own implementation.
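
A minimal NumPy sketch of that roll-your-own approach, following the weighted mean/covariance/correlation definitions from the linked Wikipedia section (the function name weighted_corr is an illustrative choice):

    import numpy as np

    def weighted_corr(x, y, w):
        """Weighted Pearson correlation of x and y with weights w."""
        x, y, w = map(np.asarray, (x, y, w))
        w = w / w.sum()                           # normalize the weights
        mx, my = np.sum(w * x), np.sum(w * y)     # weighted means
        cov_xy = np.sum(w * (x - mx) * (y - my))  # weighted covariance
        cov_xx = np.sum(w * (x - mx) ** 2)
        cov_yy = np.sum(w * (y - my) ** 2)
        return cov_xy / np.sqrt(cov_xx * cov_yy)

    # Usage with pandas columns, e.g.:
    # weighted_corr(df["a"], df["b"], df["weight"])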

Correlation between groups in R data.table

吃可爱长大的小学妹 submitted on 2019-11-29 18:17:25
Question: Is there a way of elegantly calculating the correlations between values if those values are stored by group in a single column of a data.table (other than converting the data.table to a matrix)?

    library(data.table)
    set.seed(1)  # reproducibility
    dt <- data.table(id = 1:4, group = rep(letters[1:2], c(4, 4)), value = rnorm(8))
    setkey(dt, group)
    #    id group      value
    # 1:  1     a -0.6264538
    # 2:  2     a  0.1836433
    # 3:  3     a -0.8356286
    # 4:  4     a  1.5952808
    # 5:  1     b  0.3295078
    # 6:  2     b -0.8204684
    # 7:  3     b  0.4874291
    # 8:  4
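
One hedged way to do this without leaving data.table (a sketch assuming the goal is the correlation between the value vectors of groups a and b, matched by id): reshape to wide with dcast and call cor() on the resulting columns.

    wide <- dcast(dt, id ~ group, value.var = "value")
    wide[, cor(a, b)]

    # Equivalently, without reshaping:
    cor(dt[group == "a", value], dt[group == "b", value])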

Polychoric correlation matrix with significance in R

£可爱£侵袭症+ submitted on 2019-11-29 15:28:26
Question: I have been desperately looking for a way to compute a polychoric correlation matrix with significance in R. If that is very hard, then a polychoric correlation between two variables with significance would be sufficient. What I have tried so far:

    library(polychor)
    poly <- polychor(var1, var2)
    poly <- polychor(DatM)  # where DatM is a DF converted to a matrix

    library(polycor)
    hetcor(Dat2)  # I am however uncertain hetcor is what I want if I am looking for polychoric correlation.

    library
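
A hedged sketch for the two-variable case, assuming the polycor package's polychor(std.err = TRUE) interface; the z-test on the returned rho and its sampling variance is my own illustrative significance calculation, not something the package prints for you:

    library(polycor)

    # Illustrative ordinal data.
    x <- factor(sample(1:4, 200, replace = TRUE), ordered = TRUE)
    y <- factor(sample(1:4, 200, replace = TRUE), ordered = TRUE)

    fit <- polychor(x, y, std.err = TRUE)  # returns rho plus its sampling variance
    rho <- fit$rho
    se  <- sqrt(fit$var[1])                # assumed: first element is var(rho)
    z   <- rho / se
    p   <- 2 * pnorm(-abs(z))              # two-sided p value
    c(rho = rho, p = p)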

How to iterate through parameters to analyse

别来无恙 submitted on 2019-11-29 12:44:10
Is there a better way to iterate over a set of parameters of a given dataset? The goal is a table of correlation coefficients: the columns are CI, CVP, mean PAP, and mean SAP; the rows are ALAT, ASAT, GGT, Bili, LDH, and FBG. For each combination I'd like to get the correlation coefficient and the significance level (p = ...). Below you see "the hard way". Is there a more elegant approach, possibly with a printable table?

    attach(Liver)
    cor.test(CI, ALAT, method = "spearman")
    cor.test(CI, ASAT, method = "spearman")
    cor.test(CI, GGT, method = "spearman")
    cor.test(CI, Bili, method = "spearman")
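
A hedged sketch of a more compact loop (assuming Liver is a data frame containing all of the named columns; the names meanPAP and meanSAP are guesses and would need to match the actual column names):

    row_vars <- c("ALAT", "ASAT", "GGT", "Bili", "LDH", "FBG")
    col_vars <- c("CI", "CVP", "meanPAP", "meanSAP")

    # Matrix of Spearman's rho, and a parallel matrix of p values.
    rho <- sapply(col_vars, function(cv)
             sapply(row_vars, function(rv)
               unname(cor.test(Liver[[rv]], Liver[[cv]], method = "spearman")$estimate)))
    pval <- sapply(col_vars, function(cv)
              sapply(row_vars, function(rv)
                cor.test(Liver[[rv]], Liver[[cv]], method = "spearman")$p.value))

    round(rho, 2)   # printable table of coefficients
    round(pval, 3)  # printable table of p values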

Correlation/p values of all combinations of all rows of two matrices

南笙酒味 submitted on 2019-11-29 12:05:12
I would like to calculate the correlation, and the p value of that correlation, of each species (bac) to each of the factors (fac) in a second data frame. Both were measured at the same number of stations, but the number of bac and fac don't match.

    bac1 <- c(1,2,3,4,5)
    bac2 <- c(2,3,4,5,1)
    bac3 <- c(4,5,1,2,3)
    bac4 <- c(5,1,2,3,4)
    bac <- as.data.frame(cbind(bac1, bac2, bac3, bac4))
    colnames(bac) <- c("station1", "station2", "station3", "station4")
    rownames(bac) <- c("bac1", "bac2", "bac3", "bac4", "bac5")

    fac1 <- c(1,2,3,4,5,6)
    fac2 <- c(2,3,4,5,1,6)
    fac3 <- c(3,4,5,1,2,6)
    fac4 <- c(4,5,1,2,3,6)
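
A hedged sketch of one way to get both numbers for every bac x fac combination, assuming both tables end up with stations as columns and the same number of stations (which the question states, even though the excerpted snippets differ); the small data frames below are illustrative stand-ins with matching station counts:

    # Illustrative data with matching station counts.
    bac <- data.frame(matrix(rnorm(20), nrow = 5,
                      dimnames = list(paste0("bac", 1:5), paste0("station", 1:4))))
    fac <- data.frame(matrix(rnorm(12), nrow = 3,
                      dimnames = list(paste0("fac", 1:3), paste0("station", 1:4))))

    # Correlation and p value for every pair of rows (bac_i, fac_j).
    res <- expand.grid(bac = rownames(bac), fac = rownames(fac), stringsAsFactors = FALSE)
    tests <- Map(function(b, f) cor.test(unlist(bac[b, ]), unlist(fac[f, ])),
                 res$bac, res$fac)
    res$r <- sapply(tests, function(t) unname(t$estimate))
    res$p <- sapply(tests, function(t) t$p.value)
    res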

Is there any numpy autocorrelation function with standardized output?

爷，独闯天下 submitted on 2019-11-29 09:42:27
Question: I followed the advice of defining the autocorrelation function in another post:

    def autocorr(x):
        result = np.correlate(x, x, mode = 'full')
        maxcorr = np.argmax(result)
        # print 'maximum = ', result[maxcorr]
        result = result / result[maxcorr]  # <=== normalization
        return result[result.size/2:]

However, the maximum value was not "1.0"; therefore I introduced the line tagged with "<=== normalization". I tried the function on the dataset from chapter 2 of "Time Series Analysis" (Box-Jenkins). I expected
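
For comparison, a minimal sketch of an autocorrelation that is standardized in the conventional sense (mean removed, normalized so the lag-0 value is exactly 1.0); this is an illustrative variant of my own, not the code from the other post:

    import numpy as np

    def autocorr_standardized(x):
        x = np.asarray(x, dtype=float)
        x = x - x.mean()                         # remove the mean first
        full = np.correlate(x, x, mode='full')   # length 2*n - 1
        acf = full[x.size - 1:]                  # keep non-negative lags
        return acf / acf[0]                      # lag 0 is now exactly 1.0

    # Example: a noisy sine wave shows the expected oscillating autocorrelation.
    t = np.linspace(0, 20, 500)
    series = np.sin(t) + 0.3 * np.random.randn(t.size)
    print(autocorr_standardized(series)[:5])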

Correlation coefficients and p values for all pairs of rows of a matrix

坚强是说给别人听的谎言 submitted on 2019-11-29 06:21:29
Question: I have a matrix data with m rows and n columns. I used to compute the correlation coefficients between all pairs of rows using np.corrcoef:

    import numpy as np
    data = np.array([[0, 1, -1], [0, -1, 1]])
    np.corrcoef(data)

Now I would also like to look at the p values of these coefficients. np.corrcoef doesn't provide these; scipy.stats.pearsonr does. However, scipy.stats.pearsonr does not accept a matrix as input. Is there a quick way to compute both the coefficient and the p value?
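
A hedged sketch of one way to get both at once (the function name corrcoef_with_p is illustrative; the vectorized p value uses the standard t-distribution formula for Pearson's r under the null of zero correlation, with n being the number of columns):

    import numpy as np
    from scipy import stats

    def corrcoef_with_p(data):
        """Correlation matrix of the rows of `data` plus two-sided p values."""
        data = np.asarray(data, dtype=float)
        r = np.corrcoef(data)
        n = data.shape[1]                                   # observations per row
        # t statistic for each correlation; clip to avoid division by zero when |r| == 1.
        t = r * np.sqrt((n - 2) / np.clip(1.0 - r**2, 1e-12, None))
        p = 2 * stats.t.sf(np.abs(t), df=n - 2)
        return r, p

    r, p = corrcoef_with_p(np.random.rand(4, 50))
    print(np.round(r, 2))
    print(np.round(p, 3))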