correlation

Finding proportional columns in matrix

最后都变了- submitted on 2019-12-20 01:07:49
Question: I have a big matrix (1,000 rows and 50,000 columns). I know some columns are correlated (the rank is only 100) and I suspect some columns are even proportional. How can I find such proportional columns? One way would be looping over corr(M(:,j),M(:,k)), but is there anything more efficient?
Answer 1: If you normalize each column by dividing by its maximum, proportionality becomes equality. This makes the problem easier. Now, to test for equality you can use a single (outer) loop over columns; the
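
The normalization idea in the answer above can be sketched in Python with NumPy. The question is about MATLAB, so this is a translation of the idea and not the answerer's code. As a variation, the sketch divides each column by its entry of largest absolute value instead of its maximum, so that columns related by a negative factor also coincide. The function name proportional_groups and the rounding tolerance decimals are assumptions made for illustration.

```python
import numpy as np

def proportional_groups(M, decimals=8):
    """Return lists of column indices that are proportional to each other."""
    M = np.asarray(M, dtype=float)
    # Pivot of each column: its (signed) entry of largest absolute value.
    idx = np.argmax(np.abs(M), axis=0)
    pivots = M[idx, np.arange(M.shape[1])]
    # All-zero columns have pivot 0; leave them unchanged instead of dividing by 0.
    safe = np.where(pivots == 0, 1.0, pivots)
    # After this scaling, proportional columns are equal up to rounding.
    # Adding 0.0 turns any -0.0 into 0.0 so both compare as the same value.
    N = np.round(M / safe, decimals) + 0.0
    # Label identical columns; ravel because the inverse shape differs by NumPy version.
    _, inverse = np.unique(N, axis=1, return_inverse=True)
    groups = {}
    for col, label in enumerate(np.ravel(inverse)):
        groups.setdefault(int(label), []).append(col)
    return [cols for cols in groups.values() if len(cols) > 1]

M = np.array([[1, 2, -3, 1],
              [2, 4, -6, 0],
              [3, 6, -9, 0],
              [4, 8, -12, 1]], dtype=float)
print(proportional_groups(M))
```

Rounding before comparing is a pragmatic choice: two nearly equal values that straddle a rounding boundary can still land in different groups, so noisy data may need a tolerance-based comparison instead.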

R: Efficiently locating time series segments with maximal cross-correlation to input segment?

六眼飞鱼酱① submitted on 2019-12-19 05:53:14
Question: I have a long numerical time series of approximately 200,000 rows (let's call it Z). In a loop, I subset x (about 30) consecutive rows from Z at a time and treat them as the query point q. I want to locate within Z the y (~300) time series segments of length x that are most correlated with q. What is an efficient way to accomplish this?
Answer 1: The code below finds the 300 segments you are looking for and runs in 8 seconds on my none too powerful Windows laptop, so it should be
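
The answer's own R code is cut off in this listing. As a rough illustration of the task only, here is a NumPy sketch that computes the Pearson correlation between q and every window of Z in one vectorized pass and then picks the k best. The function name top_correlated_segments and its parameters are assumptions, and sliding_window_view requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def top_correlated_segments(z, q, k):
    """Return start indices and correlations of the k windows of z most correlated with q."""
    z = np.asarray(z, dtype=float)
    q = np.asarray(q, dtype=float)
    # Every length-len(q) window of z, shape (len(z) - len(q) + 1, len(q)).
    windows = sliding_window_view(z, q.size)
    # Center windows and query so the dot product is proportional to the covariance.
    wc = windows - windows.mean(axis=1, keepdims=True)
    qc = q - q.mean()
    denom = np.linalg.norm(wc, axis=1) * np.linalg.norm(qc)
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (wc @ qc) / denom
    # Constant windows (or a constant query) have no defined correlation; rank them last.
    r = np.where(denom > 0, r, -np.inf)
    k = min(k, r.size)
    # Partial selection of the k largest, then sort only those k.
    top = np.argpartition(-r, k - 1)[:k]
    top = top[np.argsort(-r[top])]
    return top, r[top]

rng = np.random.default_rng(0)
z = rng.standard_normal(1000)
q = z[100:130].copy()
starts, corrs = top_correlated_segments(z, q, 5)
print(starts, corrs)
```

If q is itself cut from Z, as in the question, its own position comes back with a correlation of about 1, and heavily overlapping windows may rank high too, so those may need to be filtered out afterwards.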

How to move larger values close to matrix diagonal in a correlation matrix

自古美人都是妖i submitted on 2019-12-19 03:43:38
Question: I have a correlation matrix X of five elements (C1, C2, C3, C4, C5):
       C1 C2 C3 C4 C5
    C1  *  1  0  1  0
    C2  1  *  0  0  1
    C3  0  0  *  1  1
    C4  1  0  1  *  0
    C5  0  1  1  0  *
I want to use MATLAB to move as many non-zero cells as possible close to the diagonal, while keeping the diagonal cells as "*". For example, notice that the columns and rows are reordered in the following matrix, while the diagonal cells remain "*":
       C1 C4 C2 C5 C3
    C1  *  1  1  0  0
    C4  1  *  0  0  1
    C2  1  0  *  1  0
    C5  0  0  1  *  1
    C3  0  1  0  1  *
Because I want to do clustering, so

Generating random correlated x and y points using Numpy

扶醉桌前 submitted on 2019-12-18 11:53:52
Question: I'd like to generate correlated arrays of x and y coordinates, in order to test various matplotlib plotting approaches, but I'm failing somewhere, because I can't get numpy.random.multivariate_normal to give me the samples I want. Ideally, I want my x values between -0.51 and 51.2, and my y values between 0.33 and 51.6 (though I suppose equal ranges would be OK, since I can constrain the plot afterwards), but I'm not sure what mean (0, 0?) and covariance values I should be using to get these
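
One way to think about the mean and covariance asked about above, sketched under stated assumptions: put the mean at the center of each target range, pick each standard deviation as one sixth of the range (so roughly three standard deviations fit on either side), and build the covariance from a chosen correlation rho. The helper name correlated_xy, the factor of six, and rho = 0.8 are illustrative choices, not values from the original thread. Because a normal distribution is unbounded, the sketch drops the few points that fall outside the ranges.

```python
import numpy as np

def correlated_xy(n, x_range, y_range, rho, seed=None):
    """Draw up to n correlated (x, y) points that lie inside the given ranges."""
    rng = np.random.default_rng(seed)
    # Center of each range is used as the mean.
    mean = [(x_range[0] + x_range[1]) / 2, (y_range[0] + y_range[1]) / 2]
    # Assumed spread: the full range covers about six standard deviations.
    sx = (x_range[1] - x_range[0]) / 6
    sy = (y_range[1] - y_range[0]) / 6
    # Covariance matrix for standard deviations sx, sy and correlation rho.
    cov = [[sx ** 2, rho * sx * sy],
           [rho * sx * sy, sy ** 2]]
    xy = rng.multivariate_normal(mean, cov, size=n)
    # Keep only points inside both ranges, so fewer than n may be returned.
    inside = ((xy[:, 0] >= x_range[0]) & (xy[:, 0] <= x_range[1]) &
              (xy[:, 1] >= y_range[0]) & (xy[:, 1] <= y_range[1]))
    return xy[inside, 0], xy[inside, 1]

x, y = correlated_xy(5000, (-0.51, 51.2), (0.33, 51.6), rho=0.8, seed=0)
print(len(x), np.corrcoef(x, y)[0, 1])
```

The off-diagonal covariance entries are rho * sx * sy, not rho itself; passing a raw correlation there is a common reason the samples do not look as expected.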

Adding line of identity to correlation plots using pairs() command in R

╄→尐↘猪︶ㄣ submitted on 2019-12-18 09:04:59
Question: Similar to a previous post, I'd like to modify the following code (from the example in the R documentation for the pairs() command):
    ## put (absolute) correlations on the upper panels,
    ## with size proportional to the correlations.
    panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...) {
        usr <- par("usr"); on.exit(par(usr))
        par(usr = c(0, 1, 0, 1))
        r <- abs(cor(x, y))
        txt <- format(c(r, 0.123456789), digits = digits)[1]
        txt <- paste0(prefix, txt)
        if(missing(cex.cor)) cex.cor <- 0.8

How to iterate through parameters to analyse

ぃ、小莉子 submitted on 2019-12-18 07:09:06
Question: Is there a better way to iterate through a set of parameters of a given dataset? I am trying to get a table of correlation coefficients: columns are "CI, CVP, mean PAP, mean SAP", rows are "ALAT, ASAT, GGT, Bili, LDH, FBG". For each combination I'd like to get the correlation coefficient and the significance level (p=...). Below you see "the hard way". But is there a more elegant way, possibly with a printable table?
    attach(Liver)
    cor.test(CI, ALAT, method = "spearman")
    cor.test(CI, ASAT

Correlation/p values of all combinations of all rows of two matrices

喜欢而已 submitted on 2019-12-18 07:06:18
Question: I would like to calculate the correlation, and the p-value of that correlation, of each species (bac) to each of the factors (fac) in a second data frame. Both were measured at the same number of stations, but the number of bac and fac don't match.
    bac1 <- c(1,2,3,4,5)
    bac2 <- c(2,3,4,5,1)
    bac3 <- c(4,5,1,2,3)
    bac4 <- c(5,1,2,3,4)
    bac <- as.data.frame(cbind(bac1, bac2, bac3, bac4 ))
    colnames(bac) <- c("station1", "station2", "station3", "station4")
    rownames(bac) <- c("bac1", "bac2", "bac3",

Pandas Correlation Groupby

只谈情不闲聊 submitted on 2019-12-17 22:36:49
Question: Assuming I have a dataframe similar to the one below, how would I get the correlation between 2 specific columns and then group by the 'ID' column? I believe the Pandas 'corr' method finds the correlation between all columns. If possible I would also like to know how I could find the 'groupby' correlation using the .agg function (i.e. np.correlate).
What I have:
    ID  Val1  Val2  OtherData  OtherData
    A   5     4     x          x
    A   4     5     x          x
    A   6     6     x          x
    B   4     1     x          x
    B   8     2     x          x
    B   7     9     x          x
    C   4     8     x          x
    C   5     5     x          x
    C   2     1     x          x
What I need:
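
The desired output is cut off in this listing, so the following is only a sketch of one plausible reading: a single Pearson correlation between Val1 and Val2 per ID. It rebuilds the sample data from the question without the two OtherData columns; the helper name corr_by_group is an assumption made for illustration.

```python
import pandas as pd

# Sample data from the question; the OtherData columns are left out.
df = pd.DataFrame({
    "ID":   ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "Val1": [5, 4, 6, 4, 8, 7, 4, 5, 2],
    "Val2": [4, 5, 6, 1, 2, 9, 8, 5, 1],
})

def corr_by_group(frame, key, a, b):
    """Pearson correlation between columns a and b within each group of key."""
    # Select only the two columns, then correlate them inside each group.
    return frame.groupby(key)[[a, b]].apply(lambda g: g[a].corr(g[b]))

result = corr_by_group(df, "ID", "Val1", "Val2")
print(result)
```

The result is a Series indexed by ID. Note that np.correlate, mentioned in the question, computes a cross-correlation of two sequences and not a correlation coefficient; np.corrcoef is the NumPy function that returns Pearson coefficients.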

p-values of correlation coefficients

≡放荡痞女 submitted on 2019-12-17 20:27:09
Question: I am using R and have a question on correlations.
    A<-data.frame(A1=c(1,2,3,4,5),B1=c(6,7,8,9,10),C1=c(11,12,13,14,15 ))
    B<-data.frame(A2=c(6,7,7,10,11),B2=c(2,1,3,8,11),C2=c(1,5,16,7,8))
    cor(A,B)
    #           A2        B2       C2
    # A1 0.9481224 0.9190183 0.459588
    # B1 0.9481224 0.9190183 0.459588
    # C1 0.9481224 0.9190183 0.459588
I wanted to obtain the p-value for each of the correlation coefficients in the matrix. Is this possible? I tried using the rcorr function from the Hmisc package but obtain only a single p-value
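
The question is about R, and no answer is shown in this listing. Purely to illustrate what a p-value per coefficient looks like, here is a Python sketch that applies scipy.stats.pearsonr to every pair of columns from the two data sets in the question. The function name cor_with_pvalues is an assumption, and the data is stored in plain dictionaries instead of data frames.

```python
import numpy as np
from scipy.stats import pearsonr

# Same values as the data frames A and B in the question.
A = {"A1": [1, 2, 3, 4, 5], "B1": [6, 7, 8, 9, 10], "C1": [11, 12, 13, 14, 15]}
B = {"A2": [6, 7, 7, 10, 11], "B2": [2, 1, 3, 8, 11], "C2": [1, 5, 16, 7, 8]}

def cor_with_pvalues(left, right):
    """Return matrices of Pearson r and p-values for every left/right column pair."""
    r = np.empty((len(left), len(right)))
    p = np.empty_like(r)
    for i, x in enumerate(left.values()):
        for j, y in enumerate(right.values()):
            # pearsonr returns the coefficient and its two-sided p-value.
            r[i, j], p[i, j] = pearsonr(x, y)
    return r, p

r, p = cor_with_pvalues(A, B)
print(np.round(r, 7))
print(np.round(p, 4))
```

Rows follow the keys of A and columns the keys of B, matching the layout of the cor(A,B) output in the question. With only five observations per column, the p-values should be read with caution.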

Correlation Corrplot Configuration

别说谁变了你拦得住时间么 submitted on 2019-12-17 18:42:59
Question: I am new to R scripts :-) I need to build a correlation matrix and I am trying to configure some parameters to adapt the graph. I am using the corrplot package. I built a corrplot matrix this way:
    corrplot(cor(d1[,2:14], d1[,2:14]), method=c("color"), bg = "white", addgrid.col = "gray50", tl.cex=1, type="lower", tl.col = "black", col = colorRampPalette(c("red","white","blue"))(100))
I need to show the correlation values in the lower matrix inside the color matrix that I built. How can I do