correlation

In R, how do I find the optimal variable to maximize or minimize correlation between several datasets

旧时模样 submitted on 2019-12-28 19:03:25
Question: I am able to do this easily in Excel, but my dataset has gotten too large. In Excel, I would use Solver.
Column A, B, C, D = random numbers
Column E = random number (which I want to maximize the correlation to)
Column F = A*x+B*y+C*z+D*j, where x, y, z, j are coefficients produced by Solver
In a separate cell, I would have correl(E,F). In Solver, I would set the objective of correl(C,D) to max, by changing variables x,y and setting certain constraints:
1. A,B,C,D have to be between 0 and 1
2. A+B+C
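One way to mirror the Solver setup in R is to hand the correlation to a general-purpose optimizer. The sketch below uses `optim()` from the stats package with box constraints. It rests on several assumptions: the data are made up, the "between 0 and 1" bound is read as applying to the coefficients x, y, z, j, and the second constraint is cut off in the question, so it is not modeled here.

```r
# Hypothetical data standing in for columns A-D and the target column E.
set.seed(42)
X <- matrix(runif(400), ncol = 4, dimnames = list(NULL, c("A", "B", "C", "D")))
E <- as.vector(X %*% c(0.2, 0.5, 0.1, 0.2)) + rnorm(100, sd = 0.05)

# Objective: negative correlation between E and the weighted sum (column F),
# because optim() minimizes.
neg_cor <- function(w) {
  f <- as.vector(X %*% w)
  if (sd(f) == 0) return(1)  # correlation undefined here; return a finite penalty
  -cor(E, f)
}

# Assumption: each coefficient is bounded to [0, 1].
start <- rep(0.5, 4)
fit <- optim(start, neg_cor, method = "L-BFGS-B", lower = 0, upper = 1)

fit$par     # fitted coefficients x, y, z, j, each within [0, 1]
-fit$value  # correlation achieved between E and F
```

Correlation does not change when all coefficients are multiplied by the same positive constant, so the fitted coefficients are only determined up to scale unless a further constraint (such as a fixed sum) pins them down. To minimize the correlation instead, return `cor(E, f)` without the sign flip.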

Weighted Pearson's Correlation?

会有一股神秘感。 submitted on 2019-12-28 13:21:12
Question: I have a 2396x34 double matrix named y, in which each of the 2396 rows represents a separate situation consisting of 34 consecutive time segments. I also have a numeric[34] named x that represents a single situation of 34 consecutive time segments. Currently I am calculating the correlation between each row in y and x like this:
crs[,2] <- cor(t(y),x)
What I need now is to replace the cor function in the above statement with a weighted correlation. The weight vector xy.wt is 34 elements long so that
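A weighted Pearson correlation can be written directly from its definition: weighted means, then weighted covariance divided by the square root of the product of the weighted variances. The sketch below is one possible take, not the accepted answer. `weighted_cor` is a hypothetical helper name, and the small random `y`, `x`, and `xy.wt` only stand in for the objects described in the question.

```r
# Weighted Pearson correlation between two numeric vectors a and b.
# w holds non-negative weights, one per observation (normalized internally).
weighted_cor <- function(a, b, w) {
  w <- w / sum(w)
  ma <- sum(w * a)
  mb <- sum(w * b)
  cov_ab <- sum(w * (a - ma) * (b - mb))
  cov_ab / sqrt(sum(w * (a - ma)^2) * sum(w * (b - mb)^2))
}

# Small stand-in data with the same layout as in the question:
# rows of y are situations, columns are time segments.
set.seed(1)
y <- matrix(rnorm(5 * 34), nrow = 5)
x <- rnorm(34)
xy.wt <- runif(34)

# One weighted correlation per row of y
wcors <- apply(y, 1, weighted_cor, b = x, w = xy.wt)
wcors
```

With equal weights the helper reduces to the ordinary `cor()`, which is a quick way to check it against `cor(t(y), x)`.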

Correlation of two arrays in C#

我只是一个虾纸丫 submitted on 2019-12-28 02:45:10
Question: Given two arrays of double values, I want to compute the correlation coefficient (a single double value, just like the CORREL function in MS Excel). Is there a simple one-line solution in C#? I already discovered a math library called Meta Numerics. According to this SO question, it should do the job. Here are the docs for the Meta Numerics correlation method, which I don't understand. Could somebody please provide a simple code snippet or an example of how to use the library? Note: At the end, I was forced to use one

A matrix version of cor.test()

时间秒杀一切 submitted on 2019-12-27 20:20:11
Question: cor.test() takes vectors x and y as arguments, but I have an entire matrix of data that I want to test, pairwise. cor() takes this matrix as an argument just fine, and I'm hoping to find a way to do the same for cor.test(). The common advice from other folks seems to be to use cor.prob(): https://stat.ethz.ch/pipermail/r-help/2001-November/016201.html But these p-values are not the same as those generated by cor.test()! cor.test() also seems better equipped to handle pairwise deletion (I
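One straightforward approach is to loop over every pair of columns and call `cor.test()` on each, collecting the p-values in a symmetric matrix. The sketch below assumes that is the goal. `cor_test_matrix` is a hypothetical helper name, and the example matrix is made up.

```r
# Pairwise p-values from cor.test() for every pair of columns in a matrix.
# Extra arguments (e.g. method = "spearman") are passed on to cor.test().
cor_test_matrix <- function(m, ...) {
  p <- ncol(m)
  pvals <- matrix(NA_real_, p, p, dimnames = list(colnames(m), colnames(m)))
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      # cor.test() drops incomplete observations for this (i, j) pair only
      pv <- cor.test(m[, i], m[, j], ...)$p.value
      pvals[i, j] <- pvals[j, i] <- pv
    }
  }
  pvals
}

set.seed(1)
m <- matrix(rnorm(60), ncol = 3, dimnames = list(NULL, c("u", "v", "w")))
m[2, 1] <- NA  # a missing value only affects pairs involving column "u"
pv <- cor_test_matrix(m)
pv
```

Because each entry comes from `cor.test()` itself, the p-values match what a manual call on the same two columns returns, and missing values are handled pair by pair. The diagonal is left as `NA` since a column is not tested against itself.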

Group of Highly correlated variables

梦想与她 submitted on 2019-12-25 08:57:17
Question: I have a data frame and I want to find which group of variables shares the highest correlations. For example:
mydata <- structure(list(
  V1 = c(1L, 2L, 5L, 4L, 366L, 65L, 43L, 456L, 876L, 78L, 687L, 378L, 378L, 34L, 53L, 43L),
  V2 = c(2L, 2L, 5L, 4L, 366L, 65L, 43L, 456L, 876L, 78L, 687L, 378L, 378L, 34L, 53L, 41L),
  V3 = c(10L, 20L, 10L, 20L, 10L, 20L, 1L, 0L, 1L, 2010L, 20L, 10L, 10L, 10L, 10L, 10L),
  V4 = c(2L, 10L, 31L, 2L, 2L, 5L, 2L, 5L, 1L, 52L, 1L, 2L, 52L, 6L, 2L, 1L),
  V5 = c(4L, 10L, 31L, 2L,

NA from correlation function

倾然丶 夕夏残阳落幕 submitted on 2019-12-25 07:30:03
Question: Could you please explain the difference between these two cases?
> cor(1:10, rep(10,10))
[1] NA
Warning message:
In cor(1:10, rep(10, 10)) : the standard deviation is zero
> cor(1:10, 1:10)
[1] 1
The first one is a straight line, just like the second, so I would expect the correlation to be one. What am I not considering? Thanks
Answer 1: Plot the data and it should be clear. The data set
## y doesn't vary
plot(1:10, rep(10,10))
is just a horizontal line. The correlation coefficient undefined
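The warning can be reproduced by hand. Pearson's r is the covariance divided by the product of the two standard deviations, so a variable that never changes puts a zero in the denominator. A minimal sketch using the same vectors as the question:

```r
x <- 1:10
y <- rep(10, 10)

sd(y)       # 0: y never varies
cov(x, y)   # 0 as well, so r would be 0 / 0

# Computing r by hand from its definition gives NaN rather than 1
r_manual <- cov(x, y) / (sd(x) * sd(y))
is.nan(r_manual)   # TRUE
```

A horizontal line tells you nothing about how y moves with x, because y does not move at all, which is why `cor()` reports `NA` instead of 1.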

How to find the correlation between two strings in pandas

拟墨画扇 submitted on 2019-12-25 04:21:58
Question: I have a df of string values: Keyword plant cell cat Pandas. I want to find the relationship or correlation between these two string values. I have used pandas corr = df1.corrwith(df2,axis=0). But this is only useful for finding the correlation between numerical values, and I want to see whether the two strings are related by finding the correlation distance. How can I do that?
Answer 1: There are a few steps here. The first thing you need to do is extract some sort of vector for each word. A good

Calculate Correlations of Pairs of Columns in a Data Frame in R

左心房为你撑大大i submitted on 2019-12-25 01:53:56
Question: I have the following data frame:
set.seed(1)
y <- data.frame(a1 = rnorm(5), b1 = rnorm(5), c1 = rnorm(5), a2 = rnorm(5), b2 = rnorm(5), c2 = rnorm(5))
I would like to obtain the correlations of the pairs of columns: cor(a1,a2), cor(b1,b2), cor(c1,c2). I tried the following, but NAs appear as output:
apply(y,2,function(x) cor(x[1],x[3]))
I would like to get the result equivalent to
cor(y[,1],y[,4])
cor(y[,2],y[,5])
cor(y[,3],y[,6])
In my actual data frame, I have many more pairs of columns. Any
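The `apply()` attempt returns NA because it passes one column at a time, so `x[1]` and `x[3]` are single numbers, and a correlation needs more than one observation. One possible fix is to walk over the two halves of the data frame in parallel with `mapply()`. The sketch below assumes the columns are ordered so that the first half pairs up, in order, with the second half.

```r
set.seed(1)
y <- data.frame(a1 = rnorm(5), b1 = rnorm(5), c1 = rnorm(5),
                a2 = rnorm(5), b2 = rnorm(5), c2 = rnorm(5))

# Assumes the first half of the columns pairs up, in order, with the second half
half <- ncol(y) / 2
pair_cors <- mapply(cor, y[seq_len(half)], y[half + seq_len(half)])
pair_cors
```

The result is a numeric vector named after the first column of each pair (`a1`, `b1`, `c1`), and the same call works unchanged for any even number of columns laid out this way.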