correlation

How to produce a meaningful draftsman/correlation plot for discrete values

和自甴很熟 posted on 2019-12-01 17:47:33
Question: One of my favorite tools for exploratory analysis is pairs(); however, with a limited number of discrete values it falls flat, because the dots all align perfectly. Consider the following:

    y <- t(rmultinom(n=1000,size=4,prob=rep(.25,4)))
    pairs(y)

It doesn't really give a good sense of correlation. Is there an alternative plot style that would?

Answer 1: If you change y to a data.frame, you can add some 'jitter', and with the col option you can set the transparency level (the 4th number in rgb):

R cor returns NaN sometimes

我只是一个虾纸丫 posted on 2019-12-01 17:38:47
I've been working on some data, available here: Dropbox csv file (please use it to replicate the error). When I run the code:

    t<-read.csv("120.csv")
    x<-NULL
    for (i in 1:100){
      x<-c(x,cor(t$nitrate,t$sulfate,use="na.or.complete"))
    }
    sum(is.nan(x))

I get random values for the last expression, usually around 55 to 60. I expect cor to give repeatable results, so I expect x to be a vector of length 100 made of identical values. See, for example, the output of two independent runs:

    > x<-NULL; for (i in 1:100){x<-c(x,cor(t$nitrate,t$sulfate,use="na.or.complete"))}
    > sum(is.nan(x))
    [1] 62
    >

python - how to compute correlation-matrix with nans in data-matrix

青春壹個敷衍的年華 posted on 2019-12-01 16:51:53
I couldn't find a function that computes a matrix of correlation coefficients for arrays containing observations for more than two variables when there are NaNs in the data. There are functions that do this for pairs of variables (or by just masking the arrays using ~is.nan()). But looping over a large number of variables with these functions and computing the correlation for each pair can be very time consuming. So I tried on my own and soon realized that the complexity of doing this is a question of the proper normalization of the covariance. I would be very interested in your opinions on how to
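The excerpt is cut off before any solution, so the following is only a minimal sketch of the "pairwise complete" idea the question hints at, not the poster's code. For each pair of columns it uses only the rows where both values are present, so each coefficient is normalized by its own sample. The function names `pearson` and `nan_corr_matrix` and the sample data are made up for illustration. It uses plain Python for clarity; with NumPy the same sums could be expressed as matrix products on a NaN mask to avoid the explicit loop over pairs.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; NaN if undefined."""
    n = len(xs)
    if n < 2:
        return math.nan
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)

def nan_corr_matrix(rows):
    """Correlation matrix of the columns of `rows` (list of observations).

    For each pair of columns, only rows where both entries are not NaN
    are used, so every coefficient has its own effective sample size.
    """
    k = len(rows[0])
    out = [[math.nan] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            pairs = [(r[i], r[j]) for r in rows
                     if not (math.isnan(r[i]) or math.isnan(r[j]))]
            xs = [p[0] for p in pairs]
            ys = [p[1] for p in pairs]
            out[i][j] = out[j][i] = pearson(xs, ys)
    return out

# Hypothetical data: 5 observations of 3 variables, with missing values.
nan = math.nan
data = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, nan],
    [3.0, nan, 3.0],
    [4.0, 8.0, 2.0],
    [5.0, 10.0, 1.0],
]
corr = nan_corr_matrix(data)
for row in corr:
    print(row)
```

Note that a matrix built this way is not guaranteed to be positive semi-definite, because each entry comes from a different subset of rows.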

python - how to compute correlation-matrix with nans in data-matrix

白昼怎懂夜的黑 posted on 2019-12-01 15:15:08
Question: I couldn't find a function that computes a matrix of correlation coefficients for arrays containing observations for more than two variables when there are NaNs in the data. There are functions that do this for pairs of variables (or by just masking the arrays using ~is.nan()). But looping over a large number of variables with these functions and computing the correlation for each pair can be very time consuming. So I tried on my own and soon realized that the complexity of doing this is a

How to find correlation of an image?

坚强是说给别人听的谎言 posted on 2019-12-01 14:14:17
There is an image A of fixed size 256*256*3 (RGB). The mathematical formula for covariance between two adjacent pixel values x, y in an image is popularly known to be:

    cov(x,y) = (1/n) * sum_{i=1}^{n} (x_i - E(x)) * (y_i - E(y))
    r_xy = cov(x,y) / sqrt(D(x) * D(y))
    D(x) = (1/n) * sum_{i=1}^{n} (x_i - E(x))^2
    E(x) = (1/n) * sum_{i=1}^{n} x_i

where r_xy is the correlation coefficient between two horizontally, vertically, and diagonally adjacent pixels of these two images.

Q1: How to do the above computation in MATLAB?
Q2: How to randomly select say 5000 pairs
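The question asks for MATLAB and is cut off, so this is only a hedged sketch of the formulas above in Python, not an answer from the original thread. It samples random adjacent pixel pairs from one 2-D channel and evaluates E, D, cov and r_xy as defined above. The function name `adjacent_pixel_corr`, the 64x64 test images and the seeds are assumptions made for the example.

```python
import math
import random

def adjacent_pixel_corr(channel, direction="horizontal", n_pairs=5000, seed=0):
    """Estimate r_xy for one 2-D channel from randomly chosen adjacent pairs.

    `channel` is a list of rows of pixel values. Pairs are drawn with
    replacement; `direction` selects which neighbour is paired with each pixel.
    """
    step_r, step_c = {"horizontal": (0, 1),
                      "vertical": (1, 0),
                      "diagonal": (1, 1)}[direction]
    h, w = len(channel), len(channel[0])
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_pairs):
        r = rng.randrange(h - step_r)   # keep the neighbour inside the image
        c = rng.randrange(w - step_c)
        xs.append(channel[r][c])
        ys.append(channel[r + step_r][c + step_c])
    n = len(xs)
    ex = sum(xs) / n                                   # E(x)
    ey = sum(ys) / n                                   # E(y)
    var_x = sum((x - ex) ** 2 for x in xs) / n         # D(x)
    var_y = sum((y - ey) ** 2 for y in ys) / n         # D(y)
    cov = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n
    return cov / math.sqrt(var_x * var_y)

# Hypothetical test channels: a smooth gradient and uniform noise.
gradient = [[r + c for c in range(64)] for r in range(64)]
noise_rng = random.Random(1)
noise = [[noise_rng.randrange(256) for _ in range(64)] for _ in range(64)]

r_gradient = adjacent_pixel_corr(gradient, "horizontal")
r_noise = adjacent_pixel_corr(noise, "horizontal")
print(r_gradient, r_noise)
```

For an RGB image the same function would be called once per colour channel. A smooth image should give a value close to 1 and a noise-like image a value close to 0.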

Matlab Cross correlation vs Correlation Coefficient question

泄露秘密 posted on 2019-12-01 11:52:36
Question: I'm writing a program in C++ but using data from Matlab involving cross-correlation. I understand that when I do a correlation on 2 sets of data it gives me a single correlation coefficient indicating whether they are related. But I want to use cross-correlation on the data series. When I run cross-correlation in Matlab it gives me a lot of data, and when plotted the plot looks like a triangle... I understand correlation is supposed to be somewhere between +/- 1 but the data toward the
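As a hedged illustration of why the output can look like a triangle and exceed +/- 1: raw cross-correlation is a sum of products at every lag, so its size depends on how many samples overlap and on the signal energy. Dividing by sqrt(sum(x^2) * sum(y^2)) bounds it to [-1, 1]; as far as I know this is what the 'coeff' (or 'normalized') option of MATLAB's xcorr does. The function names below are made up for the sketch, written in Python rather than C++ or MATLAB.

```python
import math

def cross_correlation(x, y):
    """Raw cross-correlation of two equal-length sequences.

    Returns one value per lag, for lags -(n-1) .. (n-1).
    """
    n = len(x)
    out = []
    for lag in range(-(n - 1), n):
        s = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:          # only overlapping samples contribute
                s += x[i] * y[j]
        out.append(s)
    return out

def normalized_cross_correlation(x, y):
    """Cross-correlation scaled so that every value lies in [-1, 1]."""
    scale = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    return [c / scale for c in cross_correlation(x, y)]

# A constant signal correlated with itself: the raw result is a triangle
# because the number of overlapping samples grows and then shrinks with lag.
x = [1.0] * 8
raw = cross_correlation(x, x)
norm = normalized_cross_correlation(x, x)
print(raw)
print(norm)
```

The normalized zero-lag value equals the Pearson correlation coefficient only if the mean has been subtracted from both series first; this sketch does not subtract it.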

R cor(), method="pearson" returns NA, but method="spearman" returns value. Why?

余生长醉 posted on 2019-12-01 11:33:26
Question: I am using R to run correlations on a very large data matrix with approximate dimensions 10,000 x 15,000 (events x samples). This data set contains floating point values ranging from -15:15, NA, NaN, inf, and -inf. To simplify the problem I have chosen to work with two rows of my matrix at a time, call them vector1, vector2. The commands are written below:

    CorrelationSpearman = cor(vector1,vector2, method="spearman",use="pairwise.complete.obs")
    CorrelationPearson = cor(vector1,vector2,method=
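The excerpt ends before any answer, so the following is only one plausible explanation, sketched in Python under the assumption that the infinite values are the cause. Pearson works on the values themselves, so a single inf makes the mean infinite and the arithmetic collapses to NaN. Spearman works on ranks, where inf is simply the largest value and -inf the smallest. The helper names and the two sample vectors are made up, and NA/NaN entries are assumed to have been dropped already.

```python
import math

def pearson(xs, ys):
    """Pearson correlation computed directly from the values."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical vectors containing infinite values but no NaN.
vector1 = [1.0, 2.0, 3.0, math.inf, -math.inf]
vector2 = [2.0, 1.0, 4.0, 5.0, 0.5]

r_pearson = pearson(vector1, vector2)    # NaN, because inf - inf is NaN
r_spearman = spearman(vector1, vector2)  # finite, ranks ignore magnitude
print(r_pearson, r_spearman)
```

If this is the cause, replacing or filtering the infinite values before calling the Pearson version would be one way to get a finite result.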

Constructing correlated variables

浪子不回头ぞ posted on 2019-12-01 11:00:21
I have a variable with a given distribution (normal in my example below).

    set.seed(32)
    var1 = rnorm(100,mean=0,sd=1)

I want to create a variable (var2) that is correlated with var1, with a linear correlation coefficient (roughly or exactly) equal to "Corr". The slope of the regression between var1 and var2 should (roughly or exactly) equal 1.

    Corr = 0.3

How can I achieve this? I wanted to do something like this:

    decorelation = rnorm(100,mean=0,sd=1-Corr)
    var2 = var1 + decorelation

But of course when running:

    cor(var1,var2)

The result is not close to Corr!

I did something similar a while ago. I am
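The answer is cut off, so here is a hedged sketch of one standard construction rather than the answerer's method. If var2 = var1 + e with e uncorrelated with var1, the regression slope of var2 on var1 is 1 and cor(var1, var2) = sd(var1) / sqrt(sd(var1)^2 + sd(e)^2). Solving for the noise gives sd(e) = sd(var1) * sqrt(1/Corr^2 - 1), which is about 3.18 for Corr = 0.3 and sd(var1) = 1, not 1 - Corr. The Python below also removes the sample correlation between the noise and var1 so the target is hit exactly; `make_correlated` is a made-up name, and Python's seed 32 does not reproduce R's set.seed(32) values.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def make_correlated(var1, corr, seed=0):
    """Return var2 = var1 + noise with sample correlation `corr` (0 < corr < 1)
    and a sample regression slope of var2 on var1 equal to 1."""
    rng = random.Random(seed)
    n = len(var1)
    m1 = sum(var1) / n
    c1 = [x - m1 for x in var1]                      # centred var1
    noise = [rng.gauss(0, 1) for _ in range(n)]
    mn = sum(noise) / n
    cn = [e - mn for e in noise]                     # centred noise
    ss1 = sum(a * a for a in c1)
    # Remove the part of the noise that is aligned with var1, so the
    # sample covariance between var1 and the noise is exactly zero.
    beta = sum(e * a for e, a in zip(cn, c1)) / ss1
    resid = [e - beta * a for e, a in zip(cn, c1)]
    # Rescale so that sd(noise) = sd(var1) * sqrt(1/corr^2 - 1).
    ssr = sum(r * r for r in resid)
    scale = math.sqrt(ss1 * (1 / corr ** 2 - 1) / ssr)
    return [x + scale * r for x, r in zip(var1, resid)]

rng = random.Random(32)
var1 = [rng.gauss(0, 1) for _ in range(100)]
var2 = make_correlated(var1, 0.3)

m1 = sum(var1) / len(var1)
m2 = sum(var2) / len(var2)
slope = (sum((a - m1) * (b - m2) for a, b in zip(var1, var2))
         / sum((a - m1) ** 2 for a in var1))
print(pearson(var1, var2), slope)
```

Skipping the projection step and only using the derived standard deviation would give a correlation and slope that are approximately, rather than exactly, on target.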

Calculate correlation by aggregating columns of data frame

邮差的信 posted on 2019-12-01 10:45:12
I have the following data frame:

    y <- data.frame(group = letters[1:5], a = rnorm(5) , b = rnorm(5), c = rnorm(5), d = rnorm(5) )

How do I get a data frame that gives me the correlation between columns a,b and c,d for each row? Something like:

    sapply(y, function(x) {cor(x[2:3],x[4:5])})

Thank you, S

You could use apply:

    > apply(y[,-1],1,function(x) cor(x[1:2],x[3:4]))
    [1] -1 -1 1 -1 1

Or ddply (although this might be overkill, and if two rows have the same group it will do the correlation of columns a&b and c&d for both those rows):

    > ddply(y,.(group),function(x) cor(c(x$a,x$b),c(x$c,x$d)))
    group

R: Calculating Pearson correlation and R-squared by group

白昼怎懂夜的黑 posted on 2019-12-01 08:10:54
Question: I am trying to extend the answer to the question "R: filtering data and calculating correlation". To obtain the correlation of temperature and humidity for each month of the year (1 = January), we would have to do the same for each month (12 times).

    cor(airquality[airquality$Month == 1, c("Temp", "Humidity")])

Is there any way to do each month automatically? In my case I have more than 30 groups (not months but species) for which I would like to test for correlations, I just wanted to know if
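The question is truncated and no answer is shown, so this is only a sketch of the general split-apply-combine pattern, written in Python rather than R. Records are grouped by a key, and the Pearson correlation is computed once per group; R-squared is taken as the square of r, which holds for a simple linear regression with one predictor. The function `corr_by_group` and the sample records are hypothetical.

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; NaN if undefined."""
    n = len(xs)
    if n < 2:
        return math.nan
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)

def corr_by_group(records, group_key, x_key, y_key):
    """Return {group: {"r": ..., "r_squared": ...}} for each group value."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append((rec[x_key], rec[y_key]))
    result = {}
    for group, pairs in groups.items():
        xs = [p[0] for p in pairs]
        ys = [p[1] for p in pairs]
        r = pearson(xs, ys)
        result[group] = {"r": r, "r_squared": r * r}
    return result

# Hypothetical records; the column names mirror the question.
records = [
    {"Month": 1, "Temp": 10.0, "Humidity": 80.0},
    {"Month": 1, "Temp": 12.0, "Humidity": 76.0},
    {"Month": 1, "Temp": 14.0, "Humidity": 72.0},
    {"Month": 2, "Temp": 15.0, "Humidity": 60.0},
    {"Month": 2, "Temp": 17.0, "Humidity": 64.0},
    {"Month": 2, "Temp": 20.0, "Humidity": 70.0},
]
by_month = corr_by_group(records, "Month", "Temp", "Humidity")
for month, stats in sorted(by_month.items()):
    print(month, stats)
```

The same function works unchanged for 30 or more species: only the value passed as `group_key` changes.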