correlation

Correlation between NA columns

人盡茶涼 submitted on 2019-12-02 14:03:56
I have to write a function that takes a directory of data files and a threshold for complete cases, and calculates the correlation between sulfate and nitrate (two columns) for each file where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no files meet the threshold requirement, the function should return a numeric vector of length 0. A prototype of this function follows. My code looks like this: corr <- function(directory,threshold=0){ a<
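The requirement boils down to three steps per monitor: keep only the fully observed rows, skip the monitor unless that count exceeds the threshold, and otherwise compute Pearson's r on the two columns. The sketch below shows that logic in plain Python, assuming a hypothetical in-memory `monitors` list (one list of row dicts per file, with `None` for a missing value) in place of reading the directory. It returns an empty list when nothing qualifies, the analogue of R's length-0 numeric vector.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)

def corr(monitors, threshold=0):
    """Sulfate/nitrate correlation for each monitor with > threshold complete cases.

    `monitors` is a hypothetical stand-in for the directory of files:
    a list of monitors, each a list of row dicts where None marks a missing value.
    """
    result = []
    for rows in monitors:
        # A row counts as a complete case only if every variable is observed.
        complete = [r for r in rows if all(v is not None for v in r.values())]
        if len(complete) > threshold:
            result.append(pearson([r["sulfate"] for r in complete],
                                  [r["nitrate"] for r in complete]))
    return result

monitors = [
    [{"id": 1, "sulfate": 1.0, "nitrate": 2.0},
     {"id": 1, "sulfate": 2.0, "nitrate": 4.0},
     {"id": 1, "sulfate": None, "nitrate": 1.0},   # incomplete, dropped
     {"id": 1, "sulfate": 3.0, "nitrate": 6.0}],
    [{"id": 2, "sulfate": 1.0, "nitrate": None}],  # no complete cases
]
print(corr(monitors))      # only the first monitor qualifies
print(corr(monitors, 3))   # 3 complete cases is not > 3, so nothing qualifies
```

In R, `complete.cases()` performs the same row filter as the `complete` list comprehension here.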

How to colourise some cell borders in R corrplot?

给你一囗甜甜゛ submitted on 2019-12-02 11:58:21
Question: I would like to draw attention to some cells by making their borders clearly distinct from everything else. The parameter rect.col colourises all borders, but I want to colourise only the borders of specific cells, for instance (3,3) and (7,7), with any highlight colour, e.g. from heat.colors(100) or rainbow(12). Code: library("corrplot") library("psych") ids <- seq(1,11) M.cor <- cor(mtcars) colnames(M.cor) <- ids rownames(M.cor) <- ids p.mat <- psych::corr.test(M.cor, adjust = "none", ci = F) p.mat <- p.mat[

AttributeError: 'NoneType' object has no attribute 'setCallSite'

半腔热情 submitted on 2019-12-02 08:49:35
In PySpark, I want to calculate the correlation between two DataFrame vectors using the following code (importing pyspark and calling createDataFrame both work without problems): from pyspark.ml.linalg import Vectors from pyspark.ml.stat import Correlation import pyspark spark = pyspark.sql.SparkSession.builder.master("local[*]").getOrCreate() data = [(Vectors.sparse(4, [(0, 1.0), (3, -2.0)]),), (Vectors.dense([4.0, 5.0, 0.0, 3.0]),)] df = spark.createDataFrame(data, ["features"]) r1 = Correlation.corr(df, "features").head() print("Pearson correlation matrix:\n" + str(r1[0])) But I got the
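Independent of the error itself, note what `Correlation.corr(df, "features")` computes: a correlation matrix between the *components* of the vector column (feature j against feature k across all rows), not a single number relating the two rows. The plain-Python sketch below mimics that column-wise Pearson matrix for the question's two vectors, densified by hand; it is an illustration under that reading, not Spark's implementation. With only two rows every defined entry is ±1, and the all-zero third feature has zero variance, so its entries are NaN.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(rows):
    """Feature-by-feature Pearson matrix; each row is one observation vector."""
    cols = list(zip(*rows))  # column j = feature j across all rows
    return [[pearson(a, b) for b in cols] for a in cols]

rows = [
    [1.0, 0.0, 0.0, -2.0],  # Vectors.sparse(4, [(0, 1.0), (3, -2.0)]) densified
    [4.0, 5.0, 0.0, 3.0],   # Vectors.dense([4.0, 5.0, 0.0, 3.0])
]
for line in correlation_matrix(rows):
    print(line)
```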

Getting the correlation with significance of one variable with the rest of the dataset, by time, in data.table

一曲冷凌霜 submitted on 2019-12-02 06:35:49
Question: I stole this example from the following post: LINK set.seed(1) TDT <- data.table(Group = c(rep("A",40),rep("B",60)), Id = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20)), Time = rep(seq(as.Date("2010-01-03"), length=20, by="1 month") - 1,5), norm = round(runif(100)/10,2), x1 = sample(100,100), x2 = round(rnorm(100,0.75,0.3),2), x3 = round(rnorm(100,0.75,0.3),2), x4 = round(rnorm(100,0.75,0.3),2), x5 = round(rnorm(100,0.75,0.3),2)) In order to get the correlations of x1 - x5 by time, one
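The excerpt is cut off, so the exact target is unknown. Assuming the goal is to correlate one column (here `norm`) with each of the others within every `Time` group, the per-group work is Pearson's r plus the usual significance statistic t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The sketch below, with a hypothetical `corr_by_group` helper, shows that grouping logic in plain Python; a p-value would additionally need a Student-t CDF (for example `scipy.stats.t.sf`), which the standard library does not provide. In data.table the grouping itself corresponds to `by = Time`.

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)

def corr_by_group(rows, group_key, target, others):
    """Map (group, column) -> (r, t, n) for `target` against each column in `others`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    out = {}
    for g, grp in groups.items():
        n = len(grp)
        ys = [row[target] for row in grp]
        for col in others:
            r = pearson([row[col] for row in grp], ys)
            # t statistic for H0: rho = 0 (Student t, n - 2 degrees of freedom).
            if n <= 2 or math.isnan(r):
                t = math.nan
            elif abs(r) >= 1.0:
                t = math.copysign(math.inf, r)  # perfect correlation
            else:
                t = r * math.sqrt((n - 2) / (1 - r * r))
            out[(g, col)] = (r, t, n)
    return out

rows = [  # toy data, not the question's TDT
    {"Time": "2010-01-02", "norm": 1.0, "x1": 1.0, "x2": 4.0},
    {"Time": "2010-01-02", "norm": 2.0, "x1": 3.0, "x2": 3.0},
    {"Time": "2010-01-02", "norm": 3.0, "x1": 2.0, "x2": 2.0},
    {"Time": "2010-01-02", "norm": 4.0, "x1": 4.0, "x2": 1.0},
]
for key, (r, t, n) in corr_by_group(rows, "Time", "norm", ["x1", "x2"]).items():
    print(key, round(r, 3), t, n)
```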

Row-wise correlations in R

青春壹個敷衍的年華 submitted on 2019-12-02 04:59:26
Question: I have two matrices of the same size. I would like to calculate the correlation coefficient between each pair of corresponding rows: row 1 of A with row 1 of B, row 2 of A with row 2 of B, etc. A <- matrix(runif(1:200), nrow=20) B <- matrix(runif(1:200), nrow=20) The best I could come up with is ret <- sapply(1:20, function(i) cor(A[i,], B[i,])) but it is terribly inefficient (the matrices have tens of thousands of rows). Is there a better, faster way? Answer 1: This should be fast: cA <- A -

How to colourise some cell borders in R corrplot?

廉价感情. submitted on 2019-12-02 04:10:38
I would like to draw attention to some cells by making their borders clearly distinct from everything else. The parameter rect.col colourises all borders, but I want to colourise only the borders of specific cells, for instance (3,3) and (7,7), with any highlight colour, e.g. from heat.colors(100) or rainbow(12). Code: library("corrplot") library("psych") ids <- seq(1,11) M.cor <- cor(mtcars) colnames(M.cor) <- ids rownames(M.cor) <- ids p.mat <- psych::corr.test(M.cor, adjust = "none", ci = F) p.mat <- p.mat[["r"]] corrplot(M.cor, method = "color", type = "upper", tl.col = 'black', diag = TRUE, p.mat = p.mat,

Row-wise correlations in R

六眼飞鱼酱① submitted on 2019-12-02 00:46:39
I have two matrices of the same size. I would like to calculate the correlation coefficient between each pair of corresponding rows: row 1 of A with row 1 of B, row 2 of A with row 2 of B, etc. A <- matrix(runif(1:200), nrow=20) B <- matrix(runif(1:200), nrow=20) The best I could come up with is ret <- sapply(1:20, function(i) cor(A[i,], B[i,])) but it is terribly inefficient (the matrices have tens of thousands of rows). Is there a better, faster way? This should be fast: cA <- A - rowMeans(A) cB <- B - rowMeans(B) sA <- sqrt(rowMeans(cA^2)) sB <- sqrt(rowMeans(cB^2)) rowMeans(cA * cB) /
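The answer's trick is to compute every row's correlation at once with whole-matrix operations: centre each row on its own mean, take each row's root-mean-square as its standard deviation, and divide the row-wise mean cross-product by the product of those standard deviations. The n in the denominators cancels, so the result is exactly Pearson's r per row. Below is a NumPy translation of those steps (NumPy is a swapped-in tool here; the original answer is R), cross-checked against the slow per-row loop.

```python
import numpy as np

def rowwise_corr(A, B):
    """Pearson correlation of A[i] with B[i] for every row i, fully vectorised."""
    # Centre each row on its own mean (R: A - rowMeans(A)).
    cA = A - A.mean(axis=1, keepdims=True)
    cB = B - B.mean(axis=1, keepdims=True)
    # Row standard deviations with denominator n (R: sqrt(rowMeans(cA^2))).
    sA = np.sqrt((cA ** 2).mean(axis=1))
    sB = np.sqrt((cB ** 2).mean(axis=1))
    # Row-wise covariance divided by the product of standard deviations.
    return (cA * cB).mean(axis=1) / (sA * sB)

rng = np.random.default_rng(0)
A = rng.random((20, 10))  # same shape as matrix(runif(200), nrow = 20)
B = rng.random((20, 10))
r = rowwise_corr(A, B)
print(r[:5])
```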

How to generate correlated Uniform[0,1] variables

穿精又带淫゛_ submitted on 2019-12-01 19:41:17
(This question is related to how to generate a dataset of correlated variables with different distributions?) In Stata, say that I create a random variable following a Uniform[0,1] distribution: set seed 100 gen random1 = runiform() I now want to create a second random variable that is correlated with the first (the correlation should be .75, say) but is bounded by 0 and 1. I would like this second variable to also be more or less Uniform[0,1]. How can I do this? This won't be exact, but the NORTA/copula method should be pretty close and easy to implement. The relevant citation is: Cario,

How to match MQ Server reply messages to the correct request

ⅰ亾dé卋堺 submitted on 2019-12-01 18:45:07
I'm connecting to IBM WebSphere MQ. I want to be able to match each reply message with the correct request message. I've trawled through hundreds of pages trying to get this working, with no luck. I have a class, MQHandler, which sends a message to one defined queue and reads the request from another. This works fine; however, if multiple users are using the application at the same time, the messages get mixed up. I can't find a method on the receiver that lets me specify the CorrelationID to match. Something like... consumer.receive( selector ); Can you check the methods below to ensure I'm doing this
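The usual request/reply pattern is: give each request a unique message ID, have the server copy that ID into the reply's correlation ID (a common MQ convention, but an assumption about this particular server), and have each client consume only the reply carrying its own ID. In JMS that filtering is normally expressed as a message selector when creating the consumer, e.g. `session.createConsumer(replyQueue, "JMSCorrelationID = '" + id + "'")`, rather than as an argument to `receive()`. The Python sketch below simulates the pattern with a hypothetical in-memory `ReplyQueue`; it is not an MQ or JMS API.

```python
import uuid
from collections import deque

class ReplyQueue:
    """Hypothetical in-memory stand-in for an MQ reply queue."""

    def __init__(self):
        self._messages = deque()

    def put(self, message):
        self._messages.append(message)

    def get_by_correlation_id(self, correlation_id):
        """Remove and return the first reply with this correlation ID, else None."""
        for message in self._messages:
            if message["correlation_id"] == correlation_id:
                break
        else:
            return None
        self._messages.remove(message)
        return message

def send_request(request_queue, body):
    """Put a request with a fresh unique message ID and return that ID."""
    message_id = uuid.uuid4().hex
    request_queue.append({"message_id": message_id, "body": body})
    return message_id

def serve(request_queue, reply_queue):
    """Toy server: copies each request's message ID into the reply's correlation ID."""
    while request_queue:
        request = request_queue.popleft()
        reply_queue.put({"correlation_id": request["message_id"],
                         "body": "reply to " + request["body"]})

requests, replies = deque(), ReplyQueue()
id_a = send_request(requests, "user A")
id_b = send_request(requests, "user B")
serve(requests, replies)
# Each user asks only for its own reply, regardless of arrival order.
reply_b = replies.get_by_correlation_id(id_b)
reply_a = replies.get_by_correlation_id(id_a)
print(reply_a["body"], "|", reply_b["body"])
```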

How to produce a meaningful draftsman/correlation plot for discrete values

允我心安 submitted on 2019-12-01 18:21:49
One of my favorite tools for exploratory analysis is pairs(); however, when the data take only a limited number of discrete values it falls flat, because the points all land exactly on top of each other. Consider the following: y <- t(rmultinom(n=1000,size=4,prob=rep(.25,4))) pairs(y) It doesn't really give a good sense of the correlation. Is there an alternative plot style that would? If you convert y to a data.frame you can add some jitter, and with the col option you can set the transparency level (the 4th argument of rgb): y <- data.frame(y) pairs(sapply(y,jitter), col = rgb(0,0,0,.2)) Or you could use ggplot2's plotmatrix:
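Why the plain `pairs(y)` looks uninformative: each row of counts sums to 4, so any two columns can only take the 15 value pairs with c₀ + c₁ ≤ 4, and 1000 rows collapse onto those few grid points. Yet the columns are genuinely related; for equal probabilities the theoretical correlation is −p²/(p(1−p)) = −1/3. The Python sketch below simulates the same multinomial rows, confirms the correlation and the small number of distinct points, and applies a simple uniform jitter. The `jitter` helper and its `amount` are illustrative assumptions, not R's `jitter()` defaults.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def multinomial_rows(n, size, k, seed=None):
    """n rows of category counts: `size` draws over k equally likely categories."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        counts = [0] * k
        for cat in rng.choices(range(k), k=size):
            counts[cat] += 1
        rows.append(counts)
    return rows

def jitter(values, amount, rng):
    """Add uniform noise in [-amount, amount] so overlapping points spread out."""
    return [v + rng.uniform(-amount, amount) for v in values]

rows = multinomial_rows(20000, size=4, k=4, seed=1)
col0 = [r[0] for r in rows]
col1 = [r[1] for r in rows]
r01 = pearson(col0, col1)             # theory: -1/3
distinct = len(set(zip(col0, col1)))  # points actually visible in one pairs() panel
jittered = jitter(col0, 0.2, random.Random(2))
print(round(r01, 3), distinct)
```

Jitter spreads the stacked points so their density becomes visible, and transparency (the alpha in `rgb(0,0,0,.2)`) then shows where most rows fall.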