correlation | 易学教程

Correlation between NA columns

I have to write a function that takes a directory of data files and a threshold for complete cases, and calculates the correlation between sulfate and nitrate (two columns) for each file where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no files meet the threshold requirement, the function should return a numeric vector of length 0. A prototype of this function follows. My code looks like this:

    corr <- function(directory,threshold=0){ a<
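For comparison, the task described above can be sketched in Python using only the standard library. This is a hedged illustration, not the asker's R code. The column names `sulfate` and `nitrate` come from the question. The CSV layout, the `NA` missing-value marker, and the helper names `pearson` and `corr` are assumptions.

```python
import csv
import math
from pathlib import Path


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else float("nan")


def corr(directory, threshold=0, missing=("NA", "", None)):
    """Return one sulfate/nitrate correlation per CSV file whose number of
    complete rows (no missing value in any column) exceeds `threshold`."""
    results = []
    for path in sorted(Path(directory).glob("*.csv")):
        with open(path, newline="") as fh:
            rows = list(csv.DictReader(fh))
        # A row is "complete" only if every column holds a value.
        complete = [r for r in rows if all(v not in missing for v in r.values())]
        if len(complete) > threshold:
            sulfate = [float(r["sulfate"]) for r in complete]
            nitrate = [float(r["nitrate"]) for r in complete]
            results.append(pearson(sulfate, nitrate))
    return results  # empty list when no file meets the threshold
```

The empty list plays the role of the "numeric vector of length 0" that the assignment asks for.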

AttributeError: 'NoneType' object has no attribute 'setCallSite'

In PySpark, I want to calculate the correlation between two dataframe vectors, using the following code (I have no problem importing pyspark or calling createDataFrame):

    from pyspark.ml.linalg import Vectors
    from pyspark.ml.stat import Correlation
    import pyspark

    spark = pyspark.sql.SparkSession.builder.master("local[*]").getOrCreate()
    data = [(Vectors.sparse(4, [(0, 1.0), (3, -2.0)]),),
            (Vectors.dense([4.0, 5.0, 0.0, 3.0]),)]
    df = spark.createDataFrame(data, ["features"])
    r1 = Correlation.corr(df, "features").head()
    print("Pearson correlation matrix:\n" + str(r1[0]))

But, I got the

Getting the correlation with significance of one variable with the rest of the dataset, by time, in data.table

Question: I stole this example from the following post: LINK

    set.seed(1)
    TDT <- data.table(Group = c(rep("A",40),rep("B",60)),
                      Id = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20)),
                      Time = rep(seq(as.Date("2010-01-03"), length=20, by="1 month") - 1,5),
                      norm = round(runif(100)/10,2),
                      x1 = sample(100,100),
                      x2 = round(rnorm(100,0.75,0.3),2),
                      x3 = round(rnorm(100,0.75,0.3),2),
                      x4 = round(rnorm(100,0.75,0.3),2),
                      x5 = round(rnorm(100,0.75,0.3),2))

In order to get the correlations of x1 - x5 by time, one
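The excerpt breaks off before showing an approach, so the following is only a sketch of the general idea in Python (standard library only), not the data.table solution from the original thread. It groups the rows by time, correlates `norm` with each of the other columns inside every group, and reports the t statistic of the usual Pearson correlation test alongside each coefficient. The function names `pearson` and `corr_by_group` and the list-of-dicts input are assumptions made for illustration. Turning the t statistic into a p-value needs a t distribution, which is left out here.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def corr_by_group(rows, group_key, target, others):
    """For each group, correlate `target` with every column in `others`.

    Returns {group: {column: (r, t, df)}} with df = n - 2 and
    t = r * sqrt(df / (1 - r^2)). Each group needs at least 3 rows
    and non-constant columns.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    out = {}
    for g, members in groups.items():
        ys = [m[target] for m in members]
        df = len(members) - 2
        out[g] = {}
        for col in others:
            r = pearson([m[col] for m in members], ys)
            if abs(r) < 1:
                t = r * math.sqrt(df / (1 - r * r))
            else:  # perfect correlation: the statistic is unbounded
                t = math.copysign(math.inf, r)
            out[g][col] = (r, t, df)
    return out
```

With the question's data this would be called roughly as `corr_by_group(rows, "Time", "norm", ["x1", "x2", "x3", "x4", "x5"])`, where `rows` holds one dict per observation.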

How to colourise some cell borders in R corrplot?

Question: I would like to draw attention to some cells by making their borders clearly distinct from everything else. The parameter rect.col colours all borders, but I want to colour only the borders of cells (3,3) and (7,7), for instance, in some highlight colour, e.g. one from heat.colors(100) or rainbow(12). Code:

    library("corrplot")
    library("psych")

    ids <- seq(1,11)
    M.cor <- cor(mtcars)
    colnames(M.cor) <- ids
    rownames(M.cor) <- ids

    p.mat <- psych::corr.test(M.cor, adjust = "none", ci = FALSE)
    p.mat <- p.mat[["r"]]

    corrplot(M.cor, method = "color", type = "upper", tl.col = 'black', diag = TRUE, p.mat = p.mat,

Row-wise correlations in R

Question: I have two matrices of the same size. I would like to calculate the correlation coefficient between each pair of rows in these matrices: row 1 from A with row 1 from B, row 2 from A with row 2 from B, etc.

    A <- matrix(runif(1:200), nrow=20)
    B <- matrix(runif(1:200), nrow=20)

The best I could come up with is

    ret <- sapply(1:20, function(i) cor(A[i,], B[i,]))

but it is terribly inefficient (the matrices have tens of thousands of rows). Is there a better, faster way?

Answer 1: This should be fast:

    cA <- A - rowMeans(A)
    cB <- B - rowMeans(B)
    sA <- sqrt(rowMeans(cA^2))
    sB <- sqrt(rowMeans(cB^2))
    rowMeans(cA * cB) /
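The arithmetic behind that answer can be sketched in plain Python. This illustrates the formula only, not the vectorised speed-up that R's `rowMeans` gives. The excerpt is cut off after the final `/`. The divisor `sa * sb` below is what the standard Pearson formula requires, and it is an assumption about how the truncated line ends. The function name `rowwise_corr` is also hypothetical.

```python
import math


def rowwise_corr(A, B):
    """Pearson correlation between row i of A and row i of B, for every i."""
    result = []
    for a, b in zip(A, B):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        ca = [x - ma for x in a]                      # cA <- A - rowMeans(A)
        cb = [y - mb for y in b]                      # cB <- B - rowMeans(B)
        sa = math.sqrt(sum(x * x for x in ca) / n)    # sqrt(rowMeans(cA^2))
        sb = math.sqrt(sum(y * y for y in cb) / n)    # sqrt(rowMeans(cB^2))
        cov = sum(x * y for x, y in zip(ca, cb)) / n  # rowMeans(cA * cB)
        result.append(cov / (sa * sb))
    return result
```

Because the covariance and both standard deviations use the same divisor `n`, the ratio equals the usual sample correlation even though R's `cor` uses `n - 1` internally.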

How to generate correlated Uniform[0,1] variables

(This question is related to "How to generate a dataset of correlated variables with different distributions?")

In Stata, say that I create a random variable following a Uniform[0,1] distribution:

    set seed 100
    gen random1 = runiform()

I now want to create a second random variable that is correlated with the first (the correlation should be .75, say), but is bounded by 0 and 1. I would like this second variable to also be more-or-less Uniform[0,1]. How can I do this?

This won't be exact, but the NORTA/copula method should be pretty close and easy to implement. The relevant citation is: Cario,
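Since the answer is truncated, here is a minimal Python sketch of one common reading of the NORTA / Gaussian-copula idea, under stated assumptions rather than as the answerer's Stata code. It maps the existing uniform draws to normal scores, mixes in independent normal noise to induce correlation, and maps back to [0,1] with the normal CDF. The adjustment `2 * sin(pi * rho / 6)` is the standard relationship between the correlation of the normals and the correlation of the resulting uniforms. The function name `correlated_uniform` is hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def correlated_uniform(u1, target_rho, rng=random):
    """Given Uniform[0,1] draws `u1`, return draws that are also Uniform[0,1]
    and whose Pearson correlation with `u1` is close to `target_rho`."""
    # Correlation needed on the normal scale so that the uniforms end up
    # with the requested correlation (Gaussian copula relationship).
    rho_z = 2.0 * math.sin(math.pi * target_rho / 6.0)
    eps = 1e-12  # keep inputs strictly inside (0, 1) for inv_cdf
    u2 = []
    for u in u1:
        z1 = _STD_NORMAL.inv_cdf(min(max(u, eps), 1.0 - eps))
        z2 = rho_z * z1 + math.sqrt(1.0 - rho_z ** 2) * rng.gauss(0.0, 1.0)
        u2.append(_STD_NORMAL.cdf(z2))
    return u2
```

For example, `correlated_uniform(random1, 0.75, random.Random(100))` would play the role of the second Stata variable. The achieved sample correlation varies from draw to draw, which matches the "won't be exact" caveat in the answer.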

How to match MQ Server reply messages to the correct request

I'm connecting to IBM WebSphere MQ. I want to be able to match each reply message with the correct request message. I've trawled through hundreds of pages looking for this and have had no luck. I have a class, MQHandler, which sends a message to one defined queue and reads the request from another. This works fine; however, if multiple users are using the application at the same time, messages get mixed up. I can't seem to find a method on the receiver that lets me specify the CorrelationID to match. Something like...

    consumer.receive( selector );

Can you check the below methods to ensure I'm doing this
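The usual request/reply pattern behind this question can be illustrated with a small in-memory Python sketch. The replier copies the request's message ID into the reply's correlation ID, and the requester only accepts a reply carrying that ID. Everything here (`Message`, `InMemoryQueue`, `serve_one`) is a hypothetical stand-in written for illustration. None of it is IBM MQ or JMS API, and it is not the asker's MQHandler.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    body: str
    message_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    correlation_id: Optional[str] = None


class InMemoryQueue:
    """Hypothetical stand-in for a queue; not an IBM MQ or JMS class."""

    def __init__(self):
        self._messages = []

    def put(self, message):
        self._messages.append(message)

    def receive(self, correlation_id=None):
        """Remove and return the first message whose correlation ID matches
        (or simply the first message when no ID is given); None if absent."""
        for i, msg in enumerate(self._messages):
            if correlation_id is None or msg.correlation_id == correlation_id:
                return self._messages.pop(i)
        return None


def serve_one(request_queue, reply_queue):
    """Replier side: copy the request's message ID into the reply's correlation ID."""
    request = request_queue.receive()
    if request is not None:
        reply_queue.put(Message(body="reply to " + request.body,
                                correlation_id=request.message_id))
```

Each caller then asks the reply queue only for its own ID, for example `replies.receive(my_request.message_id)`, so concurrent users no longer pick up each other's replies. In JMS the equivalent filter is normally a message selector on `JMSCorrelationID` supplied when the consumer is created, rather than an argument to `receive`.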

How to produce a meaningful draftsman/correlation plot for discrete values

One of my favorite tools for exploratory analysis is pairs(), however in the case of a limited number of discrete values, it falls flat as the dots all align perfectly. Consider the following:

    y <- t(rmultinom(n=1000,size=4,prob=rep(.25,4)))
    pairs(y)

It doesn't really give a good sense of correlation. Is there an alternative plot style that would?

If you change y to a data.frame you can add some 'jitter' and with the col option you can set the transparency level (the 4th number in rgb):

    y <- data.frame(y)
    pairs(sapply(y,jitter), col = rgb(0,0,0,.2))

Or you could use ggplot2's plotmatrix: