correlation

Use ggpairs to create this plot

I have some code in a Shiny app that produces the first plot below. As you can see, the font size varies with the size of the correlation coefficient. I would like to produce something similar with ggpairs (GGally) or ggplot2. The second image below was produced with the following code: library(GGally) ggpairs(df, upper = list(params = c(size = 10)), lower = list(continuous = "smooth", params = c(method = "loess", fill = "blue")) ) As you can see, the size of the correlation font is adjustable using size, but when I set a vector of sizes only the first value is used. I would also like to remove
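
A possible route to correlation labels whose size tracks the coefficient is a custom upper-panel function. This is a sketch against the function-based panel interface of current GGally (not necessarily the fix the asker ended up with); `cor_sized` is my own helper name:

```r
library(GGally)
library(ggplot2)

# Custom upper panel: print the correlation coefficient, scaling the font with |r|
cor_sized <- function(data, mapping, ...) {
  x <- GGally::eval_data_col(data, mapping$x)
  y <- GGally::eval_data_col(data, mapping$y)
  r <- cor(x, y, use = "pairwise.complete.obs")
  ggplot() +
    annotate("text", x = 0.5, y = 0.5, label = sprintf("%.2f", r),
             size = 3 + 8 * abs(r)) +   # larger text for stronger correlations
    theme_void()
}

ggpairs(iris[, 1:4],
        upper = list(continuous = cor_sized),
        lower = list(continuous = wrap("smooth", method = "loess")))
```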

Calculate autocorrelation using FFT in Matlab

I've read some explanations of how autocorrelation can be calculated more efficiently using the fft of a signal: multiply the transform pointwise by its complex conjugate (in the Fourier domain), then take the inverse fft. I'm having trouble realizing this in Matlab at a detailed level, though. Amro: Just like you stated, take the fft and multiply pointwise by its complex conjugate, then use the inverse fft (or, in the case of cross-correlation of two signals: Corr(x,y) <=> FFT(x)FFT(y)* ) x = rand(100,1); len = length(x); %# autocorrelation nfft = 2^nextpow2(2*len-1); r = ifft( fft(x,nfft) .* conj(fft(x
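
For reference, here is the same technique sketched in R rather than Matlab (an illustration of the idea, not the original answer): zero-pad to the next power of two so the circular transform behaves like a linear autocorrelation, then reorder the lags.

```r
x    <- rnorm(100)
len  <- length(x)
nfft <- 2^ceiling(log2(2 * len - 1))               # next power of two >= 2*len - 1

X <- fft(c(x, rep(0, nfft - len)))                 # zero-padded forward FFT
r <- Re(fft(X * Conj(X), inverse = TRUE)) / nfft   # R's inverse fft is unnormalised

# reorder so the lags run from -(len - 1) to +(len - 1)
r <- c(r[(nfft - len + 2):nfft], r[1:len])
```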

Display Correlation Tables as Descending List

Question: When running cor() on a time series with a lot of variables, I get a table back that has a row and a column for each variable, showing the correlation between them. How can I view this table as a list ordered from most correlated to least correlated, eliminating all NA results and results that map back to themselves (i.e. the correlation of A with A)? I would also like to rank inverse (negative) results by their absolute values, but still show them as negative. So the desired output would be something like:
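
One way to do this (a sketch using mtcars as stand-in data, not necessarily the answer the asker accepted): blank out the redundant half of the matrix, flatten it with as.table(), and sort by absolute correlation while keeping the sign.

```r
m <- cor(mtcars, use = "pairwise.complete.obs")
m[lower.tri(m, diag = TRUE)] <- NA          # keep each pair once, drop A-with-A

flat <- as.data.frame(as.table(m))          # one row per (var1, var2, r) triple
names(flat) <- c("var1", "var2", "r")
flat <- flat[!is.na(flat$r), ]              # drop NAs and the blanked-out half
flat <- flat[order(-abs(flat$r)), ]         # rank by |r| but keep negative signs
head(flat)
```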

Why NUMPY correlate and corrcoef return different values and how to “normalize” a correlate in “full” mode?

Question: I'm trying to do some time series analysis in Python, using Numpy. I have two somewhat medium-sized series, with 20k values each, and I want to check the sliding correlation. corrcoef gives me as output a matrix of auto-correlation/correlation coefficients. Nothing useful by itself in my case, as one of the series contains a lag. The correlate function (in mode="full") returns a 40k-element list that DOES look like the kind of result I'm aiming for (the peak value is as far from the center
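
The difference between the two functions is normalisation: corrcoef centres each series and scales by the standard deviations, while correlate just sums raw lagged products. A small R sketch of that relationship (R to match the other snippets on this page; the fix in numpy is the same arithmetic applied to the mode="full" output):

```r
set.seed(42)
x <- rnorm(500)
y <- 0.7 * x + rnorm(500, sd = 0.5)

# centre each series and scale by sd * sqrt(n - 1)
xs <- (x - mean(x)) / (sd(x) * sqrt(length(x) - 1))
ys <- (y - mean(y)) / (sd(y) * sqrt(length(y) - 1))

sum(xs * ys)   # zero-lag term of the normalised sliding correlation
cor(x, y)      # identical (up to rounding): this is what corrcoef reports
```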

cor shows only NA or 1 for correlations - Why?

Question: I'm running cor() on a data.frame with all numeric values and I'm getting this as the result:

          price exprice ...
price         1      NA
exprice      NA       1
...

So it's either 1 or NA for each value in the resulting table. Why are the NAs showing up instead of valid correlations? Answer 1: The 1s are because everything is perfectly correlated with itself, and the NAs are because there are NAs in your variables. You will have to specify how you want R to compute the correlation when there are missing values, because
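
A minimal sketch of the fix being described, with made-up price/exprice values since the asker's data is not shown: pass a use= argument so cor() knows what to do with the missing values.

```r
df <- data.frame(price   = c(10, 12, NA, 15, 11),
                 exprice = c( 9, 11, 13, NA, 10))

cor(df)                                  # default use = "everything": NAs poison the result
cor(df, use = "pairwise.complete.obs")   # drop NAs pair by pair
cor(df, use = "complete.obs")            # or drop every row that contains an NA
```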

A matrix version of cor.test()

cor.test() takes vectors x and y as arguments, but I have an entire matrix of data that I want to test, pairwise. cor() takes this matrix as an argument just fine, and I'm hoping to find a way to do the same with cor.test(). The common advice from other folks seems to be to use cor.prob(): https://stat.ethz.ch/pipermail/r-help/2001-November/016201.html But these p-values are not the same as those generated by cor.test()! cor.test() also seems better equipped to handle pairwise deletion (I have quite a bit of missing data in my data set) than cor.prob(). Does anybody have any alternatives
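
One commonly suggested alternative (a sketch, not necessarily what the asker settled on): Hmisc::rcorr() works on a whole matrix, returns both the coefficients and their p-values, and uses pairwise deletion for missing data.

```r
library(Hmisc)

m   <- as.matrix(mtcars[, c("mpg", "hp", "wt", "qsec")])
res <- rcorr(m)          # Pearson by default; type = "spearman" is also available

res$r                    # matrix of correlation coefficients
res$P                    # matrix of p-values (NA on the diagonal)
res$n                    # pairwise sample sizes actually used
```

psych::corr.test() is another option; it returns a similar pair of matrices and adjusts the p-values for multiple testing by default.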

Use .corr to get the correlation between two columns

I have the following pandas dataframe Top15: I create a column that estimates the population, and from it the number of citable documents per person: Top15['PopEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita'] Top15['Citable docs per Capita'] = Top15['Citable documents'] / Top15['PopEst'] I want to know the correlation between the number of citable documents per capita and the energy supply per capita. So I use the .corr() method (Pearson's correlation): data = Top15[['Citable docs per Capita','Energy Supply per Capita']] correlation = data.corr(method='pearson') I want to return a single number, but
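
For what it's worth, the same single-number extraction looks like this in R (a sketch in R to match the rest of this page, not the pandas answer; the analogous pandas moves are indexing the 2x2 result of .corr() or calling Series.corr on the two columns directly):

```r
m <- cor(mtcars[, c("mpg", "wt")], method = "pearson")

m["mpg", "wt"]               # pick the single off-diagonal entry of the 2x2 matrix
cor(mtcars$mpg, mtcars$wt)   # or compute the scalar directly from the two columns
```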

How can I create a correlation matrix in R?

I have 92 sets of data of the same type. I want to make a correlation matrix for every possible pair, i.e. a 92 x 92 matrix in which element (ci, cj) is the correlation between ci and cj. How do I do that? Manuel Ramón: An example: d <- data.frame(x1=rnorm(10), x2=rnorm(10), x3=rnorm(10)) cor(d) # get correlations (returns matrix) You could also use the 'corrplot' package: d <- data.frame(x1=rnorm(10), x2=rnorm(10), x3=rnorm(10)) M <- cor(d) # get correlations library('corrplot') #package corrplot corrplot(M, method = "circle") #plot matrix More information here: http://cran.r-project
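
Scaled up to the question's 92 columns, the same single call applies; a sketch with random stand-in data:

```r
set.seed(1)
d <- as.data.frame(matrix(rnorm(50 * 92), ncol = 92))   # stand-in for the 92 series
M <- cor(d, use = "pairwise.complete.obs")

dim(M)           # 92 x 92
M["V1", "V2"]    # element (ci, cj): correlation between columns V1 and V2
```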

Find time shift between two similar waveforms

Question: I have to compare two time-vs-voltage waveforms. Because of the peculiarity of the sources of these waveforms, one of them can be a time-shifted version of the other. How can I find whether there is a time shift, and if so, how much is it? I am doing this in Python and wish to use the numpy/scipy libraries. Answer 1: scipy provides a correlation function which will work fine for small input, and also if you want non-circular correlation, meaning that the signal will not wrap around. Note that in mode=
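
The underlying recipe is the same in any language: cross-correlate the two waveforms and read the shift off the lag where the correlation peaks (in scipy, signal.correlate followed by an argmax over lags). Sketched here in R to match the rest of this page, with a made-up 1 kHz sampling rate:

```r
fs <- 1000                                   # assumed sampling rate, Hz
t  <- seq(0, 1, by = 1 / fs)
v1 <- sin(2 * pi * 5 * t)
v2 <- c(rep(0, 30), v1)[seq_along(v1)]       # v1 delayed by 30 samples

cc    <- ccf(v2, v1, lag.max = 100, plot = FALSE)
k     <- cc$lag[which.max(cc$acf)]           # lag (in samples) at the correlation peak
shift <- k / fs                              # convert to seconds (~0.03 s here)
```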

Calculate correlation - cor() - for only a subset of columns

Question: I have a dataframe and would like to calculate the correlation (with Spearman; the data is categorical and ranked), but only for a subset of columns. I tried with all of them, but R's cor() function only accepts numerical data ("x must be numeric", says the error message), even if Spearman is used. One brute-force approach is to delete the non-numerical columns from the dataframe. That is not very elegant, and for speed I still don't want to calculate correlations between all columns. I hope there is a way to simply
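
A minimal sketch of the usual fix, using iris as a stand-in for a mixed data frame: select only the numeric (or explicitly named) columns and hand just those to cor(), instead of deleting columns from the data frame itself.

```r
num_cols <- sapply(iris, is.numeric)        # TRUE for the four measurement columns
cor(iris[, num_cols], method = "spearman")  # Species (a factor) is simply skipped

# or restrict the call to a named subset of columns
cor(iris[, c("Sepal.Length", "Petal.Length")], method = "spearman")
```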