correlation

SQL: why is SELECT COUNT(*), MIN(col), MAX(col) faster than SELECT MIN(col), MAX(col)?

Submitted by ♀尐吖头ヾ on 2019-12-17 18:32:06
Question: We're seeing a huge difference between these queries. The slow query: SELECT MIN(col) AS Firstdate, MAX(col) AS Lastdate FROM table WHERE status = 'OK' AND fk = 4193. Table 'table'. Scan count 2, logical reads 2458969, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 1966 ms, elapsed time = 1955 ms. The fast query: SELECT count(*), MIN(col) AS Firstdate, MAX(col) AS Lastdate FROM table WHERE status =

pandas columns correlation with statistical significance

Submitted by 风格不统一 on 2019-12-17 18:15:37
Question: What is the best way, given a pandas dataframe df, to get the correlation between its columns df.1 and df.2? I do not want the output to count rows with NaN, which pandas' built-in correlation does. But I also want it to output a p-value or a standard error, which the built-in does not. SciPy seems to get caught up by the NaNs, though I believe it does report significance. Data example:
     1    2
0    2  NaN
1  NaN    1
2    1    2
3   -4    3
4  1.3    1
5  NaN  NaN
Answer 1: The answer provided by @Shashank is nice. However, if
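The answer is cut off above; as a minimal Python sketch of the usual approach, assuming columns named 1 and 2 as in the data example, one can drop the NaN pairs first and then let scipy.stats.pearsonr return both the coefficient and its two-sided p-value:

import pandas as pd
from scipy import stats

# The example frame from the question (columns named 1 and 2).
df = pd.DataFrame({1: [2, None, 1, -4, 1.3, None],
                   2: [None, 1, 2, 3, 1, None]})

# Keep only rows where both columns are present, then compute
# Pearson's r together with its two-sided p-value.
valid = df[[1, 2]].dropna()
r, p = stats.pearsonr(valid[1], valid[2])
print(f"r = {r:.4f}, p = {p:.4f}")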

Correlation heatmap

Submitted by 拟墨画扇 on 2019-12-17 17:29:10
Question: I want to represent a correlation matrix using a heatmap. There is something called a correlogram in R, but I don't think there's such a thing in Python. How can I do this? The values go from -1 to 1, for example:
[[ 1.          0.00279981  0.95173379  0.02486161 -0.00324926 -0.00432099]
 [ 0.00279981  1.          0.17728303  0.64425774  0.30735071  0.37379443]
 [ 0.95173379  0.17728303  1.          0.27072266  0.02549031  0.03324756]
 [ 0.02486161  0.64425774  0.27072266  1.          0.18336236  0.18913512]
 [-0.00324926  0.30735071  0.02549031  0
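The excerpt is truncated; a minimal sketch of one common Python approach uses matplotlib's imshow with a diverging colormap pinned to the -1..1 range. The matrix below is randomly generated for illustration, not the values quoted above:

import numpy as np
import matplotlib.pyplot as plt

# A symmetric correlation matrix to plot; random data here for illustration.
rng = np.random.default_rng(0)
corr = np.corrcoef(rng.normal(size=(100, 6)), rowvar=False)

fig, ax = plt.subplots()
im = ax.imshow(corr, cmap="RdBu_r", vmin=-1, vmax=1)  # diverging colormap centred on 0
fig.colorbar(im, ax=ax, label="correlation")
ax.set_xticks(range(corr.shape[1]))
ax.set_yticks(range(corr.shape[0]))
plt.show()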

How can I create a correlation matrix in R?

Submitted by 北城以北 on 2019-12-17 04:41:26
Question: I have 92 sets of data of the same type. I want to make a correlation matrix for every possible pair, i.e. a 92 x 92 matrix such that element (ci, cj) is the correlation between ci and cj. How do I do that? Answer 1: An example: d <- data.frame(x1=rnorm(10), x2=rnorm(10), x3=rnorm(10)) cor(d) # get correlations (returns matrix) Answer 2: You could use the 'corrplot' package. d <- data.frame(x1=rnorm(10), x2=rnorm(10), x3=rnorm(10)) M <- cor(d) # get correlations library('corrplot')

Create correlated variables following various distributions

Submitted by 一曲冷凌霜 on 2019-12-14 03:48:39
Question: In R, I would like to create n variables of length L whose relationship is given by a correlation matrix called cor_matrix. The important point is that the n variables may follow different distributions (including continuous vs. discrete distributions). Related posts: how-to-generate-sample-data-with-exact-moments, generate-a-random-variable-with-a-defined-correlation-to-an-existing-variable, r-constructing-correlated-variables. Modified from the third post listed above, the following is
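The post's own R code is cut off above. Purely as a sketch of the general technique commonly used for this problem (a Gaussian copula: draw correlated standard normals, map them to uniforms, then push each column through the inverse CDF of its target marginal), here is a hedged Python illustration. The marginals and the 3 x 3 cor_matrix are made up, and for non-normal or discrete marginals the achieved correlation only approximates the target:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
L = 10_000                                   # length of each variable
cor_matrix = np.array([[1.0, 0.6, 0.3],      # target correlation (illustrative)
                       [0.6, 1.0, 0.5],
                       [0.3, 0.5, 1.0]])

# 1) Correlated standard normals with the requested correlation structure.
z = rng.multivariate_normal(mean=np.zeros(3), cov=cor_matrix, size=L)

# 2) Map to uniforms on (0, 1) via the normal CDF (the copula step).
u = stats.norm.cdf(z)

# 3) Apply each target marginal's inverse CDF: continuous or discrete.
x1 = stats.expon.ppf(u[:, 0], scale=2.0)      # exponential marginal
x2 = stats.norm.ppf(u[:, 1], loc=5, scale=1)  # normal marginal
x3 = stats.poisson.ppf(u[:, 2], mu=3)         # discrete (Poisson) marginal

# Achieved correlation is close to, but not exactly, cor_matrix.
print(np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False).round(2))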

How to generate a correlation plot of my data.frame in R?

Submitted by 那年仲夏 on 2019-12-13 19:41:04
Question: It might be a simple question. I have a df and I want to generate a correlation plot for my data in R. head(df):
            x  y
1 -0.10967469  1
2  1.06814661 93
3  0.71805993 46
4  0.60566332 84
5  0.73714006 12
6 -0.06029712  5
I've found a package called corPlot and I've generated two plots based on the pearson and spearman methods: corPlot(df, method = 'pearson') corPlot(df, method = 'spearman') Here is my output with the pearson method: [plot image not reproduced in this excerpt]. I wondered if there is another package to generate the same correlation plots

How can I correlate against multiple columns using ddply?

Submitted by 感情迁移 on 2019-12-13 12:42:45
Question: I have a data.frame and I want to calculate correlation coefficients using one column against the other columns (there are some non-numeric columns in the frame as well). ddply(Banks, .(brand_id, standard.quarter), function(x) { cor(BLY11, x) }) # Error in cor(BLY11, x) : 'y' must be numeric I tested against is.numeric(x): ddply(Banks, .(brand_id, standard.quarter), function(x) { if (is.numeric(x)) cor(BLY11, x) else 0 }) but that failed every comparison, returned 0, and returned only one column, as
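The thread itself is about plyr::ddply in R; as a language-neutral illustration of the same idea (correlating one column against every other numeric column within each group, skipping non-numeric columns), a hypothetical pandas sketch could look like this. The column names brand_id, standard.quarter, and BLY11 come from the excerpt; the data values are invented:

import pandas as pd
import numpy as np

# Hypothetical frame mirroring the columns mentioned in the question.
banks = pd.DataFrame({
    "brand_id": [1, 1, 1, 2, 2, 2],
    "standard.quarter": ["Q1", "Q1", "Q1", "Q1", "Q1", "Q1"],
    "name": ["a", "b", "c", "d", "e", "f"],   # non-numeric column, ignored
    "BLY11": [1.0, 2.0, 3.0, 2.0, 1.0, 0.0],
    "BLY12": [2.0, 4.0, 6.5, 1.0, 0.5, 0.1],
})

def corr_with_bly11(group: pd.DataFrame) -> pd.Series:
    numeric = group.select_dtypes(include=np.number)  # drop non-numeric columns
    # Grouping keys are constant within a group and simply yield NaN here.
    return numeric.corrwith(group["BLY11"])           # correlate each column with BLY11

result = banks.groupby(["brand_id", "standard.quarter"]).apply(corr_with_bly11)
print(result)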

Correlation in MATLAB

Submitted by 跟風遠走 on 2019-12-13 09:11:39
Question: The following script finds the correlation between each pair of data: clear all LName={'Name1','Name2','Name3','Name4','Name5'}; Data={rand(12,1),rand(12,1),rand(12,1),rand(12,1),rand(12,1)}; % place in a structure d = [LName;Data]; Data = struct(d{:}); d1 = cell2mat(struct2cell(Data)'); [R,P] = corrcoef(d1); Correlation = [LName(nchoosek(1:length(R),2)) num2cell(nonzeros(tril(R,-1)))] Furthermore, the script also states in 'Correlation' which combination of data was used in generating the

Correlate a single time series with a large number of time series

Submitted by 孤者浪人 on 2019-12-13 02:27:53
Question: I have a large number (M) of time series, each with N time points, stored in an M x N matrix. I also have a separate time series with N time points that I would like to correlate with all the time series in the matrix. An easy solution is to go through the matrix row by row and run numpy.corrcoef. However, I was wondering if there is a faster or more concise way to do this? Answer 1: Let's use this correlation formula [shown as an image in the original answer]: you can implement this for X as the M x N array and Y as the other
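The formula image is not reproduced here; since the question is about numpy.corrcoef, the quantity involved is the ordinary Pearson coefficient, r = sum((x - mean(x)) * (y - mean(y))) / sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2)), computed row-wise. A minimal vectorized numpy sketch of that idea, with X as the M x N matrix and y as the single length-N series:

import numpy as np

def corr_rows_with_series(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Pearson correlation of each row of X (M x N) with the 1-D series y (length N)."""
    Xc = X - X.mean(axis=1, keepdims=True)   # center each row
    yc = y - y.mean()                        # center the series
    num = Xc @ yc                            # row-wise covariances (up to a factor of N)
    den = np.sqrt((Xc ** 2).sum(axis=1) * (yc ** 2).sum())
    return num / den                         # shape (M,)

# Quick check against the row-by-row np.corrcoef approach.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))
y = rng.normal(size=100)
fast = corr_rows_with_series(X, y)
slow = np.array([np.corrcoef(row, y)[0, 1] for row in X])
print(np.allclose(fast, slow))   # True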

Correlation matrix between different files

Submitted by 久未见 on 2019-12-13 01:38:41
Question: I have 82 .csv files, each of them a zoo object, with the following format:
"Index", "code", "pp"
1951-01-01, 2030, 22.9
1951-01-02, 2030, 0.5
1951-01-03, 2030, 0.0
I want to compute a correlation matrix between the pp columns of all of my files. I found out how to do it "manually" between two files: zz <- merge(x, y, all = FALSE) z <- cbind(zz[,2], zz[,4]) cor(z, use = "complete.obs") but I can't come up with a loop to do it for all the files... A few things to consider: each file starts and ends at different
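The question is about R zoo objects and the excerpt is truncated; purely as an illustration of the same merge-then-correlate idea, a hedged pandas sketch could read each file's pp column indexed by date, align everything on an outer join, and let corr() use pairwise complete observations so that differing start and end dates are handled automatically. The glob pattern is an assumption; the column names Index and pp come from the sample header above:

import glob
import pandas as pd

series = {}
for path in glob.glob("*.csv"):                        # the 82 files (naming assumed)
    df = pd.read_csv(path, skipinitialspace=True,      # header has spaces after commas
                     parse_dates=["Index"], index_col="Index")
    series[path] = df["pp"]                            # keep only the pp column

# Building the DataFrame outer-joins on the date index, so files covering
# different periods simply get NaN outside their range; corr() then uses
# pairwise complete observations for each pair of files.
combined = pd.DataFrame(series)
corr_matrix = combined.corr(method="pearson")
print(corr_matrix)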