correlation

Weighted Pearson's Correlation?

寵の児 submitted on 2019-11-28 08:29:05
I have a 2396x34 double matrix named y, in which each of the 2396 rows represents a separate situation consisting of 34 consecutive time segments. I also have a numeric[34] named x that represents a single situation of 34 consecutive time segments. Currently I calculate the correlation between each row of y and x like this:
    crs[,2] <- cor(t(y),x)
What I need now is to replace the cor function in the statement above with a weighted correlation. The weight vector xy.wt is 34 elements long, so that a different weight can be assigned to each of the 34 consecutive time segments. I found the Weighted
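As a rough illustration of what a weighted Pearson correlation computes, here is a minimal NumPy sketch (in Python rather than the R used in the question). The function name weighted_corr_rows and the random test data are assumptions made for this example; only the shapes (rows of 34 segments, one weight per segment) come from the question.

```python
import numpy as np

def weighted_corr_rows(y, x, w):
    """Weighted Pearson correlation between each row of y and the vector x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                    # normalise the weights so they sum to 1
    x_c = x - np.sum(w * x)            # centre x on its weighted mean
    y_c = y - (y @ w)[:, None]         # centre each row of y on its weighted mean
    cov = (y_c * x_c) @ w              # weighted covariance, one value per row
    var_x = np.sum(w * x_c ** 2)       # weighted variance of x
    var_y = (y_c ** 2) @ w             # weighted variance of each row
    return cov / np.sqrt(var_x * var_y)

# Hypothetical data with the same column count as in the question (34 segments).
rng = np.random.default_rng(0)
y = rng.normal(size=(5, 34))
x = rng.normal(size=34)
w = np.ones(34)                        # equal weights reduce to the ordinary Pearson r
r = weighted_corr_rows(y, x, w)
```

With equal weights the result should agree with the unweighted correlation, which is a convenient sanity check before plugging in a real weight vector.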

Display Correlation Tables as Descending List

蓝咒 submitted on 2019-11-28 08:29:04
When running cor() on a time series with a lot of variables, I get a table back that has a row and column for each variable, showing the correlation between them. How can I view this table as a list from most correlated to least correlated, eliminating all NA results and results that map back to themselves (i.e. the correlation of A to A)? I would also like to rank inverse (negative) results by their absolute values, but still show them as negative. So the desired output would be something like:
    A,B,0.98
    A,C,0.9
    C,R,-0.8
    T,Z,0.5
Here's one of many ways I could think to do this. I used the reshape
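The ranking described in the question can be sketched in Python with pandas (not the R approach the answer starts to describe). The function name ranked_pairs and the small data frame are made up for illustration; the constant column D is included only to show that NA results are skipped.

```python
import numpy as np
import pandas as pd

def ranked_pairs(df):
    """Return (var1, var2, r) tuples sorted by |r|, largest first."""
    corr = df.corr()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):   # j > i skips A-A and duplicate B-A pairs
            r = corr.iat[i, j]
            if not np.isnan(r):             # drop NA results
                pairs.append((cols[i], cols[j], float(r)))
    # Rank by absolute value but keep the sign in the output.
    pairs.sort(key=lambda t: abs(t[2]), reverse=True)
    return pairs

# Hypothetical data: C is A reversed, D is constant (its correlations are NaN).
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.5],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [5.0, 5.0, 5.0, 5.0],
})
pairs = ranked_pairs(df)
for a, b, r in pairs:
    print(f"{a},{b},{r:.2f}")
```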

pandas columns correlation with statistical significance

十年热恋 submitted on 2019-11-28 07:36:48
What is the best way, given a pandas dataframe, df, to get the correlation between its columns df.1 and df.2? I do not want the output to count rows with NaN, which pandas' built-in correlation does. But I also want it to output a p-value or a standard error, which the built-in does not. SciPy seems to get caught up by the NaNs, though I believe it does report significance. Data example:
         1    2
    0    2  NaN
    1  NaN    1
    2    1    2
    3   -4    3
    4  1.3    1
    5  NaN  NaN
BKay
The answer provided by @Shashank is nice. However, if you want a solution in pure pandas, you may like this:
    import pandas as pd
    from pandas.io.data import
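One possible approach, sketched here under the assumption that SciPy is available, is to drop the rows where either column is NaN and pass the remaining pairs to scipy.stats.pearsonr, which returns both the coefficient and a p-value. The data frame below reproduces the example data from the question; the variable names are chosen for this sketch.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# The example data from the question, with columns named "1" and "2".
df = pd.DataFrame({
    "1": [2, np.nan, 1, -4, 1.3, np.nan],
    "2": [np.nan, 1, 2, 3, 1, np.nan],
})

# Keep only the rows where both columns are present.
clean = df[["1", "2"]].dropna()

# pearsonr returns the correlation coefficient and a two-sided p-value.
r, p = pearsonr(clean["1"], clean["2"])
print(f"n = {len(clean)}, r = {r:.4f}, p = {p:.4f}")
```

Only three complete rows survive in this example, so the p-value is not very informative here; the point is the NaN handling.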

SQL: why is SELECT COUNT(*), MIN(col), MAX(col) faster than SELECT MIN(col), MAX(col)

耗尽温柔 submitted on 2019-11-28 07:36:45
We're seeing a huge difference between these queries.
The slow query:
    SELECT MIN(col) AS Firstdate, MAX(col) AS Lastdate FROM table WHERE status = 'OK' AND fk = 4193
    Table 'table'. Scan count 2, logical reads 2458969, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times: CPU time = 1966 ms, elapsed time = 1955 ms.
The fast query:
    SELECT count(*), MIN(col) AS Firstdate, MAX(col) AS Lastdate FROM table WHERE status = 'OK' AND fk = 4193
    Table 'table'. Scan count 1, logical reads 5803, physical reads 0, read-ahead reads 0

cor shows only NA or 1 for correlations - Why?

独自空忆成欢 submitted on 2019-11-28 06:46:48
I'm running cor() on a data.frame with all numeric values and I'm getting this as the result:
            price exprice ...
    price       1      NA
    exprice    NA       1
    ...
So it's either 1 or NA for each value in the resulting table. Why are the NAs showing up instead of valid correlations? The 1s are because everything is perfectly correlated with itself, and the NAs are because there are NAs in your variables. You will have to specify how you want R to compute the correlation when there are missing values, because the default is to only compute a coefficient with complete information. You can change this behavior with
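The same effect can be reproduced outside R. The following Python sketch (NumPy and pandas, not the R setting the answer goes on to name) shows a missing value turning the whole coefficient into NaN, and how restricting the computation to complete pairs recovers a number. The column names price and exprice come from the question; the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical values; each variable has one missing observation.
price = np.array([10.0, 12.0, np.nan, 15.0, 18.0])
exprice = np.array([9.0, np.nan, 11.0, 14.0, 17.0])

# A single NaN propagates through the means and sums, so the result is NaN.
naive = np.corrcoef(price, exprice)[0, 1]

# Use only the observations where both values are present.
keep = ~np.isnan(price) & ~np.isnan(exprice)
complete = np.corrcoef(price[keep], exprice[keep])[0, 1]

# pandas excludes missing pairs by default when correlating two Series.
pandas_default = pd.Series(price).corr(pd.Series(exprice))

print(naive, complete, pandas_default)
```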

Spearman correlation and ties

 ̄綄美尐妖づ submitted on 2019-11-28 04:33:08
I'm computing Spearman's rho on small sets of paired rankings. Spearman is well known for not handling ties properly. For example, taking 2 sets of 8 rankings, even if 6 are ties in one of the two sets, the correlation is still very high:
    > cor.test(c(1,2,3,4,5,6,7,8), c(0,0,0,0,0,0,7,8), method="spearman")
        Spearman's rank correlation rho
    S = 19.8439, p-value = 0.0274
    sample estimates:
          rho
    0.7637626
    Warning message:
    Cannot compute exact p-values with ties
The p-value < .05 seems like a pretty high statistical significance for this data. Is there a ties-corrected version of Spearman in R? What
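For comparison, here is a short Python sketch (SciPy, not R) run on the same two vectors. scipy.stats.spearmanr assigns tied values their average rank and then computes Pearson's r on the ranks, and scipy.stats.kendalltau computes the tau-b variant by default, which adjusts for ties. Whether Kendall's tau is an acceptable substitute for the asker's purpose is an assumption of this sketch.

```python
from scipy.stats import kendalltau, spearmanr

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 0, 0, 7, 8]

# Spearman's rho: Pearson correlation of the ranks, ties given average ranks.
rho, rho_p = spearmanr(x, y)

# Kendall's tau-b: a rank correlation whose denominator is adjusted for ties.
tau, tau_p = kendalltau(x, y)

print(f"spearman rho = {rho:.7f} (p = {rho_p:.4f})")
print(f"kendall tau-b = {tau:.7f} (p = {tau_p:.4f})")
```

The rho computed this way matches the 0.7637626 shown in the R output above, so the average-rank treatment of ties appears to be what produced that value.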

Correlation heatmap

孤人 submitted on 2019-11-28 03:41:42
I want to represent a correlation matrix using a heatmap. There is something called a correlogram in R, but I don't think there's such a thing in Python. How can I do this? The values go from -1 to 1, for example:
    [[ 1.          0.00279981  0.95173379  0.02486161 -0.00324926 -0.00432099]
     [ 0.00279981  1.          0.17728303  0.64425774  0.30735071  0.37379443]
     [ 0.95173379  0.17728303  1.          0.27072266  0.02549031  0.03324756]
     [ 0.02486161  0.64425774  0.27072266  1.          0.18336236  0.18913512]
     [-0.00324926  0.30735071  0.02549031  0.18336236  1.          0.77678274]
     [-0.00432099  0.37379443  0.03324756  0.18913512  0.77678274  1.        ]]
I was able to
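One way to draw such a heatmap, sketched here with matplotlib's imshow, is to fix the colour scale to the range -1 to 1 and use a diverging colormap so that negative and positive correlations are visually distinct. The colormap choice and the non-interactive Agg backend are assumptions of this example, not something stated in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# The correlation matrix from the question.
corr = np.array([
    [1.0, 0.00279981, 0.95173379, 0.02486161, -0.00324926, -0.00432099],
    [0.00279981, 1.0, 0.17728303, 0.64425774, 0.30735071, 0.37379443],
    [0.95173379, 0.17728303, 1.0, 0.27072266, 0.02549031, 0.03324756],
    [0.02486161, 0.64425774, 0.27072266, 1.0, 0.18336236, 0.18913512],
    [-0.00324926, 0.30735071, 0.02549031, 0.18336236, 1.0, 0.77678274],
    [-0.00432099, 0.37379443, 0.03324756, 0.18913512, 0.77678274, 1.0],
])

fig, ax = plt.subplots()
# vmin/vmax pin the colour scale to the full correlation range.
im = ax.imshow(corr, cmap="RdBu_r", vmin=-1, vmax=1)
fig.colorbar(im, ax=ax, label="correlation")
ax.set_xticks(range(corr.shape[1]))
ax.set_yticks(range(corr.shape[0]))
# Call fig.savefig("heatmap.png") or plt.show() to view the result.
```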

Encountered invalid value when I use pearsonr

≯℡__Kan透↙ submitted on 2019-11-27 22:15:07
Question: Maybe I made a mistake. If so, I am sorry to ask this. I want to calculate Pearson's correlation coefficient by using scipy's pearsonr function.
    from scipy.stats.stats import pearsonr
    X = [4, 4, 4, 4, 4, 4]
    Y = [4, 5, 5, 4, 4, 4]
    pearsonr(X, Y)
I get the error below:
    RuntimeWarning: invalid value encountered in double_scalars
The reason why I get an error is E[X] = 4 (the expected value of X is 4). I looked at the code of the pearsonr function in scipy/stats/stats.py. Some part of the pearsonr function
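Because every element of X equals its mean, all deviations from the mean are zero, and the correlation formula divides zero by zero. The sketch below spells the formula out with NumPy and returns NaN explicitly in that case. The helper name safe_pearson is made up for this illustration and is not part of SciPy.

```python
import numpy as np

def safe_pearson(x, y):
    """Pearson's r, returning NaN when either input has zero variance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()                  # deviations from the mean
    yc = y - y.mean()
    denom = np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))
    if denom == 0:                     # a constant input makes r undefined
        return float("nan")
    return float(np.sum(xc * yc) / denom)

X = [4, 4, 4, 4, 4, 4]
Y = [4, 5, 5, 4, 4, 4]
print(safe_pearson(X, Y))              # nan, since X has no variance
```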

How to compute correlations between all columns in R and detect highly correlated variables

泪湿孤枕 submitted on 2019-11-27 21:19:56
Question: I have a big dataset with 100 variables and 3000 observations. I want to detect those variables (columns) which are highly correlated or redundant, and so reduce the dimensionality of the dataframe. I tried this, but it only calculates the correlation between one column and the others, and I always get an error message:
    for(i in 1:ncol(predicteurs)){
      correlations <- cor(predicteurs[,i],predicteurs[,2])
      names(correlations[which.max(abs(correlations))])
    }
    Warning messages: 1: In cor(predicteurs[, i
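The general idea, computing the full correlation matrix once and then flagging columns that are highly correlated with a column already kept, can be sketched as follows in Python with pandas (not R). The function name highly_correlated, the 0.9 threshold, and the synthetic data are all assumptions made for this example.

```python
import numpy as np
import pandas as pd

def highly_correlated(df, threshold=0.9):
    """List columns whose |r| with an earlier, kept column exceeds threshold."""
    corr = df.corr().abs().to_numpy()
    cols = list(df.columns)
    to_drop = []
    for j in range(len(cols)):
        for i in range(j):             # compare only with earlier columns
            if cols[i] not in to_drop and corr[i, j] > threshold:
                to_drop.append(cols[j])
                break
    return to_drop

# Hypothetical data: column c is almost a copy of column a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)
df = pd.DataFrame({"a": a, "b": b, "c": c})

redundant = highly_correlated(df, threshold=0.9)
reduced = df.drop(columns=redundant)
print(redundant, list(reduced.columns))
```

Which member of a correlated pair to drop is a judgment call; this sketch simply keeps the column that appears first.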

How to visualize correlation matrix as a schemaball in Matlab

拥有回忆 submitted on 2019-11-27 17:08:10
I have 42 variables and I have calculated the correlation matrix for them in Matlab. Now I would like to visualize it with a schemaball. Does anyone have any suggestions or experience with how this could be done in Matlab? The following pictures will explain my point better: In the pictures, each parabola between variables represents the strength of the correlation between them. The thicker the line, the stronger the correlation. I prefer the style of picture 1 to the style of picture 2, where I have used different colors to highlight the strength of correlation. Kinda finished I guess.. code can be