correlation | 易学教程

Weighted Pearson's Correlation?

I have a 2396x34 double matrix named y, in which each of the 2396 rows represents a separate situation consisting of 34 consecutive time segments. I also have a numeric[34] named x that represents a single situation of 34 consecutive time segments. Currently I am calculating the correlation between each row in y and x like this: crs[,2] <- cor(t(y),x). What I need now is to replace the cor function in the above statement with a weighted correlation. The weight vector xy.wt is 34 elements long, so that a different weight can be assigned to each of the 34 consecutive time segments. I found the Weighted
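The weighted correlation asked about above can be sketched in a few lines of Python. This is a minimal illustration, not the answer from the original thread: the function names weighted_corr and weighted_corr_rows are hypothetical, the small arrays stand in for the real 2396x34 matrix, and the weights are assumed to be non-negative with a positive sum.

```python
import math

def weighted_corr(x, y, w):
    """Weighted Pearson correlation of two equal-length sequences."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(w)
    # Weighted means of x and y.
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted covariance and variances (the common 1/total factor cancels).
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)

def weighted_corr_rows(rows, x, w):
    """Correlate every row with x, using the same weight vector each time."""
    return [weighted_corr(row, x, w) for row in rows]

# Toy stand-ins for y (rows), x and xy.wt from the question.
x = [1.0, 2.0, 3.0, 4.0]
y = [[2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0], [1.0, 3.0, 2.0, 5.0]]
w = [1.0, 1.0, 2.0, 4.0]
crs = weighted_corr_rows(y, x, w)
```

With all weights equal, this reduces to the ordinary Pearson coefficient, which is a convenient sanity check before applying real weights.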

Display Correlation Tables as Descending List

When running cor() on a time series with a lot of variables, I get a table back that has a row and column for each variable, showing the correlation between them. How can I view this table as a list from most correlated to least correlated, eliminating all NA results and results that map back to themselves (i.e. the correlation of A to A)? I would also like to rank inverse (negative) results by their absolute values, but still show them as negative. So the desired output would be something like:

A,B,0.98
A,C,0.9
C,R,-0.8
T,Z,0.5

Here's one of many ways I could think to do this. I used the reshape
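The ranking described in the question can be sketched in Python as follows. This is an illustration under assumptions rather than the approach from the thread: ranked_pairs is a hypothetical helper, and the labels and matrix values are made up to resemble the desired output.

```python
def ranked_pairs(labels, matrix):
    """Return (a, b, r) for each unordered pair, strongest |r| first."""
    pairs = []
    for i in range(len(labels)):
        # Upper triangle only: skips A-A and the mirrored B-A duplicates.
        for j in range(i + 1, len(labels)):
            r = matrix[i][j]
            if r is None or r != r:  # r != r is True only for NaN
                continue
            pairs.append((labels[i], labels[j], r))
    # Sort by absolute value but keep the sign in the stored result.
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)

nan = float("nan")
labels = ["A", "B", "C", "R"]
matrix = [
    [1.0, 0.98, 0.9, nan],
    [0.98, 1.0, 0.2, 0.1],
    [0.9, 0.2, 1.0, -0.8],
    [nan, 0.1, -0.8, 1.0],
]
for a, b, r in ranked_pairs(labels, matrix):
    print(f"{a},{b},{r}")
```

Walking only the upper triangle removes self-correlations and duplicate pairs in one step, so no separate de-duplication pass is needed.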

pandas columns correlation with statistical significance

What is the best way, given a pandas dataframe, df, to get the correlation between its columns df.1 and df.2? I do not want the output to count rows with NaN, which pandas built-in correlation does. But I also want it to output a p-value or a standard error, which the built-in does not. SciPy seems to get caught up by the NaNs, though I believe it does report significance. Data example:

     1    2
0    2  NaN
1  NaN    1
2    1    2
3   -4    3
4  1.3    1
5  NaN  NaN

The answer provided by @Shashank is nice. However, if you want a solution in pure pandas, you may like this: import pandas as pd from pandas.io.data import
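One way to see what the question is asking for is to drop incomplete pairs by hand and then compute r together with its standard error. The sketch below uses only the standard library and is not the pandas solution the excerpt starts to quote; corr_with_se is a hypothetical name, and the standard error uses the usual approximation sqrt((1 - r^2) / (n - 2)).

```python
import math

def corr_with_se(a, b):
    """Pearson r, its approximate standard error, and n, ignoring NaN pairs."""
    # Keep only rows where both values are present.
    pairs = [(x, y) for x, y in zip(a, b)
             if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 complete pairs")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    r = sxy / math.sqrt(sxx * syy)  # fails if either column is constant
    se = math.sqrt(max(0.0, 1.0 - r * r) / (n - 2))
    return r, se, n

nan = float("nan")
col1 = [2, nan, 1, -4, 1.3, nan]   # column "1" from the data example
col2 = [nan, 1, 2, 3, 1, nan]      # column "2" from the data example
r, se, n = corr_with_se(col1, col2)
```

On the data example only three rows are complete, so n is 3. The ratio r / se is the usual t statistic with n - 2 degrees of freedom, from which a p-value can be derived.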

SQL why is SELECT COUNT(*), MIN(col), MAX(col) faster than SELECT MIN(col), MAX(col)

We're seeing a huge difference between these queries.

The slow query:

SELECT MIN(col) AS Firstdate, MAX(col) AS Lastdate FROM table WHERE status = 'OK' AND fk = 4193

Table 'table'. Scan count 2, logical reads 2458969, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 1966 ms, elapsed time = 1955 ms.

The fast query:

SELECT count(*), MIN(col) AS Firstdate, MAX(col) AS Lastdate FROM table WHERE status = 'OK' AND fk = 4193

Table 'table'. Scan count 1, logical reads 5803, physical reads 0, read-ahead reads 0

cor shows only NA or 1 for correlations - Why?

I'm running cor() on a data.frame with all numeric values and I'm getting this as the result:

        price exprice ...
price       1      NA
exprice    NA       1
...

So it's either 1 or NA for each value in the resulting table. Why are the NAs showing up instead of valid correlations?

The 1s are because everything is perfectly correlated with itself, and the NAs are because there are NAs in your variables. You will have to specify how you want R to compute the correlation when there are missing values, because the default is to only compute a coefficient with complete information. You can change this behavior with

Spearman correlation and ties

I'm computing Spearman's rho on small sets of paired rankings. Spearman is well known for not handling ties properly. For example, taking 2 sets of 8 rankings, even if 6 are ties in one of the two sets, the correlation is still very high:

> cor.test(c(1,2,3,4,5,6,7,8), c(0,0,0,0,0,0,7,8), method="spearman")

Spearman's rank correlation rho

S = 19.8439, p-value = 0.0274
sample estimates:
      rho
0.7637626

Warning message:
Cannot compute exact p-values with ties

The p-value <.05 seems like a pretty high statistical significance for this data. Is there a ties-corrected version of Spearman in R? What
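A common way to handle ties is to give tied values the average of the ranks they span and then compute Pearson's correlation on those ranks. The Python sketch below does exactly that; the helper names are hypothetical. On the data from the question it yields the same rho as the output shown, which suggests (though the excerpt does not confirm it) that the reported estimate already uses average ranks and that the warning concerns only the exact p-value.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def spearman(a, b):
    """Spearman's rho as Pearson's r on average ranks."""
    return pearson(average_ranks(a), average_ranks(b))

rho = spearman([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 0, 0, 0, 7, 8])
print(round(rho, 7))  # 0.7637626
```

The six tied zeros each receive rank 3.5, so the second ranking still rises with the first wherever it changes, which is why rho stays high.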

Correlation heatmap

I want to represent a correlation matrix using a heatmap. There is something called a correlogram in R, but I don't think there's such a thing in Python. How can I do this? The values go from -1 to 1, for example:

[[ 1.          0.00279981  0.95173379  0.02486161 -0.00324926 -0.00432099]
 [ 0.00279981  1.          0.17728303  0.64425774  0.30735071  0.37379443]
 [ 0.95173379  0.17728303  1.          0.27072266  0.02549031  0.03324756]
 [ 0.02486161  0.64425774  0.27072266  1.          0.18336236  0.18913512]
 [-0.00324926  0.30735071  0.02549031  0.18336236  1.          0.77678274]
 [-0.00432099  0.37379443  0.03324756  0.18913512  0.77678274  1.        ]]

I was able to
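One common approach in Python, sketched here on the assumption that matplotlib is installed, is to draw the matrix with imshow and fix the colour scale to the range -1 to 1. The function name plot_corr_heatmap and the choice of the "coolwarm" colormap are illustrative, not taken from the thread.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt

# The matrix from the question.
corr = [
    [1.0, 0.00279981, 0.95173379, 0.02486161, -0.00324926, -0.00432099],
    [0.00279981, 1.0, 0.17728303, 0.64425774, 0.30735071, 0.37379443],
    [0.95173379, 0.17728303, 1.0, 0.27072266, 0.02549031, 0.03324756],
    [0.02486161, 0.64425774, 0.27072266, 1.0, 0.18336236, 0.18913512],
    [-0.00324926, 0.30735071, 0.02549031, 0.18336236, 1.0, 0.77678274],
    [-0.00432099, 0.37379443, 0.03324756, 0.18913512, 0.77678274, 1.0],
]

def plot_corr_heatmap(matrix, labels=None):
    """Draw a correlation matrix as a heatmap on a fixed -1..1 colour scale."""
    fig, ax = plt.subplots()
    # A diverging colormap makes negative and positive values easy to tell apart.
    im = ax.imshow(matrix, cmap="coolwarm", vmin=-1, vmax=1)
    fig.colorbar(im, ax=ax, label="correlation")
    if labels is not None:
        ax.set_xticks(range(len(labels)))
        ax.set_xticklabels(labels)
        ax.set_yticks(range(len(labels)))
        ax.set_yticklabels(labels)
    return fig, im

fig, im = plot_corr_heatmap(corr)
```

Calling fig.savefig with a file name writes the figure to disk. Pinning vmin and vmax keeps the colours comparable between matrices, since otherwise the scale adapts to each matrix's own minimum and maximum.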

Encountered invalid value when I use pearsonr

Maybe I made a mistake. If so, I am sorry to ask this. I want to calculate Pearson's correlation coefficient by using scipy's pearsonr function:

from scipy.stats.stats import pearsonr
X = [4, 4, 4, 4, 4, 4]
Y = [4, 5, 5, 4, 4, 4]
pearsonr(X, Y)

I get the warning below:

RuntimeWarning: invalid value encountered in double_scalars

The reason why I get this is that E[X] = 4 (the expected value of X is 4), so every value of X equals its mean. I looked at the code of the pearsonr function in scipy/stats/stats.py. Some part of the pearsonr function
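The underlying issue is that every deviation of X from its mean is zero, so the denominator of the Pearson formula is zero and the ratio is undefined. The standard-library sketch below makes that explicit; safe_pearson is a hypothetical helper that returns NaN in this case, not a SciPy function.

```python
import math

def safe_pearson(x, y):
    """Pearson r, or NaN when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # A constant sequence has no spread, so the correlation is undefined.
    # Note: this exact comparison suits integer-valued data like the example;
    # constant float data may need a small tolerance instead.
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

X = [4, 4, 4, 4, 4, 4]
Y = [4, 5, 5, 4, 4, 4]
r = safe_pearson(X, Y)
print(r)  # nan
```

Checking for a constant input before dividing turns the warning into an explicit, documented outcome.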

How to compute correlations between all columns in R and detect highly correlated variables

I have a big dataset with 100 variables and 3000 observations. I want to detect those variables (columns) which are highly correlated or redundant, and so reduce the dimensionality of the data frame. I tried this, but it only calculates the correlation between one column and the others, and I always get an error message:

for(i in 1:ncol(predicteurs)){ correlations <- cor(predicteurs[,i],predicteurs[,2]) names(correlations[which.max(abs(correlations))]) }

Warning messages: 1: In cor(predicteurs[, i
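The general idea, comparing every pair of columns once and flagging those whose absolute correlation exceeds a threshold, can be sketched in Python as below. The names correlated_pairs and data, the 0.9 threshold and the toy values are all assumptions for illustration; the thread's own R solution is not shown in the excerpt.

```python
import math
from itertools import combinations

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def correlated_pairs(columns, threshold=0.9):
    """columns maps a name to its values; returns (name_a, name_b, r) hits."""
    hits = []
    # combinations() visits each unordered pair of columns exactly once.
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            hits.append((name_a, name_b, r))
    return hits

data = {
    "v1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "v2": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly 2 * v1, so redundant with v1
    "v3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
hits = correlated_pairs(data, threshold=0.9)
```

Here only the v1 and v2 pair is flagged. One column from each flagged pair is then a candidate for removal, though which one to drop is a modelling decision.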

How to visualize correlation matrix as a schemaball in Matlab

I have 42 variables and I have calculated the correlation matrix for them in Matlab. Now I would like to visualize it with a schemaball. Does anyone have any suggestions or experience of how this could be done in Matlab? The following pictures will explain my point better: in the pictures, each parabola between variables represents the strength of the correlation between them. The thicker the line, the stronger the correlation. I prefer the style of picture 1 to the style of picture 2, where I have used different colors to highlight the strength of correlation.

Kinda finished I guess.. code can be