correlation

Transform Correlation Matrix into dataframe with records for each row column pair

倖福魔咒の submitted on 2020-01-01 04:54:08
Question: I have a large matrix of correlations (1093 x 1093). I'm trying to turn my matrix into a dataframe that has a record for every row and column pair, so it would have (1093)^2 records. Here's a snippet of my matrix:

          60516        45264        02117
60516  1.00000000 -0.370793012 -0.082897941
45264 -0.37079301  1.000000000  0.005145601
02117 -0.08289794  0.005145601  1.000000000

The goal from here would be to have a dataframe that looks like this:

row    column  correlation
60516  60516    1.000000000
60516  45264   -0.370793012
........
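The reshaping asked for here is a row-major walk over the matrix that emits one record per cell. The question is about R, so the following is only a language-neutral sketch in plain Python; `labels`, `matrix` and `matrix_to_records` are illustrative names that do not come from the question, and the values are copied from the snippet above.

```python
# Labels are kept as strings so that an ID such as "02117" keeps its leading zero.
labels = ["60516", "45264", "02117"]

# Correlation values copied from the snippet in the question.
matrix = [
    [1.00000000, -0.370793012, -0.082897941],
    [-0.37079301, 1.000000000, 0.005145601],
    [-0.08289794, 0.005145601, 1.000000000],
]


def matrix_to_records(labels, matrix):
    """Return one (row, column, correlation) tuple per cell, in row-major order."""
    return [
        (row_label, col_label, matrix[i][j])
        for i, row_label in enumerate(labels)
        for j, col_label in enumerate(labels)
    ]


records = matrix_to_records(labels, matrix)

# Show the first two records, matching the layout the question asks for.
for row, column, correlation in records[:2]:
    print(row, column, correlation)
```

An n x n matrix yields n^2 records, so the full 1093 x 1093 matrix would produce 1,194,649 of them.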

R - change size of axis labels for corrplot

好久不见. submitted on 2019-12-31 21:35:13
Question: I am using the following with corrplot:

require("corrplot")  ## needs the corrplot package
corrplot(cor(lpp_axis1, lpp_axis2), method=c("number"), bg = "grey10",
         addgrid.col = "gray50", tl.offset = 2, tl.cex=2, tl.col = "black",
         col = colorRampPalette(c("yellow","green","navyblue"))(100))

This is created with a csv file available here. The graph is fine and I can adjust the cl labels all I want. I've tried adjusting the labels on the x and y axes with no impact. I looked at changing mar - yet I

“Correlation matrix” for strings. Similarity of nominal data

只谈情不闲聊 submitted on 2019-12-31 04:56:06
Question: Here is my data frame.

df
   store_1  store_2    store_3    store_4
0  banana   banana     plum       banana
1  orange   tangerine  pear       orange
2  apple    pear       melon      apple
3  pear     raspberry  pineapple  plum
4  plum     tomato     peach      tomato

I'm looking for a way to count the number of co-occurrences in stores (to compare their similarity).

Answer 1: You can try something like this:

import itertools as it
corr = lambda a,b: len(set(a).intersection(set(b)))/len(a)
c = [corr(*x) for x in it.combinations_with_replacement(df.T.values.tolist(),2
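The overlap measure used in the answer (distinct shared items divided by the length of the first list) can be tried without pandas. This is a minimal sketch, assuming each column has already been pulled out as a plain list; the `stores` dictionary and the `overlap` and `similarity` names are illustrative, and the data is copied from the question.

```python
from itertools import combinations

# Columns of the data frame from the question, written out as plain lists.
stores = {
    "store_1": ["banana", "orange", "apple", "pear", "plum"],
    "store_2": ["banana", "tangerine", "pear", "raspberry", "tomato"],
    "store_3": ["plum", "pear", "melon", "pineapple", "peach"],
    "store_4": ["banana", "orange", "apple", "plum", "tomato"],
}


def overlap(a, b):
    """Number of distinct items shared by a and b, divided by len(a)."""
    return len(set(a) & set(b)) / len(a)


# One similarity score per unordered pair of stores.
similarity = {
    (s, t): overlap(stores[s], stores[t])
    for s, t in combinations(stores, 2)
}

for pair, score in similarity.items():
    print(pair, score)
```

Because the score divides by `len(a)`, it is only symmetric when both lists have the same length, as they do here.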

Constructing correlated variables

笑着哭i submitted on 2019-12-30 11:16:07
Question: I have a variable with a given distribution (normal in my example below).

set.seed(32)
var1 = rnorm(100,mean=0,sd=1)

I want to create a variable (var2) that is correlated with var1 with a linear correlation coefficient (roughly or exactly) equal to "Corr". The slope of the regression between var1 and var2 should (roughly or exactly) equal 1.

Corr = 0.3

How can I achieve this? I wanted to do something like this:

decorelation = rnorm(100,mean=0,sd=1-Corr)
var2 = var1 + decorelation

But of course
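One way to see why `sd = 1 - Corr` misses the target: if var2 = var1 + e, where e is independent noise with standard deviation s and var1 has standard deviation 1, the slope of var2 regressed on var1 is 1 and the correlation is 1 / sqrt(1 + s^2). Solving for s gives s = sqrt(1 / Corr^2 - 1), which is about 3.18 for Corr = 0.3, whereas s = 0.7 gives a correlation of about 0.82. The sketch below applies that formula in plain Python rather than R. It assumes "slope" means var2 regressed on var1, it only hits the target roughly for a finite sample, and the helper names are illustrative.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(xs, ys):
    """Slope of the least-squares regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def make_correlated(var1, corr, rng):
    """Return var1 plus independent Gaussian noise.

    Assumes var1 has a standard deviation close to 1 and 0 < corr <= 1.
    The noise sd is chosen so that the population correlation equals corr.
    """
    noise_sd = math.sqrt(1.0 / corr ** 2 - 1.0)
    return [x + rng.gauss(0.0, noise_sd) for x in var1]


rng = random.Random(32)
var1 = [rng.gauss(0.0, 1.0) for _ in range(100)]
var2 = make_correlated(var1, 0.3, rng)

# With only 100 points both numbers are noisy estimates of 0.3 and 1.
print(pearson(var1, var2), ols_slope(var1, var2))
```

Getting the sample correlation to equal Corr exactly, rather than roughly, needs a different construction and is not covered by this sketch.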

Pearson correlation in PHP

前提是你 submitted on 2019-12-30 10:43:38
Question: I'm trying to implement the calculation of the Pearson correlation coefficient between two sets of data in PHP. I'm just trying to port the Python script that can be found at this URL: http://answers.oreilly.com/topic/1066-how-to-find-similar-users-with-python/ My implementation is the following:

class LB_Similarity_PearsonCorrelation implements LB_Similarity_Interface{
    public function similarity($user1, $user2){
        $sharedItem = array();
        $pref1 = array();
        $pref2 = array();
        $result1 = $user1-
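For reference while porting, the calculation itself can be written compactly. The following is a minimal Python sketch of a Pearson similarity over the items two users have both rated. It is not the script from the linked page; the function name, the dictionary layout and the choice to return 0.0 in degenerate cases are all assumptions made for illustration.

```python
import math


def pearson_similarity(prefs1, prefs2):
    """Pearson correlation of two users' ratings over their shared items.

    prefs1 and prefs2 map item -> numeric rating. Returns 0.0 when fewer
    than two items are shared or when either user's shared ratings are
    constant (the correlation is undefined in those cases).
    """
    shared = [item for item in prefs1 if item in prefs2]
    n = len(shared)
    if n < 2:
        return 0.0

    xs = [prefs1[item] for item in shared]
    ys = [prefs2[item] for item in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical users: only items A, B and C are rated by both.
alice = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 5.0}
bob = {"A": 2.0, "B": 4.0, "C": 6.0, "E": 1.0}
print(pearson_similarity(alice, bob))
```

When comparing a port against a reference, the edge cases are worth checking first: no shared items, one shared item, and constant ratings.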

Dealing with missing values for correlations calculation

走远了吗. submitted on 2019-12-29 19:26:04
Question: I have a huge matrix with a lot of missing values. I want to get the correlation between variables.

1. Is the solution

cor(na.omit(matrix))

better than the one below?

cor(matrix, use = "pairwise.complete.obs")

I have already selected only variables having more than 20% of missing values.

2. Which method makes the most sense?

Answer 1: I would vote for the second option. Sounds like you have a fair amount of missing data and so you would be looking for a sensible multiple imputation strategy to fill in
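The difference between the two options is which rows each correlation gets to use. The sketch below mimics both in plain Python, with `None` standing in for `NA`. It illustrates the idea only and is not a reimplementation of R's `cor`; the data and function names are made up for the example.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pairwise_complete(x, y):
    """Correlate x and y over rows where both are present.

    Returns (correlation, number of rows used).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return pearson(xs, ys), len(pairs)


def listwise_complete(columns, i, j):
    """Drop every row with a missing value in ANY column, then correlate
    columns i and j. Returns (correlation, number of rows used).
    """
    rows = [row for row in zip(*columns) if all(v is not None for v in row)]
    xs = [row[i] for row in rows]
    ys = [row[j] for row in rows]
    return pearson(xs, ys), len(rows)


# Made-up data: b is missing in row 4, c is missing in row 1.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [2.0, 1.0, 4.0, 3.0, None, 5.0]
c = [5.0, None, 6.0, 2.0, 3.0, 1.0]

r_pw, n_pw = pairwise_complete(a, b)           # ignores gaps in c
r_lw, n_lw = listwise_complete([a, b, c], 0, 1)  # loses a row because of c
print(n_pw, r_pw)
print(n_lw, r_lw)
```

Pairwise deletion keeps more data per pair, but each entry of the resulting matrix is then based on a different subset of rows, and such a matrix is not guaranteed to be positive semidefinite.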

python - cannot make corr work

断了今生、忘了曾经 submitted on 2019-12-29 08:32:38
Question: I'm struggling with getting a simple correlation done. I've tried all that was suggested under similar questions. Here are the relevant parts of the code, the various attempts I've made and their results.

import numpy as np
import pandas as pd

try01 = data[['ESA Index_close_px', 'CCMP Index_close_px' ]].corr(method='pearson')
print (try01)

Out:
Empty DataFrame
Columns: []
Index: []

try04 = data['ESA Index_close_px'][5:50].corr(data['CCMP Index_close_px'][5:50])
print (try04)

Out: *

Phase correlation

一曲冷凌霜 submitted on 2019-12-29 06:29:35
Question: How can the rotation angle be determined by phase correlation (using FFT) of two images? The algorithm given in http://en.wikipedia.org/wiki/Phase_correlation returns linear shift, not angular. It also mentions that images have to be converted to log-polar coordinates to compute rotation. How is this conversion achieved in Python? And after the conversion, do the same steps of the algorithm hold?

Answer 1: Log polar transformation is actually rotation and scale invariant. Rotation corresponds to shift in y axis
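The log-polar conversion mentioned in the answer amounts to resampling the image on a grid of angles and logarithmically spaced radii around its centre. Below is a deliberately small, pure-Python sketch using nearest-neighbour sampling on a list-of-lists image. The function names and the toy quadrant image are illustrative assumptions; real code would normally use an image library with proper interpolation.

```python
import math


def log_polar(image, n_angles, n_radii):
    """Nearest-neighbour log-polar resampling of a square image (list of rows).

    Row k of the result samples angle 2*pi*k/n_angles. Column j samples
    radius r_max ** (j / (n_radii - 1)), so radii are evenly spaced in
    log(r) between 1 and r_max. Requires n_radii >= 2.
    """
    size = len(image)
    centre = (size - 1) / 2.0
    r_max = centre
    out = []
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles
        row = []
        for j in range(n_radii):
            r = r_max ** (j / (n_radii - 1))
            x = int(round(centre + r * math.cos(theta)))
            y = int(round(centre + r * math.sin(theta)))
            row.append(image[y][x])
        out.append(row)
    return out


def quadrant_value(dx, dy):
    """Toy pattern: a different value in each quadrant, 0 on the axes."""
    if dx > 0 and dy > 0:
        return 1
    if dx < 0 and dy > 0:
        return 2
    if dx < 0 and dy < 0:
        return 3
    if dx > 0 and dy < 0:
        return 4
    return 0


size, c = 9, 4
image = [[quadrant_value(x - c, y - c) for x in range(size)] for y in range(size)]

# Rotate the pattern by 90 degrees about the centre (in array coordinates).
rotated = [[image[size - 1 - x][y] for x in range(size)] for y in range(size)]

lp = log_polar(image, n_angles=8, n_radii=4)
lp_rot = log_polar(rotated, n_angles=8, n_radii=4)

# 90 degrees is 2 of the 8 angular bins, so the rows should be shifted by 2.
shift = 2
print(lp_rot == lp[-shift:] + lp[:-shift])
```

In this layout the angle runs along the rows, so a rotation of the input shows up as a cyclic shift along the vertical axis of the log-polar image, and that shift is what a translation-only method such as phase correlation can then estimate.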