correlation

pandas correlation matrix between each pair groupby item

為{幸葍}努か submitted on 2019-12-03 16:31:34
I have a CSV file like this:

    date,sym,close
    2014.01.01,A,10
    2014.01.02,A,11
    2014.01.03,A,12
    2014.01.04,A,13
    2014.01.01,B,20
    2014.01.02,B,22
    2014.01.03,B,23
    2014.01.01,C,33
    2014.01.02,C,32
    2014.01.03,C,31

Then I get a DataFrame named df via the read_csv function:

    import numpy as np
    import pandas as pd
    df = pd.read_csv('daily.csv', index_col=[0])
    groups = df.groupby('sym')[['close']].apply(lambda x: func(x['close'].values))

The groups look like this:

    sym
    A    [nan, 1.00, 2.00, ...]
    B    [nan, 1.00, 2.00, ...]
    C    [nan, 1.00, 2.00, ...]

How do I calculate the correlation between each pair of syms: AA, AB, AC, BB, BA, BC, CA, CB, CC?
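A minimal pandas sketch, assuming the goal is the correlation of the raw close prices. It reshapes the data to one column per sym with `pivot`, and then `DataFrame.corr()` produces all nine pairs at once. The CSV text is copied from the question so the example runs on its own.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
csv_text = """date,sym,close
2014.01.01,A,10
2014.01.02,A,11
2014.01.03,A,12
2014.01.04,A,13
2014.01.01,B,20
2014.01.02,B,22
2014.01.03,B,23
2014.01.01,C,33
2014.01.02,C,32
2014.01.03,C,31
"""
df = pd.read_csv(io.StringIO(csv_text))

# One column per sym, one row per date; a date missing for a sym becomes NaN.
wide = df.pivot(index="date", columns="sym", values="close")

# corr() fills in every pair (AA, AB, ..., CC) in one call, using the
# rows where both columns are present (pairwise-complete observations).
corr_matrix = wide.corr()
print(corr_matrix)
```

Because corr() works on pairwise-complete rows, a date that exists for only one sym (2014.01.04 here) does not break the other pairs. To correlate the series returned by the question's func instead (func is not shown), apply that transformation to each column of `wide` before calling corr().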

Correlation between multiple variables of a data frame

白昼怎懂夜的黑 submitted on 2019-12-03 15:17:08
I have a data.frame of 10 variables in R. Let's call them var1, var2, ..., var10. I want to find the correlation of var1 with each of var2, var3, ..., var10. How can I do that? The cor function finds the correlation between 2 variables at a time, so using it I would have to write a separate cor call for each analysis.

My package corrr, which helps to explore correlations, has a simple solution for this. I'll use the mtcars data set as an example, and say we want to focus on the correlation of mpg with all other variables.

    install.packages("corrr")  # though keep an eye out for a new version coming soon
    library
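For comparison, here is a pandas sketch of the same one-against-all task. It is not the corrr approach from the answer, and the data frame and its values are hypothetical. `DataFrame.corrwith` correlates every column with a single Series in one call.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with columns var1 ... var10 (random values, illustration only).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 10)),
                  columns=[f"var{i}" for i in range(1, 11)])
# Make var2 track var1 closely so one correlation stands out.
df["var2"] = df["var1"] * 2 + rng.normal(scale=0.1, size=100)

# Correlation of var1 with each of var2 ... var10.
others = df.drop(columns="var1")
r = others.corrwith(df["var1"])
print(r.sort_values(ascending=False))
```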

how to check for correlation among continuous and categorical variables in python?

感情迁移 submitted on 2019-12-03 14:47:20
I have a dataset that includes categorical (binary) variables and continuous variables. I'm trying to apply a linear regression model to predict a continuous variable. Can someone please tell me how to check for correlation between the categorical variables and the continuous target variable?

Current code:

    import pandas as pd
    # raw string (r'...') so the backslashes in the Windows path are not read as escape sequences
    df_hosp = pd.read_csv(r'C:\Users\LAPPY-2\Desktop\LengthOfStay.csv')
    data = df_hosp[['lengthofstay', 'male', 'female', 'dialysisrenalendstage', 'asthma',
                    'irondef', 'pneum', 'substancedependence',
                    'psychologicaldisordermajor', 'depress', 'psychother',
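One common option, sketched with hypothetical data that stands in for the hospital file: for a 0/1 variable against a continuous one, the point-biserial correlation is Pearson's r computed on the 0/1 codes. `scipy.stats.pointbiserialr` returns it together with a p-value, and pandas gives the same coefficient.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the hospital data: a 0/1 flag and a continuous target.
rng = np.random.default_rng(42)
n = 200
asthma = rng.integers(0, 2, size=n)                    # binary predictor (0/1)
lengthofstay = 4 + 1.5 * asthma + rng.normal(size=n)   # continuous target
data = pd.DataFrame({"asthma": asthma, "lengthofstay": lengthofstay})

# Point-biserial correlation: Pearson's r when one variable is dichotomous.
r_pb, p_value = stats.pointbiserialr(data["asthma"], data["lengthofstay"])

# pandas gives the same coefficient, because point-biserial equals Pearson on 0/1 codes.
r_pearson = data["asthma"].corr(data["lengthofstay"])
print(f"r = {r_pb:.3f}, p = {p_value:.3g}, pandas r = {r_pearson:.3f}")
```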

Correlation matrix plot with ggplot2

萝らか妹 submitted on 2019-12-03 13:58:34
I want to create a correlation matrix plot, i.e. a plot where each variable is plotted in a scatterplot against each other variable, as with pairs() or splom(). I want to do this with ggplot2. See here for examples. The link mentions some code someone wrote for doing this in ggplot2; however, it is outdated and no longer works (even after you swap out the deprecated parts). One could do this with nested loops and then multiplot(), but there must be a better way. I tried melting the dataset to long format, copying the value and variable columns, and then using facets. This almost gives

Fast correlation in R using C and parallelization

烂漫一生 submitted on 2019-12-03 13:46:37
Question: My project for today was to write a fast correlation routine in R using the basic skill set I have. I have to find the correlation between almost 400 variables, each having almost a million observations (i.e. a matrix of size p=1MM rows & n=400 cols). R's native correlation function takes almost 2 minutes for 1MM rows and 200 variables (columns). I have not run it for 400 columns, but my guess is that it will take almost 8 minutes. I have less than 30 seconds to finish it. Hence, I want to
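The usual trick behind fast correlation routines, shown here as a NumPy sketch rather than the R/C approach the question goes on to describe: center and unit-scale each column, after which one matrix product yields every pairwise correlation and the optimized BLAS matmul does the heavy lifting. Sizes are reduced for illustration.

```python
import numpy as np


def fast_corr(X: np.ndarray) -> np.ndarray:
    """Pearson correlation matrix of the columns of X via one matrix product."""
    X = np.asarray(X, dtype=np.float64)
    Z = X - X.mean(axis=0)            # center each column
    Z /= np.linalg.norm(Z, axis=0)    # scale each centered column to unit length
    return Z.T @ Z                    # (cols x cols) matrix of correlations


# Small demo; the question's real case is about 1e6 rows x 400 columns.
rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 50))
C = fast_corr(X)
print(C.shape)
```

At 1MM x 400 in float64, X alone takes about 3.2 GB and `Z` is a second array of the same size, so memory can matter as much as speed. A constant column would also cause a division by zero.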

R getting the minimum value for each row in a matrix, and returning the row and column name

断了今生、忘了曾经 submitted on 2019-12-03 13:16:46
I have a matrix like so, only in reality it has hundreds or thousands of values. What I need to do is return the minimum value for each row, along with the row/col name. So for row 1 in the example, "BAC", the minimum is 0.92 for BAC/CSCO, so I need to return something like: BAC/CSCO 0.92. Then I need to repeat this for each row in the matrix. Assistance is greatly appreciated. I think apply is the trick, but I can't quite get the right combination.

    X <- matrix(runif(20), nrow=4)
    rownames(X) <- paste0("foo", seq(nrow(X)))
    colnames(X) <- paste0("bar", seq(ncol(X)))
    result <- t(sapply(seq(nrow(X)),
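A pandas analogue of what the question asks for (not the R answer's method), reusing the foo/bar naming from the R example with random values. `idxmin(axis=1)` gives the column label of each row's minimum, and `min(axis=1)` gives the value itself.

```python
import numpy as np
import pandas as pd

# Same shape and naming as the R example matrix (4 x 5, random values).
rng = np.random.default_rng(7)
X = pd.DataFrame(rng.random((4, 5)),
                 index=[f"foo{i}" for i in range(1, 5)],
                 columns=[f"bar{j}" for j in range(1, 6)])

# For each row: a "row/col" label for the minimum cell, plus the minimum value.
result = pd.DataFrame({
    "pair": [f"{row}/{col}" for row, col in X.idxmin(axis=1).items()],
    "value": X.min(axis=1).to_numpy(),
}, index=X.index)
print(result)
```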

Transform Correlation Matrix into dataframe with records for each row column pair

天大地大妈咪最大 submitted on 2019-12-03 13:12:25
I have a large matrix of correlations (1093 x 1093). I'm trying to turn my matrix into a data frame that has one record for every row/column pair, so it would have 1093^2 records. Here's a snippet of my matrix:

           60516         45264         02117
    60516   1.00000000   -0.370793012  -0.082897941
    45264  -0.37079301    1.000000000   0.005145601
    02117  -0.08289794    0.005145601   1.000000000

The goal from here would be to have a data frame that looks like this:

    row    column  correlation
    60516  60516    1.000000000
    60516  45264   -0.370793012
    ........

and so on. Does anyone have any tips? Let me know if I can clarify anything.

Thanks,
Ben

For matrix m, you
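A pandas analogue built from the 3 x 3 snippet in the question. `stack()` turns each (row, column) cell into one record, so a 1093 x 1093 matrix would yield 1093^2 rows. The labels are kept as strings on the assumption that 02117 should keep its leading zero.

```python
import pandas as pd

labels = ["60516", "45264", "02117"]  # strings, so the leading zero of 02117 survives
m = pd.DataFrame(
    [[1.00000000, -0.370793012, -0.082897941],
     [-0.37079301, 1.000000000, 0.005145601],
     [-0.08289794, 0.005145601, 1.000000000]],
    index=labels, columns=labels)

# Name both axes, stack every cell into a long Series, then flatten to columns.
long_df = (m.rename_axis(index="row", columns="column")
            .stack()
            .reset_index(name="correlation"))
print(long_df)
```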

Complete.obs of cor() function

情到浓时终转凉″ submitted on 2019-12-03 12:26:36
I am building a correlation matrix for my data, which looks like this:

    df <- structure(list(V1 = c(56, 123, 546, 26, 62, 6, NA, NA, NA, 15),
                         V2 = c(21, 231, 5, 5, 32, NA, 1, 231, 5, 200),
                         V3 = c(NA, NA, 24, 51, 53, 231, NA, 153, 6, 700),
                         V4 = c(2, 10, NA, 20, 56, 1, 1, 53, 40, 5000)),
                    .Names = c("V1", "V2", "V3", "V4"),
                    row.names = c(NA, 10L), class = "data.frame")

This gives the following data frame:

        V1  V2  V3   V4
    1   56  21  NA    2
    2  123 231  NA   10
    3  546   5  24   NA
    4   26   5  51   20
    5   62  32  53   56
    6    6  NA 231    1
    7   NA   1  NA    1
    8   NA 231 153   53
    9   NA   5   6   40
    10  15 200 700 5000

I normally use a complete.obs command
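To illustrate what complete.obs changes, here is a pandas sketch on the question's data (a Python analogue, not R's cor()). Dropping every row that contains an NA before correlating corresponds to R's use = "complete.obs", while pandas' default `corr()` excludes NAs pair by pair, like use = "pairwise.complete.obs".

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    "V1": [56, 123, 546, 26, 62, 6, nan, nan, nan, 15],
    "V2": [21, 231, 5, 5, 32, nan, 1, 231, 5, 200],
    "V3": [nan, nan, 24, 51, 53, 231, nan, 153, 6, 700],
    "V4": [2, 10, nan, 20, 56, 1, 1, 53, 40, 5000],
})

# Like use = "complete.obs": keep only rows with no NA at all, then correlate.
complete = df.dropna().corr()

# Like use = "pairwise.complete.obs": each pair uses its own non-NA rows.
pairwise = df.corr()

print(complete.round(3))
print(pairwise.round(3))
```

Only three of the ten rows (4, 5 and 10) are complete, so the complete.obs estimates rest on very few points and can differ sharply from the pairwise ones.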

In Python, how can I calculate correlation and statistical significance between two arrays of data?

怎甘沉沦 submitted on 2019-12-03 12:24:01
I have sets of data with two equally long arrays of data, or I can make an array of two-item entries, and I would like to calculate the correlation and statistical significance represented by the data (which may be tightly correlated, or may have no statistically significant correlation). I am programming in Python and have scipy and numpy installed. I looked and found "Calculating Pearson correlation and significance in Python", but that seems to want the data to be manipulated so it falls into a specified range. What is the proper way to, I assume, ask scipy or numpy to give me the
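A short sketch of the usual SciPy route, with made-up sample arrays. `scipy.stats.pearsonr` takes the two arrays as they are, with no rescaling into a range, and returns the coefficient together with a two-sided p-value for the null hypothesis of no correlation. `spearmanr` is shown as a rank-based alternative.

```python
from scipy import stats

# Made-up paired observations, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

# Pearson's r and its two-sided p-value; the raw arrays go in unchanged.
r, p = stats.pearsonr(x, y)
print(f"r = {r:.4f}, p = {p:.2e}")

# Rank-based alternative if the relationship is monotonic but not linear.
rho, p_s = stats.spearmanr(x, y)
print(f"rho = {rho:.4f}, p = {p_s:.2e}")
```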

Create a correlation graph in Matlab

亡梦爱人 submitted on 2019-12-03 12:21:30
I'm trying to emulate this graph. If I have a correlation matrix, how can I create an output like this?

If you have an n x n correlation matrix M, and a vector L of length n containing the label for each bin, you can use something like the following:

    imagesc(M);                                 % plot the matrix
    set(gca, 'XTick', 1:n);                     % center x-axis ticks on bins
    set(gca, 'YTick', 1:n);                     % center y-axis ticks on bins
    set(gca, 'XTickLabel', L);                  % set x-axis labels
    set(gca, 'YTickLabel', L);                  % set y-axis labels
    title('Your Title Here', 'FontSize', 14);   % set title
    colormap('jet');                            % set the colorscheme
    colorbar on;                                % enable