correlation

Create clusters using correlation matrix in Python

一笑奈何 submitted on 2020-05-14 20:27:05
Question: All, I have a correlation matrix of 21 industry sectors. Now I want to split these 21 sectors into 4 or 5 groups, with sectors that behave similarly grouped together. Can the experts shed some light on how to do this in Python, please? Thanks very much in advance!
Answer 1: You might explore the use of Pandas DataFrame.corr and the scipy.cluster hierarchical clustering package:
    import pandas as pd
    import scipy.cluster.hierarchy as spc
    df = pd.DataFrame(my_data)
    corr = df.corr().values
    pdist = spc.distance
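The answer above is cut off, so the following is only a minimal sketch of one common way to cluster on a correlation matrix, not a reconstruction of that answer. It assumes made-up random data in place of `my_data`, uses `1 - corr` as the dissimilarity, and picks average linkage with a cap of 5 clusters; all of those choices are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as spc
from scipy.spatial.distance import squareform

# Hypothetical stand-in for the asker's data: 200 observations of 21 sectors.
rng = np.random.default_rng(0)
columns = [f"sector_{k}" for k in range(21)]
df = pd.DataFrame(rng.normal(size=(200, 21)), columns=columns)

corr = df.corr().values

# Turn correlation into a dissimilarity: highly correlated sectors are "close".
dist = 1.0 - corr
dist = (dist + dist.T) / 2.0   # enforce exact symmetry against rounding
np.fill_diagonal(dist, 0.0)    # a sector has zero distance to itself

# linkage expects the condensed (upper-triangle) form of the distance matrix.
condensed = squareform(dist, checks=False)
linkage = spc.linkage(condensed, method="average")

# Ask for at most 5 flat clusters; labels are integers starting at 1.
labels = spc.fcluster(linkage, t=5, criterion="maxclust")
groups = pd.Series(labels, index=columns).sort_values()
```

`groups` maps each sector name to a cluster label. With real data, the linkage method and the number of clusters are worth varying, since they can change the grouping noticeably.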

LabelEncoder for categorical features?

无人久伴 submitted on 2020-05-05 15:36:13
Question: This might be a beginner question, but I have seen a lot of people using LabelEncoder() to replace categorical variables with ordinal values. Many people use it by passing multiple columns at a time; however, I am worried that some of my features will get the wrong ordinality, and about how that will affect my model. Here is an example:
Input
    import pandas as pd
    import numpy as np
    from sklearn.preprocessing import LabelEncoder
    a = pd.DataFrame(['High','Low','Low','Medium'])
    le =
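To make the ordinality concern concrete, here is a small sketch using the same four values. `LabelEncoder` assigns codes by sorted class name, so the alphabetical order High < Low < Medium is what the model sees. The column name `level` and the intended order Low < Medium < High are assumptions made for illustration; `OrdinalEncoder` with an explicit `categories` list is one way to state that order.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Same values as in the question; the column name "level" is hypothetical.
a = pd.DataFrame({"level": ["High", "Low", "Low", "Medium"]})

# LabelEncoder sorts the classes, so codes follow alphabetical order.
le = LabelEncoder()
label_codes = le.fit_transform(a["level"])

# OrdinalEncoder lets the caller spell out the intended order explicitly.
oe = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
ordinal_codes = oe.fit_transform(a[["level"]]).ravel().astype(int)

print(list(le.classes_))    # classes in the order LabelEncoder uses
print(list(label_codes))    # codes under alphabetical order
print(list(ordinal_codes))  # codes under the explicit order
```

Whether the ordering matters depends on the model: tree-based models are often less sensitive to it than linear models, but that is a general rule of thumb rather than something the question establishes.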

Cannot load the `model1` function of the processR package in a Jupyter Notebook

有些话、适合烂在心里 submitted on 2020-04-17 23:28:16
Question: I am very new to R programming and am trying to follow this tutorial, where the model1 function is used to find the Andrew F. Hayes correlation between three variables. As indicated in the tutorial, I have installed the packages:
    install.packages("devtools")
    install.packages("processR")
    devtools::install_github("markhwhiteii/processr")
I have also followed the steps:
    set.seed(1839)
    var1 <- rnorm(100)
    cond <- rbinom(100, 1, .5)
    var2 <- var1 * cond + rnorm(100)
    df3 <- data.frame(var1,

Missing labels in Matplotlib correlation heatmap

懵懂的女人 submitted on 2020-04-07 03:28:49
Question: I'm playing around with the abalone dataset from UCI's machine learning repository. I want to display a correlation heatmap using matplotlib and imshow. The first time I tried it, it worked fine. All the numeric variables were plotted and labeled, as seen here:
    fig = plt.figure(figsize=(15,8))
    ax1 = fig.add_subplot(111)
    plt.imshow(df.corr(), cmap='hot', interpolation='nearest')
    plt.colorbar()
    labels = df.columns.tolist()
    ax1.set_xticklabels(labels,rotation=90, fontsize=10)
    ax1.set_yticklabels(labels
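The question is truncated, so the cause of the missing labels is not known here. One frequent source of this symptom is that `set_xticklabels` only relabels whatever tick positions matplotlib has already chosen, and those need not be one per column. The sketch below pins one tick per column before labeling. It assumes a small made-up numeric DataFrame in place of the abalone data, and takes the labels from the correlation matrix itself so they always match the plotted cells.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical numeric data standing in for the abalone measurements.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(50, 4)),
    columns=["length", "diameter", "height", "rings"],
)

corr = df.corr()
labels = corr.columns.tolist()  # labels of the columns actually plotted

fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(corr.values, cmap="hot", interpolation="nearest")
fig.colorbar(im, ax=ax)

# Fix one tick per matrix cell first, then attach the labels to those ticks.
ax.set_xticks(range(len(labels)))
ax.set_yticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=90, fontsize=10)
ax.set_yticklabels(labels, fontsize=10)
fig.tight_layout()
```

If the real DataFrame contains non-numeric columns, `df.columns` and the columns of `df.corr()` can differ in length, which is another reason to take the labels from `corr`.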

Extract values from a correlation matrix according to their p-value in a second matrix

淺唱寂寞╮ submitted on 2020-03-03 10:14:44
Question: I have created a correlation matrix with an external program (SparCC). I have calculated p-values from the same data in SparCC as well, and I end up with two objects which I imported into R. Let's call them corr and pval, and
    > ncol(corr)==nrow(corr)
    [1] TRUE
    > ncol(pval)==nrow(pval)
    [1] TRUE
and
    > colnames(corr)==rownames(pval)
    [1] TRUE
... and the same the other way around. Since the matrices (or should I be using data.frame?) are fairly large (about 1000 items), I would like to extract the

Extract values from a correlation matrix according to their p-value in a second matrix

牧云@^-^@ submitted on 2020-03-03 10:14:09
Question: I have created a correlation matrix with an external program (SparCC). I have calculated p-values from the same data in SparCC as well, and I end up with two objects which I imported into R. Let's call them corr and pval, and
    > ncol(corr)==nrow(corr)
    [1] TRUE
    > ncol(pval)==nrow(pval)
    [1] TRUE
and
    > colnames(corr)==rownames(pval)
    [1] TRUE
... and the same the other way around. Since the matrices (or should I be using data.frame?) are fairly large (about 1000 items), I would like to extract the

How do I find the correlation between time events and time series data in Python?

ぃ、小莉子 submitted on 2020-02-16 05:28:30
Question: I have two different Excel files. One of them contains time series data (268943 accident-time rows), as below:
                      Datetime
    0      2010-01-01 14:00:00
    1      2010-01-01 13:00:00
    2      2010-01-01 21:00:00
    3      2010-01-01 13:00:00
    4      2010-01-01 21:00:00
    ...                    ...
    268938 2018-08-06 11:25:00
    268939 2018-08-06 10:30:00
    268940 2018-08-06 10:00:00
    268941 2018-08-06 11:37:00
    268942 2018-08-06 09:00:00
    [268943 rows x 1 columns]
    dtype = datetime64[ns]
The other file contains the blood sugar levels of 14 workers, measured daily from 8

Pandas simple correlation of two grouped DataFrame columns

孤街浪徒 submitted on 2020-02-05 06:34:30
Question: Is there a good way to get the simple correlation of two grouped DataFrame columns? It seems that, no matter what, the pandas .corr() functions want to return a correlation matrix. E.g.,
    i = pd.MultiIndex.from_product([['A','B','C'], np.arange(1, 11, 1)], names=['Name','Num'])
    test = pd.DataFrame(np.random.randn(30, 2), i, columns=['X', 'Y'])
    test.groupby(['Name'])[['X','Y']].corr()
returns
               X         Y
    Name
    A    X  1.000000  0.152663
         Y  0.152663  1.000000
    B    X  1.000000 -0.155113
         Y -0.155113  1.000000
    C    X  1.000000
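One way to get a single X-Y correlation per group, sketched here under the assumption that a Series indexed by `Name` is the desired result, is to apply `Series.corr` within each group instead of calling the DataFrame-level `.corr()`. The seeded random generator is an addition so that the sketch is reproducible.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question, with a fixed seed for reproducibility.
i = pd.MultiIndex.from_product(
    [["A", "B", "C"], np.arange(1, 11)], names=["Name", "Num"]
)
rng = np.random.default_rng(0)
test = pd.DataFrame(rng.standard_normal((30, 2)), index=i, columns=["X", "Y"])

# Series.corr returns one scalar, so each group contributes a single value.
xy_corr = test.groupby("Name").apply(lambda g: g["X"].corr(g["Y"]))
print(xy_corr)
```

The result has one row per `Name` rather than a 2x2 block per group, and each value equals the off-diagonal entry of that group's correlation matrix.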

Numpy Correlate is not providing an offset

时光总嘲笑我的痴心妄想 submitted on 2020-02-04 02:54:10
Question: I am trying to look at astronomical spectra using Python, and I'm using numpy.correlate to try to find a radial velocity shift. I'm comparing each spectrum I have to one template spectrum. The problem I'm encountering is that, no matter which spectra I use, numpy.correlate reports that the maximum of the correlation function occurs at a shift of zero pixels, i.e. that the spectra already line up, which is very clearly not true. Here is some of the relevant code:
    corr = np.correlate
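The asker's call is truncated, so its arguments are unknown. One thing worth checking in this situation is the `mode` argument: `numpy.correlate` defaults to `mode='valid'`, which for two equal-length arrays returns a single value, so `argmax` can only ever be 0. The sketch below uses `mode='full'` and converts the peak index into a signed lag. The synthetic Gaussian line and the 7-pixel shift are made-up test data, not the asker's spectra.

```python
import numpy as np

# Synthetic "template": a single Gaussian line on a 200-pixel grid.
x = np.arange(200)
template = np.exp(-0.5 * ((x - 100) / 3.0) ** 2)

# Synthetic "spectrum": the same line moved 7 pixels to the right.
true_shift = 7
spectrum = np.roll(template, true_shift)

# Remove the mean so a constant continuum does not dominate the correlation.
t = template - template.mean()
s = spectrum - spectrum.mean()

# Default mode is 'valid': one output value for equal-length inputs.
valid_corr = np.correlate(s, t)

# 'full' evaluates every overlap; output index j corresponds to
# lag j - (len(t) - 1).
full_corr = np.correlate(s, t, mode="full")
lags = np.arange(-(len(t) - 1), len(s))
best_lag = lags[np.argmax(full_corr)]

print(valid_corr.shape, best_lag)
```

A positive `best_lag` here means the first argument is shifted toward higher pixel indices relative to the second. Turning a pixel lag into a radial velocity additionally requires the wavelength calibration, which this sketch does not cover.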

Holoviews tap stream of correlation heatmap and regression plot

蹲街弑〆低调 submitted on 2020-01-30 08:52:04
Question: I want to make a correlation heatmap for a DataFrame and a regression plot for each pair of variables. I have tried to read all the docs and am still having a very hard time connecting the two plots so that, when I tap the heatmap, the corresponding regression plot shows up. Here's some example code:
    import holoviews as hv
    from holoviews import opts
    import seaborn as sns
    import numpy as np
    import pandas as pd
    hv.extension('bokeh')
    df = sns.load_dataset('tips')
    df = df[['total_bill', 'tip',