correlation

how to check for correlation among continuous and categorical variables in python?

南笙酒味 submitted on 2019-12-21 04:32:39
Question: I have a dataset that includes binary categorical variables and continuous variables. I'm trying to fit a linear regression model to predict a continuous variable. Can someone please tell me how to check for correlation between the categorical variables and the continuous target variable? Current code: import pandas as pd df_hosp = pd.read_csv(r'C:\Users\LAPPY-2\Desktop\LengthOfStay.csv') data = df_hosp[['lengthofstay', 'male', 'female', 'dialysisrenalendstage', 'asthma', \ 'irondef',
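For a binary (0/1) variable against a continuous target, one common option is the point-biserial correlation, which is the Pearson correlation with the binary variable coded as 0/1. The sketch below uses `scipy.stats.pointbiserialr` on synthetic data; the column names are borrowed from the question, but the values are made up purely for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the hospital data (values are invented).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "male": rng.integers(0, 2, size=n),
    "asthma": rng.integers(0, 2, size=n),
})
# Assumption for the demo: asthma lengthens the stay, sex has no effect.
df["lengthofstay"] = 4.0 + 2.0 * df["asthma"] + rng.normal(0.0, 1.0, size=n)

# Point-biserial correlation of each binary column with the target.
results = {}
for col in ["male", "asthma"]:
    r, p = stats.pointbiserialr(df[col], df["lengthofstay"])
    results[col] = (r, p)
    print(f"{col}: r = {r:.3f}, p = {p:.3g}")
```

Because this is just Pearson's r on a 0/1 coding, `df.corr()` should give the same coefficients; the SciPy call additionally reports a p-value.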

In Python, how can I calculate correlation and statistical significance between two arrays of data?

大兔子大兔子 submitted on 2019-12-21 04:12:59
Question: I have data sets that each consist of two arrays of equal length (or I can make a single array of two-item entries), and I would like to calculate the correlation and its statistical significance (the data may be tightly correlated, or may have no statistically significant correlation). I am programming in Python and have scipy and numpy installed. I looked and found "Calculating Pearson correlation and significance in Python", but that seems to want the data to be manipulated so it falls
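With SciPy installed, `scipy.stats.pearsonr` takes the two arrays directly and returns both the coefficient and a two-sided p-value. A minimal sketch with made-up numbers (the p-value assumes roughly normal data; `spearmanr` is a rank-based alternative):

```python
import numpy as np
from scipy import stats

# Two equally long arrays (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

# Pearson correlation coefficient and its two-sided p-value.
r, p_value = stats.pearsonr(x, y)

# Rank-based alternative that does not assume linearity.
rho, p_rho = stats.spearmanr(x, y)

# If the data is one array of two-item entries, split the columns first.
pairs = np.column_stack((x, y))
r_pairs, p_pairs = stats.pearsonr(pairs[:, 0], pairs[:, 1])

print(f"Pearson r = {r:.4f}, p = {p_value:.3g}")
print(f"Spearman rho = {rho:.4f}, p = {p_rho:.3g}")
```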

What to use to do multiple correlation?

て烟熏妆下的殇ゞ submitted on 2019-12-20 23:57:28
Question: I am trying to use Python to compute multiple linear regression and multiple correlation between a response array and a set of predictor arrays. I saw the very simple example for computing multiple linear regression, which is easy. But how do I compute multiple correlation with statsmodels, or with anything else as an alternative? I guess I could use rpy and R, but I'd prefer to stay in Python if possible. Edit [clarification]: Considering a situation like the one described here: http:/
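The multiple correlation coefficient R is the square root of the regression's R², which is also the Pearson correlation between the response and the fitted values. Here is a NumPy-only sketch; `multiple_correlation` is a hypothetical helper name and the data is synthetic. With statsmodels, the square root of the fitted OLS result's `rsquared` attribute should be the same quantity.

```python
import numpy as np

def multiple_correlation(X, y):
    """Return R = sqrt(R^2) for an OLS fit of y on X (intercept included)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with a leading column of ones for the intercept.
    design = np.column_stack((np.ones(len(y)), X))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return np.sqrt(1.0 - ss_res / ss_tot)

# Synthetic example: response depends on two of three predictors plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=100)

R = multiple_correlation(X, y)
print(f"multiple correlation R = {R:.4f}")
```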

Correlation matrix plot with ggplot2

蓝咒 submitted on 2019-12-20 21:54:09
Question: I want to create a correlation matrix plot, i.e. a plot where each variable is plotted in a scatterplot against each other variable, as with pairs() or splom(). I want to do this with ggplot2. See here for examples. The link mentions some code someone wrote for doing this in ggplot2; however, it is outdated and no longer works (even after you swap out the deprecated parts). One could do this with a loop inside a loop and then multiplot(), but there must be a better way. I tried melting the

Correlation coefficients for sparse matrix in python?

╄→尐↘猪︶ㄣ submitted on 2019-12-20 18:35:32
Question: Does anyone know how to compute a correlation matrix from a very large sparse matrix in Python? Basically, I am looking for something like numpy.corrcoef that works on a scipy sparse matrix. Answer 1: You can compute the correlation coefficients fairly straightforwardly from the covariance matrix like this: import numpy as np from scipy import sparse def sparse_corrcoef(A, B=None): if B is not None: A = sparse.vstack((A, B), format='csr') A = A.astype(np.float64) n = A.shape[1] # Compute the
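The answer's code is cut off above. As an independent sketch of the same idea (not necessarily the answerer's exact code), the covariance can be formed from the sparse product `A @ A.T` and the row sums, so the centered matrix is never built. `sparse_corrcoef_sketch` is a hypothetical name, and rows are treated as variables, as in `numpy.corrcoef`.

```python
import numpy as np
from scipy import sparse

def sparse_corrcoef_sketch(A):
    """Row-wise correlation matrix of a sparse matrix, without densifying A."""
    A = sparse.csr_matrix(A, dtype=np.float64)
    n = A.shape[1]                               # observations per row
    rowsum = np.asarray(A.sum(axis=1)).ravel()   # shape (m,)
    gram = (A @ A.T).toarray()                   # m x m, dense result
    # cov[i, j] = (sum_k a_ik * a_jk - rowsum_i * rowsum_j / n) / (n - 1)
    cov = (gram - np.outer(rowsum, rowsum) / n) / (n - 1)
    d = np.diag(cov)
    return cov / np.sqrt(np.outer(d, d))

# Small synthetic check against numpy.corrcoef on the dense equivalent.
rng = np.random.default_rng(0)
dense = rng.random((5, 50))
dense[dense < 0.7] = 0.0                         # make it mostly zeros
A = sparse.csr_matrix(dense)

coeffs = sparse_corrcoef_sketch(A)
print(np.round(coeffs, 3))
```

Note that the result is a dense m x m matrix, so this only helps when the number of rows is manageable even if the number of columns is very large.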

Python cross correlation

时光毁灭记忆、已成空白 submitted on 2019-12-20 15:21:10
Question: I have a pair of 1D arrays (of different lengths) like the following: data1 = [0,0,0,1,1,1,0,1,0,0,1] data2 = [0,1,1,0,1,0,0,1] I would like to get the maximum cross-correlation of the two series in Python. In MATLAB, the xcorr() function returns it correctly. I have tried the following two methods: numpy.correlate(data1, data2) signal.fftconvolve(data2, data1[::-1], mode='full') Both methods give me the same values, but the values I get from Python are different from what comes out of MATLAB. Python
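One thing that might be worth checking: `numpy.correlate` defaults to `mode='valid'`, which only returns the lags where the shorter array fits entirely inside the longer one, whereas MATLAB's `xcorr` returns all lags. A sketch that computes every lag with `mode='full'` and picks the maximum (the lag convention in the comments is NumPy's and may be mirrored relative to other tools):

```python
import numpy as np

data1 = np.array([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1], dtype=float)
data2 = np.array([0, 1, 1, 0, 1, 0, 0, 1], dtype=float)

# Cross-correlation at every possible lag (len(data1) + len(data2) - 1 values).
xcorr = np.correlate(data1, data2, mode="full")

# Entry i corresponds to lag i - (len(data2) - 1), where
# xcorr[lag] = sum_n data1[n + lag] * data2[n].
lags = np.arange(-(len(data2) - 1), len(data1))

best = int(np.argmax(xcorr))
max_xcorr = xcorr[best]
best_lag = int(lags[best])
print(f"max cross-correlation = {max_xcorr}, at lag {best_lag}")
```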

Fast cross correlation method in Python

帅比萌擦擦* submitted on 2019-12-20 12:48:13
Question: I have recently been trying to find a fast and efficient way to compute the cross-correlation between two arrays in Python. After some reading, I found these two options: the numpy.correlate() method, which is too slow for large arrays, and the cv.MatchTemplate() method, which seems to be much faster. For obvious reasons, I chose the second option. I tried to execute the following code: import scipy import cv image = cv.fromarray(scipy.float32(scipy.asarray([1,2,2,1]))
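As a possible alternative to both options in the question, SciPy's `scipy.signal.correlate` accepts `method='fft'`, which tends to scale much better than the direct sum for long 1D arrays. The sketch below uses random data of arbitrary size and checks that the FFT result matches `numpy.correlate` up to floating-point error; no timing claims are made here.

```python
import numpy as np
from scipy import signal

# Arbitrary test signals; sizes are chosen only for illustration.
rng = np.random.default_rng(0)
a = rng.normal(size=2000)
b = rng.normal(size=300)

# Direct (time-domain) cross-correlation.
direct = np.correlate(a, b, mode="full")

# FFT-based cross-correlation of the same two arrays.
fast = signal.correlate(a, b, mode="full", method="fft")

print("max abs difference:", np.max(np.abs(direct - fast)))
```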

How to get the correlation between two timeseries using Pandas

我的梦境 submitted on 2019-12-20 09:38:03
Question: I have two sets of temperature data, which have readings at regular (but different) time intervals. I'm trying to get the correlation between these two sets of data. I've been playing with Pandas to try to do this. I've created two time series and am using TimeSeriesA.corr(TimeSeriesB). However, if the times in the two time series do not match up exactly (they're generally off by seconds), I get Null as an answer. I could get a decent answer if I could: a) Interpolate/fill missing times in each
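One way to do the interpolation step described in the question is to reindex both series onto the union of their timestamps, interpolate in time, and then correlate. This is a sketch on synthetic temperature readings; the sampling intervals, offsets, and signal are all assumptions made for the demo.

```python
import numpy as np
import pandas as pd

t0 = pd.Timestamp("2019-01-01 00:00:00")

def fake_temperature(index, offset=0.0):
    """Smooth synthetic signal sampled at the given timestamps."""
    seconds = np.asarray((index - t0).total_seconds())
    return 20.0 + offset + 5.0 * np.sin(2.0 * np.pi * seconds / 3600.0)

# Series A: every 60 s. Series B: every 90 s, starting 7 s later.
idx_a = pd.date_range(t0, periods=60, freq=pd.Timedelta(seconds=60))
idx_b = pd.date_range(t0 + pd.Timedelta(seconds=7), periods=40,
                      freq=pd.Timedelta(seconds=90))
a = pd.Series(fake_temperature(idx_a), index=idx_a)
b = pd.Series(fake_temperature(idx_b, offset=0.5), index=idx_b)

# No timestamps are shared, so the direct correlation is NaN.
raw_r = a.corr(b)

# Put both series on the union of timestamps and interpolate in time.
union = a.index.union(b.index)
a_i = a.reindex(union).interpolate(method="time")
b_i = b.reindex(union).interpolate(method="time")

# corr() skips the leading points where one series has no value yet.
r = a_i.corr(b_i)
print(f"direct: {raw_r}, after interpolation: {r:.4f}")
```

Resampling both series to a common fixed frequency before correlating would be another reasonable choice, depending on how irregular the real data is.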

Plot a Correlation Circle in Python

百般思念 submitted on 2019-12-20 01:43:33
Question: I've been doing some Geometrical Data Analysis (GDA) such as Principal Component Analysis (PCA). I'm looking to plot a Correlation Circle... these look a bit like this: [image] Basically, it lets you measure the extent to which the eigenvalue/eigenvector of a variable is correlated with the principal components (dimensions) of a dataset. Does anyone know of a Python package that plots such a data visualization? Answer 1: I agree it's a pity not to have it in some mainstream package such as sklearn. Here
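The answer is cut off above, so here is an independent sketch of the underlying computation rather than the answerer's code. For a PCA on standardized variables, the coordinates of each variable on the correlation circle are its correlations with the first two principal components, which equal the eigenvector entries scaled by the square root of the eigenvalue. `correlation_circle_coords` is a hypothetical name and the data is synthetic.

```python
import numpy as np

def correlation_circle_coords(X, n_components=2):
    """Correlation of each variable with the leading principal components."""
    X = np.asarray(X, dtype=float)
    # Standardize columns so the PCA works on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = S ** 2 / (len(Z) - 1)
    # Variable j on component k: eigenvector entry * sqrt(eigenvalue).
    coords = Vt.T[:, :n_components] * np.sqrt(eigvals[:n_components])
    scores = U[:, :n_components] * S[:n_components]
    return coords, scores

# Synthetic data: columns 0 and 1 are strongly related, 2 and 3 are not.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.column_stack((
    base[:, 0],
    base[:, 0] + 0.3 * rng.normal(size=100),
    base[:, 1],
    rng.normal(size=100),
))

coords, scores = correlation_circle_coords(X)
for j, (cx, cy) in enumerate(coords):
    print(f"variable {j}: ({cx:+.3f}, {cy:+.3f})")
```

Each row of `coords` lies inside the unit circle, so the plot itself amounts to drawing one arrow per variable from the origin to that point, with a unit circle for reference.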

How to match MQ Server reply messages to the correct request

狂风中的少年 submitted on 2019-12-20 01:08:12
Question: I'm connecting to an IBM WebSphere MQ. I want to be able to match each reply message with the correct request message. I've trawled through hundreds of pages looking for this and have had no luck. I have a class, MQHandler, which sends a message to one defined queue and reads the request from another. This works fine; however, if multiple users are using the application at the same time, messages get mixed up. I can't seem to get a method on the receiver to indicate the CorrelationID to match.
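The general request/reply pattern is that every request carries a unique ID, the responder copies it into the reply's correlation ID, and the requester only accepts the reply whose correlation ID matches. As I understand it, IBM MQ supports this through the message ID and correlation ID fields of the message descriptor, but the sketch below does not use the IBM MQ API at all: it is a library-agnostic, in-process illustration, and `ReplyRouter` and all of its methods are hypothetical names.

```python
import queue
import threading
import uuid

class ReplyRouter:
    """Hands each reply to the caller that sent the matching request."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}  # correlation_id -> one-slot queue for the reply

    def new_request(self, body):
        """Build a request message and register its correlation ID."""
        correlation_id = uuid.uuid4().hex
        with self._lock:
            self._pending[correlation_id] = queue.Queue(maxsize=1)
        return {"correlation_id": correlation_id, "body": body}

    def on_reply(self, reply):
        """Called by the receiving side; returns False for unknown IDs."""
        with self._lock:
            waiter = self._pending.get(reply["correlation_id"])
        if waiter is None:
            return False
        waiter.put(reply)
        return True

    def wait_for(self, correlation_id, timeout=1.0):
        """Block until the reply for this request arrives (or time out)."""
        try:
            return self._pending[correlation_id].get(timeout=timeout)
        finally:
            with self._lock:
                self._pending.pop(correlation_id, None)

# Two users send requests; the replies come back in the opposite order.
router = ReplyRouter()
req_a = router.new_request("balance for user A")
req_b = router.new_request("balance for user B")
for req in (req_b, req_a):
    router.on_reply({"correlation_id": req["correlation_id"],
                     "body": "reply to: " + req["body"]})

reply_a = router.wait_for(req_a["correlation_id"])
reply_b = router.wait_for(req_b["correlation_id"])
print(reply_a["body"])
print(reply_b["body"])
```

With a real queue manager, the lookup table is usually unnecessary: the receiver can ask the broker for the message with a specific correlation ID instead of reading whatever is next on the reply queue.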