scipy.stats

How to extract the distance and transport matrices from Scipy's wasserstein_distance?

戏子无情 posted on 2021-02-11 14:28:16
Question: The scipy.stats.wasserstein_distance function returns only the minimum distance (the solution) between two input distributions, p and q. But that distance is the product of a distance matrix and an optimal transport matrix, which must have been computed inside the same function. How can I extract the distance matrix and the optimal transport matrix that correspond to the solution as second and third output arguments?
Answer 1: It does not seem that you can get the calculated transport matrix
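Since wasserstein_distance returns only a scalar, one workaround is to solve the discrete transport problem yourself. The sketch below is an assumption about how that could be done, not what SciPy does internally: it builds the distance matrix for two weighted 1-D samples and solves the linear program with scipy.optimize.linprog. The helper name transport_plan is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import wasserstein_distance


def transport_plan(u_values, v_values, u_weights, v_weights):
    """Solve the discrete optimal-transport LP.

    Returns (cost, distance_matrix, transport_matrix). Hypothetical helper,
    not part of SciPy.
    """
    u_values = np.asarray(u_values, dtype=float)
    v_values = np.asarray(v_values, dtype=float)
    p = np.asarray(u_weights, dtype=float)
    q = np.asarray(v_weights, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalise both to probability vectors
    n, m = len(p), len(q)

    # Ground distance between every pair of support points.
    D = np.abs(u_values[:, None] - v_values[None, :])

    # Equality constraints on the flattened plan T (row-major, index i*m + j):
    # each row of T sums to p[i], each column sums to q[j].
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([p, q])

    res = linprog(D.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    T = res.x.reshape(n, m)
    return res.fun, D, T


u, v = [0.0, 1.0, 3.0], [5.0, 6.0, 8.0]
cost, D, T = transport_plan(u, v, [1, 1, 1], [1, 1, 1])
print(cost, wasserstein_distance(u, v))  # the two values should agree
print(T)
```

The LP has n*m variables, so this is only practical for small supports; the optimal plan is also not necessarily unique, even though the optimal cost is.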

scipy.stats attribute `entropy` for continuous distributions doesn't work manually

≡放荡痞女 posted on 2021-01-28 12:15:25
Question: Each continuous distribution in scipy.stats comes with an attribute that calculates its differential entropy: .entropy. Unlike the normal distribution (norm) and others that have a closed-form solution for entropy, other distributions have to rely on numerical integration. Trying to find out which function the .entropy attribute calls in those cases, I found a function called _entropy in scipy.stats._distn_infrastructure.py that does so with integrate.quad(pdf) (numerical integration)
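To check the .entropy method by hand, the differential entropy -∫ f(x) log f(x) dx can be integrated directly. This is a minimal sketch of that definition, not a copy of SciPy's internal _entropy; the helper name entropy_by_quadrature is hypothetical.

```python
import numpy as np
from scipy import integrate, stats


def entropy_by_quadrature(dist):
    """Differential entropy of a frozen continuous distribution via quad."""
    def integrand(x):
        fx = dist.pdf(x)
        # Treat 0 * log(0) as 0 so underflowing tails do not produce NaN.
        return -fx * np.log(fx) if fx > 0 else 0.0

    lower, upper = dist.support()  # integration limits; may be infinite
    value, _abs_err = integrate.quad(integrand, lower, upper)
    return value


dist = stats.gamma(a=2.5)
manual = entropy_by_quadrature(dist)
builtin = float(dist.entropy())
print(manual, builtin)  # the two values should agree closely
```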

Why doesn't Johnson-SU distribution give positive skewness in scipy.stats?

北战南征 posted on 2021-01-28 08:10:15
Question: The code below maps the statistical moments (mean, variance, skewness, excess kurtosis) generated by the corresponding parameters (a, b, loc, scale) of the Johnson-SU distribution (johnsonsu). For the range of loop values specified in my code below, no parameter configuration results in positive skewness, only negative skewness, even though it should be possible to parameterize the Johnson-SU distribution to be positively skewed.
    import numpy as np
    import pandas as pd
    from scipy.stats
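The loop in the question is cut off, so its parameter range is unknown. As a hedged illustration only: in SciPy's documented johnsonsu density the shape parameter a enters as a + b * asinh(x), so the sign of a controls the direction of the skew. The wrapper johnsonsu_skewness below is a hypothetical helper.

```python
from scipy.stats import johnsonsu


def johnsonsu_skewness(a, b):
    """Return the skewness of johnsonsu(a, b) with default loc and scale."""
    _mean, _var, skew, _kurt = johnsonsu.stats(a, b, moments="mvsk")
    return float(skew)


# Negative a should give positive skewness and positive a negative skewness.
for a in (-1.0, 0.0, 1.0):
    print(a, johnsonsu_skewness(a, b=2.0))
```

If a loop only scans non-negative values of a, it would never reach the positively skewed cases.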

Transport matrix is missing in the code behind scipy.stats.wasserstein_distance

断了今生、忘了曾经 posted on 2020-12-12 05:43:42
Question: Looking at the comments in the code behind scipy.stats.wasserstein_distance, which invokes a function called _cdf_distance(p, u_values, v_values, u_weights=None, v_weights=None), I see that this function is said to implement the following formula: l_p(u, v) = \left( \int_{-\infty}^{+\infty} |U-V|^p \right)^{1/p} However, this is not the Wasserstein distance as I know it: although I see the distance matrix |U-V| in the formula above, the transport matrix is noticeably absent. The transport
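In one dimension the optimal plan is the monotone (quantile) coupling, so the distance reduces to an integral over the difference of the two CDFs and no transport matrix has to be formed. Here U and V in the formula are the CDFs, not a distance matrix. The sketch below evaluates that integral for unweighted samples; cdf_distance_1d is a hypothetical helper written for illustration, not SciPy's own code.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def cdf_distance_1d(u_values, v_values, p=1):
    """(integral of |U - V|^p)^(1/p), with U and V the empirical CDFs."""
    u_sorted = np.sort(np.asarray(u_values, dtype=float))
    v_sorted = np.sort(np.asarray(v_values, dtype=float))

    # Both CDFs are step functions that only change at the pooled sample points.
    all_values = np.sort(np.concatenate([u_sorted, v_sorted]))
    deltas = np.diff(all_values)  # widths of the intervals between points

    # CDF value on each interval [all_values[k], all_values[k + 1]).
    u_cdf = np.searchsorted(u_sorted, all_values[:-1], side="right") / len(u_sorted)
    v_cdf = np.searchsorted(v_sorted, all_values[:-1], side="right") / len(v_sorted)

    return float(np.sum(np.abs(u_cdf - v_cdf) ** p * deltas) ** (1.0 / p))


u, v = [0.0, 1.0, 3.0], [5.0, 6.0, 8.0]
print(cdf_distance_1d(u, v), wasserstein_distance(u, v))  # both 5.0
```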

Transport matrix is missing in the code behind scipy.stats.wasserstein_distance

落爺英雄遲暮 posted on 2020-12-12 05:40:58
Question: Looking at the comments in the code behind scipy.stats.wasserstein_distance, which invokes a function called _cdf_distance(p, u_values, v_values, u_weights=None, v_weights=None), I see that this function is said to implement the following formula: l_p(u, v) = \left( \int_{-\infty}^{+\infty} |U-V|^p \right)^{1/p} However, this is not the Wasserstein distance as I know it: although I see the distance matrix |U-V| in the formula above, the transport matrix is noticeably absent. The transport

Evaluate the goodness of a distributional fit

自闭症网瘾萝莉.ら posted on 2020-12-06 07:34:45
Question: I have fitted some distributions to sample data with the following code:
    import numpy as np
    import pylab
    import matplotlib.pyplot as plt
    from scipy.stats import norm
    samp = norm.rvs(loc=0, scale=1, size=150)  # (example) sample values.
    figprops = dict(figsize=(8., 7. / 1.618), dpi=128)
    adjustprops = dict(left=0.1, bottom=0.1, right=0.97, top=0.93, wspace=0.2, hspace=0.2)
    fig = pylab.figure(**figprops)
    fig.subplots_adjust(**adjustprops)
    ax = fig.add_subplot(1, 1, 1)
    ax.hist(samp
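The rest of the question is cut off, so the following is only one possible approach, assuming a normal fit as in the sample above: fit the parameters, then score the fit with a Kolmogorov-Smirnov test and the log-likelihood. The seed and variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Illustrative sample, mirroring the normal sample in the question.
rng = np.random.default_rng(0)
samp = rng.normal(loc=0.0, scale=1.0, size=150)

# Maximum-likelihood fit of the candidate distribution.
loc, scale = stats.norm.fit(samp)

# KS statistic: largest gap between the empirical CDF and the fitted CDF.
ks_stat, p_value = stats.kstest(samp, "norm", args=(loc, scale))

# Log-likelihood of the sample under the fitted model, for comparing candidates.
log_likelihood = float(np.sum(stats.norm.logpdf(samp, loc=loc, scale=scale)))

print(ks_stat, p_value, log_likelihood)
```

Because the parameters are estimated from the same sample, the KS p-value tends to be optimistic; the statistic is still useful for ranking candidate distributions against each other.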

How can I get the statistics of all columns including those with a nested structure of numerical values in a dataframe, list or array?

走远了吗. posted on 2020-07-10 10:23:26
Question: What is the best method to get the simple descriptive statistics of any column in a dataframe (or list or array), be it nested or not: a sort of advanced df.describe() that also covers nested structures with numerical values? In my case, I have a dataframe with many columns. Some columns have a numerical list in each row (in my case a time series), which is a nested structure. It is not important that it is a dataframe; other structures are also included in the question, as changing between
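One way to sketch such an "advanced describe", assuming only one level of nesting (each cell is either a number or a flat list of numbers), is to flatten every column with Series.explode before summarising it. The helper describe_any and the example frame are hypothetical.

```python
import pandas as pd


def describe_any(column):
    """Flatten scalar and list-like cells into one numeric Series, then describe it."""
    flat = pd.Series(column).explode()  # list cells become one row per element
    numeric = pd.to_numeric(flat, errors="coerce").dropna()
    return numeric.describe()


df = pd.DataFrame({
    "scalar": [1.0, 2.0, 3.0],
    "nested": [[1, 2], [3, 4, 5], [6]],  # one numerical list per row
})

# One describe() column per original column, nested or not.
summary = pd.DataFrame({name: describe_any(df[name]) for name in df.columns})
print(summary)
```

This pools all values of a nested column into one summary; per-row statistics would need a different approach, such as applying a function to each cell.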