Plotting profile histograms in Python

Submitted by 主宰稳场 on 2020-02-19 22:43:31

Question


I am trying to make a profile plot for two columns of a pandas.DataFrame. I would not expect this to be in pandas directly but it seems there is nothing in matplotlib either. I have searched around and cannot find it in any package other than rootpy. Before I take the time to write this myself I thought I would ask if there was a small package that contained profile histograms, perhaps where they are known by a different name.

If you don't know what I mean by "profile histogram", have a look at the ROOT implementation: http://root.cern.ch/root/html/TProfile.html
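
For readers unfamiliar with the term, here is a minimal NumPy sketch of the computation behind a profile histogram: x is split into bins, and for each bin the mean of y and the standard error of that mean are computed. The function name `profile` and its arguments are made up for illustration and do not belong to any library; using the standard error of the mean as the error bar is an assumption about the usual convention.

```python
import numpy as np

def profile(x, y, nbins, xmin, xmax):
    """Return bin centers, per-bin mean of y, and standard error of the mean.

    Hypothetical helper for illustration; empty bins yield NaN.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(xmin, xmax, nbins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])

    # keep only points inside [xmin, xmax), then map each to a bin index 0..nbins-1
    inside = (x >= xmin) & (x < xmax)
    idx = np.digitize(x[inside], edges) - 1
    y_in = y[inside]

    # per-bin count, sum of y and sum of y**2
    n = np.bincount(idx, minlength=nbins)
    sum_y = np.bincount(idx, weights=y_in, minlength=nbins)
    sum_y2 = np.bincount(idx, weights=y_in**2, minlength=nbins)

    with np.errstate(divide="ignore", invalid="ignore"):
        mean = sum_y / n
        # population variance per bin; clip tiny negative values from round-off
        var = np.clip(sum_y2 / n - mean**2, 0, None)
        sem = np.sqrt(var / n)
    return centers, mean, sem

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 10000)
y = rng.normal(x**2, np.abs(x) + 1)
centers, mean, sem = profile(x, y, nbins=10, xmin=-2, xmax=2)
```

The three returned arrays can be passed straight to matplotlib's `errorbar` as `x`, `y` and `yerr`.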


Answer 1:


Use seaborn. The data are the same as in @MaxNoe's answer.

import numpy as np
import seaborn as sns

# just some random numbers to get started
x = np.random.uniform(-2, 2, 10000)
y = np.random.normal(x**2, np.abs(x) + 1)

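# x_bins=10 groups x into 10 bins and plots the mean of y per bin; fit_reg=False turns off the regression line
sns.regplot(x=x, y=y, x_bins=10, fit_reg=False)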

You can do much more with it: the error bars come from a bootstrap, you can change the estimator used for y, add a regression line, and so on.
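
To make the bootstrap remark concrete, here is a rough NumPy sketch of a percentile bootstrap for per-bin estimates. This is not seaborn's implementation; the helper name `bootstrap_profile`, the number of resamples and the 95% interval are assumptions chosen for illustration. Passing a different `estimator` (for example `np.median`) corresponds to changing the estimator for y.

```python
import numpy as np

def bootstrap_profile(x, y, edges, estimator=np.mean, n_boot=500, ci=95, seed=0):
    """Per-bin estimate of y with a percentile-bootstrap confidence interval.

    Hypothetical helper; bins with no points are returned as NaN.
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    nbins = len(edges) - 1
    est = np.full(nbins, np.nan)
    lo = np.full(nbins, np.nan)
    hi = np.full(nbins, np.nan)
    idx = np.digitize(x, edges) - 1  # bin index; -1 or nbins means out of range

    for b in range(nbins):
        y_bin = y[idx == b]
        if y_bin.size == 0:
            continue
        est[b] = estimator(y_bin)
        # resample the bin's points with replacement and re-apply the estimator
        samples = rng.choice(y_bin, size=(n_boot, y_bin.size), replace=True)
        boot = np.apply_along_axis(estimator, 1, samples)
        lo[b], hi[b] = np.percentile(boot, [(100 - ci) / 2, (100 + ci) / 2])
    return est, lo, hi

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 2000)
y = rng.normal(x**2, np.abs(x) + 1)
edges = np.linspace(-2, 2, 11)
est, lo, hi = bootstrap_profile(x, y, edges, estimator=np.median)
```

Here `lo` and `hi` play the role of the lower and upper ends of the error bars for each bin.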




Answer 2:


You can easily do it using scipy.stats.binned_statistic.

import scipy.stats
import numpy
import matplotlib.pyplot as plt

x = numpy.random.rand(10000)
y = x + scipy.stats.norm(0, 0.2).rvs(10000)

means_result = scipy.stats.binned_statistic(x, [y, y**2], bins=50, range=(0,1), statistic='mean')
means, means2 = means_result.statistic
standard_deviations = numpy.sqrt(means2 - means**2)
bin_edges = means_result.bin_edges
bin_centers = (bin_edges[:-1] + bin_edges[1:])/2.

plt.errorbar(x=bin_centers, y=means, yerr=standard_deviations, linestyle='none', marker='.')
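
The error bars above show the per-bin standard deviation of y. If the standard error of the mean is wanted instead (which, as far as I know, is what ROOT's TProfile draws by default), one option is to also request the per-bin count and divide by its square root. A sketch under that assumption, with illustrative variable names:

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
x = rng.random(10000)
y = x + rng.normal(0, 0.2, 10000)

bins, value_range = 50, (0, 1)
means, bin_edges, _ = scipy.stats.binned_statistic(
    x, y, statistic='mean', bins=bins, range=value_range)
stds, _, _ = scipy.stats.binned_statistic(
    x, y, statistic='std', bins=bins, range=value_range)
counts, _, _ = scipy.stats.binned_statistic(
    x, y, statistic='count', bins=bins, range=value_range)

# standard error of the mean; bins with no entries give NaN instead of a warning
with np.errstate(divide='ignore', invalid='ignore'):
    sem = stds / np.sqrt(counts)
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
```

Passing `yerr=sem` instead of `yerr=standard_deviations` to `plt.errorbar` then gives the narrower error bars.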



Answer 3:


I made a module myself for this functionality.

import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt

def Profile(x,y,nbins,xmin,xmax,ax):
    df = DataFrame({'x' : x , 'y' : y})

    binedges = xmin + ((xmax-xmin)/nbins) * np.arange(nbins+1)
    df['bin'] = np.digitize(df['x'],binedges)

    bincenters = xmin + ((xmax-xmin)/nbins)*np.arange(nbins) + ((xmax-xmin)/(2*nbins))
    ProfileFrame = DataFrame({'bincenters' : bincenters, 'N' : df['bin'].value_counts(sort=False)},index=range(1,nbins+1))

    bins = ProfileFrame.index.values
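    for bin_id in bins:
        # .ix was removed in pandas 1.0; .loc does the same label-based lookup here
        in_bin = df['bin'] == bin_id
        ProfileFrame.loc[bin_id,'ymean'] = df.loc[in_bin,'y'].mean()
        ProfileFrame.loc[bin_id,'yStandDev'] = df.loc[in_bin,'y'].std()
        ProfileFrame.loc[bin_id,'yMeanError'] = ProfileFrame.loc[bin_id,'yStandDev'] / np.sqrt(ProfileFrame.loc[bin_id,'N'])

    # fmt='none' (the string, not None) draws the error bars without markers or a connecting line
    ax.errorbar(ProfileFrame['bincenters'], ProfileFrame['ymean'], yerr=ProfileFrame['yMeanError'], xerr=(xmax-xmin)/(2*nbins), fmt='none')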
    return ax


def Profile_Matrix(frame):
    # Much of this is adapted from scatter_matrix in
    # https://github.com/pydata/pandas/blob/master/pandas/tools/plotting.py
    # The private pandas helpers used there (_subplots, _label_axis, lrange)
    # are gone from current pandas, so public matplotlib/pandas calls are used.
    from matplotlib.artist import setp

    range_padding=0.05

    df = frame._get_numeric_data()
    n = df.columns.size

    fig, axes = plt.subplots(nrows=n, ncols=n, squeeze=False)

    # no gaps between subplots
    fig.subplots_adjust(wspace=0, hspace=0)

    mask = df.notnull()

    # padded (min, max) range of every numeric column
    boundaries_list = []
    for a in df.columns:
        values = df[a].values[mask[a].values]
        rmin_, rmax_ = np.min(values), np.max(values)
        rdelta_ext = (rmax_ - rmin_) * range_padding / 2.
        boundaries_list.append((rmin_ - rdelta_ext, rmax_+ rdelta_ext))

    for i, a in enumerate(df.columns):
        for j, b in enumerate(df.columns):

            common = (mask[a] & mask[b]).values
            nbins = 100
            # x is the column variable b and y is the row variable a,
            # so the data match the axis labels set below
            (xmin,xmax) = boundaries_list[j]

            ax = axes[i, j]
            Profile(df[b][common],df[a][common],nbins,xmin,xmax,ax)

            ax.set_xlabel(b)
            ax.set_ylabel(a)

            if j!= 0:
                ax.yaxis.set_visible(False)
            if i != n-1:
                ax.xaxis.set_visible(False)

    for ax in axes.flat:
        setp(ax.get_xticklabels(), fontsize=8)
        setp(ax.get_yticklabels(), fontsize=8)

    return axes



Answer 4:


While @Keith's answer seems to fit what you mean, it is quite a lot of code. I think this can be done much more simply, so that one gets the key concepts and can adjust and build on top of them.

Let me stress one thing: what ROOT calls a ProfileHistogram is not a special kind of plot. It is an errorbar plot, which can be done directly in matplotlib.

It is a special kind of computation, and that is not the task of a plotting library. This lies in the pandas realm, and pandas is great at stuff like this. It is symptomatic of ROOT, the giant monolithic pile that it is, to have an extra class for this.

So what you want to do is: discretize in some variable x and for each bin, calculate something in another variable y.

This can easily be done using np.digitize together with the pandas groupby and aggregate methods.

Putting it all together:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# just some random numbers to get started
x = np.random.uniform(-2, 2, 10000)
y = np.random.normal(x**2, np.abs(x) + 1)
df = pd.DataFrame({'x': x, 'y': y})


# calculate which bin each row belongs to, based on `x`
# bins needs the bin edges, so this will give us 100 equally sized bins
bins = np.linspace(-2, 2, 101)
df['bin'] = np.digitize(x, bins=bins)
bin_centers = 0.5 * (bins[:-1] + bins[1:])
bin_width = bins[1] - bins[0]

# group by bin, so we can calculate stuff
binned = df.groupby('bin')
# calculate mean and standard error of the mean for y in each bin
result = binned['y'].agg(['mean', 'sem'])
result['x'] = bin_centers
result['xerr'] = bin_width / 2

# plot it

result.plot(
    x='x',
    y='mean',
    xerr='xerr',
    yerr='sem',
    linestyle='none',
    capsize=0,
    color='black',
)
plt.savefig('result.png', dpi=300)

Just like ROOT ;)
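
One caveat with the code above: `result['x'] = bin_centers` only works if every one of the 100 bins received at least one point. A possible variant, sketched here with `pandas.cut` (my substitution, not part of the original answer), keeps empty bins in the result so that the bin centers always line up:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 10000)
y = rng.normal(x**2, np.abs(x) + 1)
df = pd.DataFrame({'x': x, 'y': y})

bins = np.linspace(-2, 2, 101)
bin_centers = 0.5 * (bins[:-1] + bins[1:])

# pd.cut labels each row with its interval; right=False gives [left, right) bins like np.digitize
df['bin'] = pd.cut(df['x'], bins=bins, right=False)

# observed=False keeps every bin in the output, even empty ones (as NaN rows)
result = df.groupby('bin', observed=False)['y'].agg(['mean', 'sem'])
result['x'] = bin_centers
result['xerr'] = (bins[1] - bins[0]) / 2
```

The resulting frame has the same columns as before, so the same `result.plot(...)` call can be used to draw it.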




Answer 5:


To my knowledge, matplotlib still does not offer a direct way to produce profile histograms. You can instead have a look at Hippodraw, a package developed at SLAC that can be used as a Python extension module. Here is a profile histogram example:

http://www.slac.stanford.edu/grp/ek/hippodraw/datareps_root.html#datareps_profilehist



Source: https://stackoverflow.com/questions/23709403/plotting-profile-hitstograms-in-python
