Circular Histogram for Python

I have periodic data and the distribution for it is best visualised around a circle. Now the question is how can I do this visualisation using matplotlib? If not, can it be done easily in Python?

My code here will demonstrate a crude approximation of distribution around a circle:

from matplotlib import pyplot as plt
import numpy as np

#generatin random data
a=np.random.uniform(low=0,high=2*np.pi,size=50)

#real circle
b=np.linspace(0,2*np.pi,1000)
a=sorted(a)
plt.plot(np.sin(a)*0.5,np.cos(a)*0.5)
plt.plot(np.sin(b),np.cos(b))
plt.show()

There are a few examples in a question on SX for Mathematica:

Building off of this example from the gallery, you can do

import numpy as np
import matplotlib.pyplot as plt

N = 80
bottom = 8
max_height = 4

theta = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
radii = max_height*np.random.rand(N)
width = (2*np.pi) / N

ax = plt.subplot(111, polar=True)
bars = ax.bar(theta, radii, width=width, bottom=bottom)

# Use custom colors and opacity
for r, bar in zip(radii, bars):
    bar.set_facecolor(plt.cm.jet(r / 10.))
    bar.set_alpha(0.8)

plt.show()

Of course, there are many variations and tweeks, but this should get you started.

In general, a browse through the matplotlib gallery is usually a good place to start.

Here, I used the bottom keyword to leave the center empty, because I think I saw an earlier question by you with a graph more like what I have, so I assume that's what you want. To get the full wedges that you show above, just use bottom=0 (or leave it out since 0 is the default).

I'm 5 years late to this question, but anyway...

I'd always recommend caution when using circular histograms as they can easily mislead readers.

In particular, I'd advise staying away from circular histograms where frequency and radius are plotted proportionally. I recommend this because the mind is greatly affected by the area of the bins, not just by their radial extent. This is similar to how we're used to interpreting pie charts: by area.

So, instead of using the radial extent of a bin to visualise the number of data points it contains, I'd recommend visualising the number of points by area.

The problem

Consider the consequences of doubling the number of data points in a given histogram bin. In a circular histogram where frequency and radius are proportional, the radius of this bin will increase by a factor of 2 (as the number of points has doubled). However, the area of this bin will have been increased by a factor of 4! This is because the area of the bin is proportional to the radius squared.

If this doesn't sound like too much of a problem yet, let's see it graphically:

Both of the above plots visualise the same data points.

In the lefthand plot it's easy to see that there are twice as many datapoints in the (0, pi/4) bin than there are in the (-pi/4, 0) bin.

However, take a look at the right hand plot (frequency proportional to radius). At first glance your mind is greatly affected by the area of the bins. You'd be forgiven for thinking there are more than twice as many points in the (0, pi/4) bin than in the (-pi/4, 0) bin. However, you'd have been misled. It is only on closer inspection of the graphic (and of the radial axis) that you realise there are exactly twice as many datapoints in the (0, pi/4) bin than in the (-pi/4, 0) bin. Not more than twice as many, as the graph may have originally suggested.

The above graphics can be recreated with the following code:

import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn')

# Generate data with twice as many points in (0, np.pi/4) than (-np.pi/4, 0)
angles = np.hstack([np.random.uniform(0, np.pi/4, size=100),
                    np.random.uniform(-np.pi/4, 0, size=50)])

bins = 2

fig = plt.figure()
ax = fig.add_subplot(1, 2, 1)
polar_ax = fig.add_subplot(1, 2, 2, projection="polar")

# Plot "standard" histogram
ax.hist(angles, bins=bins)
# Fiddle with labels and limits
ax.set_xlim([-np.pi/4, np.pi/4])
ax.set_xticks([-np.pi/4, 0, np.pi/4])
ax.set_xticklabels([r'$-\pi/4$', r'$0$', r'$\pi/4$'])

# bin data for our polar histogram
count, bin = np.histogram(angles, bins=bins)
# Plot polar histogram
polar_ax.bar(bin[:-1], count, align='edge', color='C0')

# Fiddle with labels and limits
polar_ax.set_xticks([0, np.pi/4, 2*np.pi - np.pi/4])
polar_ax.set_xticklabels([r'$0$', r'$\pi/4$', r'$-\pi/4$'])
polar_ax.set_rlabel_position(90)

fig.tight_layout()

A solution

Since we are so greatly affected by the area of the bins in circular histograms, I find it more effective to ensure that the area of each bin is proportional to the number of observations in it, instead of the radius. This is similar to how we are used to interpreting pie charts, where area is the quantity of interest.

Let's use the dataset we used in the previous example to reproduce the graphics based on area, instead of radius:

I hypothesise that readers have less chance of being misled at first glance of this graphic.

However, when plotting a circular histogram with area proportional to radius we have the disadvantage that you'd never have known that there are exactly twice as many points in the (0, pi/4) bin than in the (-pi/4, 0) bin just by eyeballing the areas. Although, you could counter this by annotating each bin with its corresponding density. I think this disadvantage is preferable to misleading a reader.

Of course I'd ensure that an informative caption was placed alongside this figure to explain that here we visualise frequency with area, not radius.

The above plots were created as:

fig = plt.figure()
ax = fig.add_subplot(1, 2, 1)
polar_ax = fig.add_subplot(1, 2, 2, projection="polar")

# Plot "standard" histogram
ax.hist(angles, bins=bins, density=True)
# Fiddle with labels and limits
ax.set_xlim([-np.pi/4, np.pi/4])
ax.set_xticks([-np.pi/4, 0, np.pi/4])
ax.set_xticklabels([r'$-\pi/4$', r'$0$', r'$\pi/4$'])

# bin data for our polar histogram
counts, bin = np.histogram(angles, bins=bins)
# Normalise counts to compute areas
area = counts / angles.size
# Compute corresponding radii from areas
radius = (area / np.pi)**.5

polar_ax.bar(bin[:-1], radius, align='edge', color='C0')

# Label angles according to convention
polar_ax.set_xticks([0, np.pi/4, 2*np.pi - np.pi/4])
polar_ax.set_xticklabels([r'$0$', r'$\pi/4$', r'$-\pi/4$'])

fig.tight_layout()

Putting it all together

If you create lots of circular histograms, you'd do best to create some plotting function which you can reuse easily. Below I include a function I wrote and use in my work.

By default the function visualises by area, as I've recommended. However, if you'd still rather visualise bins with radius proportional to frequency you can do so by passing density=False. Additionally, you can use the argument offset to set the direction of the zero angle and lab_unit to set whether the labels should be in degrees or radians.

def rose_plot(ax, angles, bins=16, density=None, offset=0, lab_unit="degrees",
              start_zero=False, **param_dict):
    """
    Plot polar histogram of angles on ax. ax must have been created using
    subplot_kw=dict(projection='polar'). Angles are expected in radians.
    """
    # Wrap angles to [-pi, pi)
    angles = (angles + np.pi) % (2*np.pi) - np.pi

    # Set bins symetrically around zero
    if start_zero:
        # To have a bin edge at zero use an even number of bins
        if bins % 2:
            bins += 1
        bins = np.linspace(-np.pi, np.pi, num=bins+1)

    # Bin data and record counts
    count, bin = np.histogram(angles, bins=bins)

    # Compute width of each bin
    widths = np.diff(bin)

    # By default plot density (frequency potentially misleading)
    if density is None or density is True:
        # Area to assign each bin
        area = count / angles.size
        # Calculate corresponding bin radius
        radius = (area / np.pi)**.5
    else:
        radius = count

    # Plot data on ax
    ax.bar(bin[:-1], radius, zorder=1, align='edge', width=widths,
           edgecolor='C0', fill=False, linewidth=1)

    # Set the direction of the zero angle
    ax.set_theta_offset(offset)

    # Remove ylabels, they are mostly obstructive and not informative
    ax.set_yticks([])

    if lab_unit == "radians":
        label = ['$0$', r'$\pi/4$', r'$\pi/2$', r'$3\pi/4$',
                  r'$\pi$', r'$5\pi/4$', r'$3\pi/2$', r'$7\pi/4$']
        ax.set_xticklabels(label)

It's super easy to use this function. Here I demonstrate it's use for some randomly generated directions:

angles0 = np.random.normal(loc=0, scale=1, size=10000)
angles1 = np.random.uniform(0, 2*np.pi, size=1000)

# Visualise with polar histogram
fig, ax = plt.subplots(1, 2, subplot_kw=dict(projection='polar'))
rose_plot(ax[0], angles0)
rose_plot(ax[1], angles1, lab_unit="radians")
fig.tight_layout()

来源：https://stackoverflow.com/questions/22562364/circular-histogram-for-python

标签

python

matplotlib

plot

visualization

histogram