In pandas data frame, I am using the following code to plot histogram of a column:
my_df.hist(column = \'field_1\')
Is there something that
The histogram method for RDDs returns the bin ranges and the bin counts. Here's a function that takes this histogram data and plots it as a histogram.
import numpy as np
import matplotlib.pyplot as mplt
import matplotlib.ticker as mtick
def plotHistogramData(data):
binSides, binCounts = data
N = len(binCounts)
ind = np.arange(N)
width = 1
fig, ax = mplt.subplots()
rects1 = ax.bar(ind+0.5, binCounts, width, color='b')
ax.set_ylabel('Frequencies')
ax.set_title('Histogram')
ax.set_xticks(np.arange(N+1))
ax.set_xticklabels(binSides)
ax.xaxis.set_major_formatter(mtick.FormatStrFormatter('%.2e'))
ax.yaxis.set_major_formatter(mtick.FormatStrFormatter('%.2e'))
mplt.show()
(This code assumes that bins have equal length.)