In pandas data frame, I am using the following code to plot histogram of a column:
my_df.hist(column='field_1')
Is there something similar that works on a PySpark DataFrame?
You can now use the pyspark_dist_explore package to leverage the matplotlib hist function for Spark DataFrames:
from pyspark_dist_explore import hist
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# data_frame is the Spark DataFrame; pass only the column to plot
hist(ax, data_frame.select('field_1'), bins=20, color=['red'])
plt.show()
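This library uses the RDD histogram() method to compute the bin counts inside Spark, so only the bucket edges and counts, not the raw column values, are brought back to the driver for plotting.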
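To make the bucketing concrete, here is a minimal pure-Python sketch of what RDD.histogram(n) returns when given an integer: n evenly spaced buckets between the minimum and maximum, each open on the right except the last, which is closed. The function name rdd_style_histogram is hypothetical and is not part of PySpark or pyspark_dist_explore; it only mimics the documented semantics so the output can be checked without a Spark cluster.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def rdd_style_histogram(values: Sequence[float], n_buckets: int) -> Tuple[List[float], List[int]]:
    """Hypothetical helper mimicking RDD.histogram(n_buckets) for an integer argument.

    Returns (bucket_edges, counts), where len(bucket_edges) == len(counts) + 1.
    Buckets are [e0, e1), [e1, e2), ..., [e_{n-1}, e_n] (last one closed).
    """
    if n_buckets < 1:
        raise ValueError("n_buckets must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        # When all values are equal, a single bucket holds everything
        return [lo, hi], [len(values)]
    step = (hi - lo) / n_buckets
    # Evenly spaced left edges, with the maximum appended as the final edge
    edges = [i * step + lo for i in range(n_buckets)] + [hi]
    counts = [0] * n_buckets
    for v in values:
        # bisect_right gives the index of the first edge > v, so subtract 1
        # for the bucket whose left edge is <= v; clamp so v == hi falls into
        # the last (closed) bucket instead of past the end
        idx = min(bisect_right(edges, v) - 1, n_buckets - 1)
        counts[idx] += 1
    return edges, counts


edges, counts = rdd_style_histogram([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3)
print(edges, counts)
```

In Spark itself, a similar (edges, counts) pair can be obtained with data_frame.select('field_1').rdd.flatMap(lambda row: row).histogram(20), and matplotlib can draw it from the precomputed counts with ax.hist(edges[:-1], bins=edges, weights=counts). This assumes the column is numeric and contains no nulls or NaN values.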