Pyspark: show histogram of a data frame column

情书的邮戳 2020-12-14 01:04

In pandas, I use the following code to plot a histogram of a data frame column:

my_df.hist(column='field_1')

Is there something that

5 Answers
  •  一生所求
    2020-12-14 01:37

    You can now use the pyspark_dist_explore package to leverage the matplotlib hist function for Spark DataFrames:

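    from pyspark_dist_explore import hist
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # data_frame is an existing Spark DataFrame; to plot one column, it is
    # assumed here that you pass e.g. data_frame.select('field_1') instead.
    hist(ax, data_frame, bins=20, color=['red'])
    plt.show()
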

    This library uses the RDD histogram function to calculate the bin values, so the binning runs in Spark and only the bin edges and counts need to reach matplotlib.
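
    If you would rather not add a dependency, the same idea can be sketched by hand. This is only a rough illustration of the RDD histogram approach, not the package's actual code: the helper names column_histogram and draw_histogram are made up for this example, and it assumes the column is numeric.

```python
def column_histogram(df, column, bins=20):
    """Return (bin_edges, counts) for one numeric column of a Spark DataFrame."""
    # Select the single column and unwrap each Row into its bare value.
    values = df.select(column).rdd.map(lambda row: row[0])
    # With an integer argument, RDD.histogram builds that many evenly spaced
    # buckets between the min and max and returns (bucket_edges, counts).
    edges, counts = values.histogram(bins)
    return edges, counts


def draw_histogram(ax, edges, counts, **bar_kwargs):
    """Draw precomputed buckets as bars on a matplotlib Axes."""
    # Each bar spans from its bucket's lower edge to its upper edge.
    widths = [right - left for left, right in zip(edges[:-1], edges[1:])]
    ax.bar(edges[:-1], counts, width=widths, align="edge", **bar_kwargs)
    return ax
```

    With the question's names, usage would look like: edges, counts = column_histogram(my_df, 'field_1', bins=20), then fig, ax = plt.subplots() and draw_histogram(ax, edges, counts, color='red'). Splitting the computation from the drawing keeps the Spark job and the plotting code independent of each other.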
