Making histogram with Spark DataFrame column

盖世英雄少女心 2020-12-16 03:18

I am trying to make a histogram with a column from a dataframe which looks like

DataFrame[C0: int, C1: int, ...]

If I were to make a histog

6 Answers
  •  悲哀的现实
    2020-12-16 03:49

    The pyspark_dist_explore package that @Chris van den Berg mentioned is quite nice. If you prefer not to add another dependency, you can plot a simple histogram with this bit of code.

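    import matplotlib.pyplot as plt

    # Show a histogram of the 'C1' column (assumes `df` is an existing Spark DataFrame).
    # RDD.histogram(20) computes 20 evenly spaced buckets and returns
    # (bucket boundaries, counts): 21 edges and 20 counts.
    bins, counts = df.select('C1').rdd.flatMap(lambda x: x).histogram(20)

    # The counts are already aggregated, so pass each bucket's left edge as a
    # single sample and weight it by that bucket's count.
    plt.hist(bins[:-1], bins=bins, weights=counts)
    plt.show()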
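
To see why passing `bins[:-1]` with `weights=counts` reproduces the precomputed counts, here is a minimal pure-Python sketch of weighted binning. It is only an illustration: `weighted_histogram` is a hypothetical helper, not part of Spark or matplotlib, and the edges and counts are made-up numbers standing in for the output of `histogram(4)`. It assumes the usual convention that every bin is half-open `[lo, hi)` except the last, which also includes its right edge.

```python
from bisect import bisect_right

def weighted_histogram(samples, edges, weights):
    """Sum the weights that fall into each bin defined by `edges`.

    Bins are [lo, hi) except the last one, which is [lo, hi].
    Samples outside the overall range are ignored.
    """
    totals = [0] * (len(edges) - 1)
    for x, w in zip(samples, weights):
        if x < edges[0] or x > edges[-1]:
            continue
        # Index of the bin whose left edge is the largest edge <= x;
        # clamp so that x == edges[-1] lands in the last bin.
        i = min(bisect_right(edges, x) - 1, len(totals) - 1)
        totals[i] += w
    return totals

# Made-up stand-in for the (boundaries, counts) pair: 5 edges, 4 counts.
edges = [0.0, 2.5, 5.0, 7.5, 10.0]
counts = [3, 7, 2, 5]

# Each left edge falls into its own bin, so its weight becomes that bin's height.
rebinned = weighted_histogram(edges[:-1], edges, counts)
print(rebinned)
```

Because each left edge lands in exactly one bin, the weighted histogram comes out identical to the counts that were already computed on the cluster, so no raw data has to be collected to the driver for plotting.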
    
