Making histogram with Spark DataFrame column

盖世英雄少女心 2020-12-16 03:18

I am trying to make a histogram with a column from a dataframe which looks like

DataFrame[C0: int, C1: int, ...]

If I were to make a histog

6 Answers

悲哀的现实 (OP)

2020-12-16 03:49
The pyspark_dist_explore package that @Chris van den Berg mentioned is quite nice. If you prefer not to add an additional dependency you can use this bit of code to plot a simple histogram.
```
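import matplotlib.pyplot as plt

# Compute a 20-bucket histogram of the 'C1' column on the cluster.
# RDD.histogram(20) returns 21 bucket edges and 20 counts.
bins, counts = df.select('C1').rdd.flatMap(lambda x: x).histogram(20)

# Plot the precomputed counts: pass each bucket's left edge as a data point
# and weight it by that bucket's count. Awkward, but I believe it is correct.
plt.hist(bins[:-1], bins=bins, weights=counts)
plt.show()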
```
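To see why `bins[:-1]` lines up with `counts`, here is a rough pure-Python sketch of equal-width bucketing. It is not Spark's implementation, only an illustration of the shape of the result as I understand it: one more edge than counts, with the last bucket closed on the right. The function name `even_histogram` and the sample values are made up for the example.

```python
def even_histogram(values, num_buckets):
    """Return (edges, counts) for equal-width buckets spanning [min, max].

    Hypothetical helper for illustration only; not part of PySpark.
    """
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate case: every value is identical, so use a single bucket
        return [lo, hi], [len(values)]
    width = (hi - lo) / num_buckets
    # num_buckets left edges plus the final right edge
    edges = [lo + i * width for i in range(num_buckets)] + [hi]
    counts = [0] * num_buckets
    for v in values:
        # Clamp the index so the maximum value falls in the last bucket
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return edges, counts


edges, counts = even_histogram([1, 2, 2, 3, 5, 8, 8, 9, 10], 3)
print(edges)   # bucket boundaries, len(counts) + 1 of them
print(counts)  # number of values per bucket
```

Since there is always one more edge than there are counts, dropping the last edge gives exactly one left edge per bucket, which is what the `weights=counts` trick relies on.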