Pyspark: show histogram of a data frame column

情书的邮戳 2020-12-14 01:04

With a pandas DataFrame, I use the following code to plot a histogram of a column:

my_df.hist(column='field_1')

Is there something that

5 answers

一生所求 (OP)

2020-12-14 01:37
You can now use the pyspark_dist_explore package to leverage the matplotlib hist function for Spark DataFrames:
```
from pyspark_dist_explore import hist
import matplotlib.pyplot as plt

# data_frame is a Spark DataFrame holding the numeric column to plot
fig, ax = plt.subplots()
hist(ax, data_frame, bins=20, color=['red'])
plt.show()
```
This library uses the rdd histogram function to calculate bin values.
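If you would rather not add a dependency, the same idea can be sketched by hand: let Spark compute the bins with `RDD.histogram`, then draw the small result with matplotlib on the driver. This is only a rough sketch, not how pyspark_dist_explore is implemented internally. The helper names `spark_column_histogram` and `plot_precomputed_histogram` are made up for illustration, and `'field_1'` is just the column name from the question.

```python
def spark_column_histogram(df, column, bins=20):
    """Return (bin_edges, counts) for one numeric column of a Spark DataFrame.

    Assumes `df` is a pyspark.sql.DataFrame. The counting runs on the
    cluster; only the edges and counts come back to the driver.
    """
    # Drop missing values, then unwrap each Row into a plain number.
    values = df.select(column).dropna().rdd.map(lambda row: row[0])
    # With an int argument, RDD.histogram builds evenly spaced buckets
    # between the column's min and max.
    return values.histogram(bins)


def plot_precomputed_histogram(ax, bin_edges, counts):
    """Draw already-computed bins as bars on a matplotlib Axes."""
    widths = [hi - lo for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]
    # align="edge" puts each bar's left side on its bin's lower edge.
    ax.bar(bin_edges[:-1], counts, width=widths, align="edge")
    return ax


# Hypothetical usage, assuming `data_frame` is an existing Spark DataFrame:
# import matplotlib.pyplot as plt
# edges, counts = spark_column_histogram(data_frame, 'field_1', bins=20)
# fig, ax = plt.subplots()
# plot_precomputed_histogram(ax, edges, counts)
# plt.show()
```

Because only `bins + 1` edges and `bins` counts are collected, this avoids pulling the whole column to the driver the way `toPandas()` followed by `.hist()` would.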