Comparing compressed distributions per cohort
Question

How can I easily compare the distributions of multiple cohorts?

Usually, seaborn's distplot (https://seaborn.pydata.org/generated/seaborn.distplot.html) would be a great tool for comparing distributions visually. However, my dataset is so large that I had to compress it and keep only the counts. The compressed data was created as:

SELECT age, gender, compress_distributionUDF(collect_list(struct(target_y_n, count, distribution_value))) GROUP BY age, gender

where compress_distributionUDF simply takes a list of tuples and returns