Question
I currently load CSV files into DataFrames using the Databricks library.
I'm looking for the best generic way to cogroup the loaded DataFrames on a specific key, since the cogroup operation is only available for pair RDDs.
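As a point of reference, the cogroup semantics in question can be sketched in plain Python. This is not Spark code: the function name `cogroup`, the `key` parameter, and the sample rows are hypothetical and only illustrate the result shape, where each key maps to the rows from both sides and a side with no match contributes an empty list.

```python
from collections import defaultdict


def cogroup(left_rows, right_rows, key):
    """Group two collections of dict rows by `key`.

    Returns {key_value: (rows_from_left, rows_from_right)}. A key that
    appears on only one side gets an empty list for the other side.
    """
    grouped = defaultdict(lambda: ([], []))
    for row in left_rows:
        grouped[row[key]][0].append(row)
    for row in right_rows:
        grouped[row[key]][1].append(row)
    return dict(grouped)


# Hypothetical sample rows standing in for two loaded CSV files.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [
    {"id": 1, "item": "book"},
    {"id": 1, "item": "pen"},
    {"id": 3, "item": "cup"},
]

result = cogroup(users, orders, "id")
```

Unlike a join, nothing is dropped or multiplied here: key 2 keeps its user row next to an empty order list, and key 3 keeps its order next to an empty user list. On the Spark side, the usual route to the same shape is to key each DataFrame's underlying RDD (for example `df.rdd.keyBy(...)`) and call `cogroup` on the resulting pair RDDs.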
I found this gist, which implements a cogroup feature for DataFrames, but I suspect there are other approaches:
https://gist.github.com/ahoy-jon/b65754cde98cc48b9b38
Has anyone faced this situation before?
Thanks.
Source: https://stackoverflow.com/questions/31806473/spark-dataframe-best-way-to-cogroup-dataframes