Question
In Spark with Scala, is there a way to create a local DataFrame on the executors, similar to using pandas in PySpark? Inside mapPartitions I want to convert the partition's iterator into a local DataFrame (like a pandas DataFrame in Python), so that I can use DataFrame features instead of hand-coding them on iterators.
Answer 1:
That is not possible.
A DataFrame in Spark is a distributed collection, and DataFrames can only be created on the driver node (i.e. outside of transformations and actions).
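Since no DataFrame is available inside mapPartitions, the usual workaround is to use ordinary Scala collection operations on the partition's data. The sketch below assumes a hypothetical `Sale` record type and a hypothetical `totalsByRegion` helper; it materializes the iterator into a `Vector` and uses `groupBy`/`map`/`sortBy` in place of a DataFrame-style aggregation. Note that this aggregates each partition independently, not the whole dataset, and that loading a partition with `toVector` requires it to fit in executor memory.

```scala
// Hypothetical record type standing in for the rows of one partition.
final case class Sale(region: String, amount: Double)

object PartitionLocalOps {
  // Hypothetical helper you could pass to rdd.mapPartitions.
  // It loads the whole partition into an in-memory Vector, then uses plain
  // Scala collection operations instead of DataFrame-style groupBy/agg.
  def totalsByRegion(rows: Iterator[Sale]): Iterator[(String, Double)] = {
    val local = rows.toVector // whole partition held in executor memory
    local
      .groupBy(_.region)                                          // Map[String, Vector[Sale]]
      .map { case (region, sales) => region -> sales.map(_.amount).sum }
      .toVector
      .sortBy(_._1)                                               // deterministic order by region
      .iterator
  }

  def main(args: Array[String]): Unit = {
    // Simulates one partition locally, without Spark.
    val partition = Iterator(Sale("east", 10.0), Sale("west", 5.0), Sale("east", 2.5))
    totalsByRegion(partition).foreach(println)
  }
}
```

In a Spark job this would be wired in roughly as `salesRdd.mapPartitions(PartitionLocalOps.totalsByRegion)`, where `salesRdd` is an assumed `RDD[Sale]`. If a global rather than per-partition result is needed, the aggregation belongs in a distributed DataFrame or Dataset operation on the driver side instead.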
Additionally, Spark does not let you run operations on RDDs, DataFrames, or Datasets inside other operations. For example, the following code fails at runtime:
rdd.map(v => rdd1.filter(e => e == v))
DataFrames and Datasets are also backed by RDDs, so the same restriction applies to them.
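A common way around the nested call, assuming the goal is to keep the elements of `rdd` that also occur in `rdd1` and that `rdd1` is small enough to collect, is to build a lookup set on the driver, broadcast it, and filter each partition against the broadcast value. The sketch below keeps the per-partition logic in plain Scala so it runs without Spark; `keepMatching` is a hypothetical helper, and the Spark wiring (`sc.broadcast`, `mapPartitions`, or a `join` for large `rdd1`) is shown only in comments.

```scala
object BroadcastLookup {
  // Hypothetical per-partition filter replacing the nested rdd1.filter(...).
  // `lookup` is a plain local Set; in a Spark job it would be the value of a
  // broadcast variable created on the driver.
  def keepMatching(partition: Iterator[Int], lookup: Set[Int]): Iterator[Int] =
    partition.filter(lookup.contains)

  def main(args: Array[String]): Unit = {
    // Spark wiring, roughly (assumes rdd1 fits in driver memory):
    //   val bc = sc.broadcast(rdd1.collect().toSet)
    //   val matched = rdd.mapPartitions(it => keepMatching(it, bc.value))
    // If rdd1 is large, a join avoids collecting it (duplicates in rdd1
    // will repeat matches):
    //   rdd.map(v => (v, ())).join(rdd1.map(e => (e, ()))).keys
    val lookup = Set(2, 4, 6)
    println(keepMatching(Iterator(1, 2, 3, 4), lookup).toList)
  }
}
```

Both variants keep every RDD operation on the driver side of the program: the closure shipped to executors only references the broadcast value, never another RDD.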
Source: https://stackoverflow.com/questions/48715661/spark-how-can-i-create-local-dataframe-in-each-executor