I am using Spark 1.5.
I have two dataframes of the form:
scala> libriFirstTable50Plus3DF
res1: org.apache.spark.sql.DataFrame = [basket_id: string
In addition to increasing spark.sql.broadcastTimeout or calling persist() on both DataFrames, you may try the following (a minimal configuration sketch follows the list):
1. Disable broadcast joins by setting spark.sql.autoBroadcastJoinThreshold to -1.
2. Increase the Spark driver memory by setting spark.driver.memory to a higher value.
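For reference, here is a minimal sketch of how these settings could be applied in a standalone Spark 1.5 application. The app name, the 1200-second timeout, the 4g driver memory, and the second DataFrame name (otherDF) are placeholder assumptions, not values taken from the question:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Example values only; tune them for your cluster.
val conf = new SparkConf()
  .setAppName("join-timeout-example")                 // placeholder app name
  .set("spark.sql.broadcastTimeout", "1200")          // raise the 300 s default (value is in seconds)
  .set("spark.sql.autoBroadcastJoinThreshold", "-1")  // or disable broadcast joins entirely
// spark.driver.memory must be set before the driver JVM starts,
// e.g. via spark-submit --driver-memory 4g, not from application code.

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Persist both sides of the join so they are not recomputed while it runs, e.g.:
// libriFirstTable50Plus3DF.persist()
// otherDF.persist()
```

With the threshold at -1 the planner falls back to a shuffle join instead of broadcasting one side, so spark.sql.broadcastTimeout no longer applies to that join.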