If I have a Scala paragraph with a DataFrame, can I share it with Python and use it there? (As I understand it, PySpark uses Py4J.)
I tried this:
Scala paragraph: <
You can register the DataFrame as a temporary view in Scala:
// Spark 2.x+; in Spark 1.x use df.registerTempTable("df") instead
df.createTempView("df")
and read it in Python with SQLContext.table:
df = sqlContext.table("df")
If you really want to use put / get, you'll have to wrap the JVM DataFrame in a Python DataFrame yourself:
// Scala paragraph
z.put("df", df: org.apache.spark.sql.DataFrame)
# Python paragraph: wrap the Java object returned by z.get
from pyspark.sql import DataFrame
df = DataFrame(z.get("df"), sqlContext)
To plot with matplotlib, you'll have to convert the DataFrame to a local Python object with either collect or toPandas:
pdf = df.toPandas()
Note that both methods fetch all of the data to the driver, so the result has to fit in driver memory.
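Once the data is a local pandas DataFrame, plotting is ordinary matplotlib. The sketch below assumes `pdf` is what `df.toPandas()` returned; since it cannot call Spark here, it builds a small hypothetical stand-in by hand (the `id`/`value` columns are invented). It renders to an in-memory PNG with the headless `Agg` backend so it runs anywhere; how Zeppelin actually displays matplotlib output depends on your setup. If you used `collect()` instead, you would get a list of `Row` objects and would pull the columns out yourself, e.g. `xs = [r.id for r in rows]`.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for: pdf = df.toPandas()
pdf = pd.DataFrame({"id": [1, 2, 3], "value": [10.0, 20.0, 15.0]})

# Plot one local column against another.
fig, ax = plt.subplots()
ax.plot(pdf["id"], pdf["value"], marker="o")
ax.set_xlabel("id")
ax.set_ylabel("value")

# Render to an in-memory PNG instead of showing a window.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
png_bytes = buf.getvalue()
```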
See also: moving a Spark DataFrame from Python to Scala within Zeppelin