I am new to Spark/Zeppelin and wanted to complete a simple exercise: convert a CSV file from a pandas DataFrame to a Spark DataFrame, then register it as a table so I can query it.
You didn't say which interpreter group you are using. If it's livy, then you can't access tables registered in %livy.pyspark from %livy.sql. I got this from here:
for now %livy.sql can only access tables registered %livy.spark, but not %livy.pyspark and %livy.sparkr.
If you switch to the standard spark interpreter group, it should work. I can confirm this with Spark 1.6.3 and Zeppelin 0.7.0. Hopefully the people working on the livy interpreter will remove this restriction...
This is also related to the different contexts created by Spark. Check the following setting in the spark interpreter:
zeppelin.spark.useHiveContext = false
Set it to 'false'.
The correct syntax would be:
sqlContext.registerDataFrameAsTable(spark_clean_df, 'table1')
sqlContext.sql("select * from table1 where ...")
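Putting the two calls above together, a minimal sketch of the whole flow from the question might look like the following. It assumes the standard spark interpreter group, where Zeppelin injects sqlContext into %pyspark paragraphs; the sample data, the column names and the filter are hypothetical stand-ins for your CSV and your actual query.

```python
%pyspark
import pandas as pd

# Hypothetical stand-in for the CSV; in practice this would come from pd.read_csv on your file
pandas_df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# sqlContext is assumed to be the instance Zeppelin provides in %pyspark paragraphs
spark_clean_df = sqlContext.createDataFrame(pandas_df)
sqlContext.registerDataFrameAsTable(spark_clean_df, "table1")

# Query through the same context that registered the table
sqlContext.sql("select * from table1 where id > 1").show()
```

If the query works here but a separate %sql paragraph still reports that the table is missing, that suggests %sql is looking in a different context than the one %pyspark used.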
Zeppelin can create different contexts for different interpreters. If you executed some code with the %spark interpreter and some with %pyspark, your Zeppelin may have two contexts, and when you use %sql it looks in a context other than the one used by %pyspark. Try restarting Zeppelin and executing the %pyspark code as the first statement and then %sql as the second.
If you go to the 'Interpreters' tab, you can add zeppelin.spark.sql.stacktrace there. After restarting Zeppelin, you will see the full stack trace in the place where you now see 'Table not found'.
Actually, this is probably the answer to your question: "When registering a table using the %pyspark interpreter in Zeppelin, I can't access the table in %sql"
Try adding
%pyspark
sqlContext = sqlc
as the first two lines.