Zeppelin - Cannot query with %sql a table I registered with pyspark

大憨熊 submitted on 2019-12-30 08:36:07

Question


I am new to Spark/Zeppelin and I wanted to complete a simple exercise: convert a CSV file from a pandas DataFrame to a Spark DataFrame, then register it as a table so I can query it with SQL and visualise it in Zeppelin.

But I seem to be failing in the last step.

I am using Spark 1.6.1

Here is my code:

%pyspark
spark_clean_df.registerTempTable("table1")
print spark_clean_df.dtypes
print sqlContext.sql("select count(*) from table1").collect()

Here is the output:

[('id', 'bigint'), ('name', 'string'), ('host_id', 'bigint'), ('host_name', 'string'), ('neighbourhood', 'string'), ('latitude', 'double'), ('longitude', 'double'), ('room_type', 'string'), ('price', 'bigint'), ('minimum_nights', 'bigint'), ('number_of_reviews', 'bigint'), ('last_review', 'string'), ('reviews_per_month', 'double'), ('calculated_host_listings_count', 'bigint'), ('availability_365', 'bigint')]
[Row(_c0=4961)]

But when I try to use %sql I get this error:

%sql
select * from table1

Table not found: table1; line 1 pos 14
set zeppelin.spark.sql.stacktrace = true to see full stacktrace

Any help would be appreciated - I don't even know where to find this stacktrace and how could it help me.

Thanks :)


Answer 1:


Zeppelin can create different contexts for different interpreters. If you executed some code with the %spark interpreter and some with the %pyspark interpreter, your Zeppelin instance can end up with two contexts, and %sql may be looking in a context other than the one %pyspark used. Try restarting Zeppelin and executing the %pyspark code as the first statement and then %sql as the second.
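
To make the point about separate contexts concrete, here is a toy model in plain Python. It is not Zeppelin or Spark code: ToyContext and its methods are made up for illustration, and the only assumption is that each context keeps its own registry of temp tables.

```python
class ToyContext:
    """Hypothetical stand-in for a SQL context with its own temp-table registry."""

    def __init__(self, name):
        self.name = name
        self._temp_tables = {}

    def register_temp_table(self, table_name, rows):
        # The table is recorded in this context only.
        self._temp_tables[table_name] = rows

    def sql_count(self, table_name):
        # A lookup only sees tables registered in this same context.
        if table_name not in self._temp_tables:
            raise LookupError("Table not found: " + table_name)
        return len(self._temp_tables[table_name])


# Two interpreters that ended up with two separate contexts.
pyspark_ctx = ToyContext("pyspark")
sql_ctx = ToyContext("sql")

pyspark_ctx.register_temp_table("table1", [("a",), ("b",)])
count_in_pyspark = pyspark_ctx.sql_count("table1")

try:
    sql_ctx.sql_count("table1")
    lookup_error = None
except LookupError as exc:
    lookup_error = str(exc)

print(count_in_pyspark)
print(lookup_error)
```

The registration succeeds and the count works in the context that did the registering, while the other context reports the table as missing, which matches the symptom in the question.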

If you go to the 'Interpreters' tab, you can add zeppelin.spark.sql.stacktrace there (set it to true, as the error message says). After restarting Zeppelin you will see the full stack trace in the place where you see 'Table not found' now.

Actually, this is probably answered by the question "When registering a table using the %pyspark interpreter in Zeppelin, I can't access the table in %sql".

Try adding

    %pyspark
    sqlContext = sqlc

as the first two lines.
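
A quick way to see which context actually holds the table is to ask each one for its table names. The sketch below assumes the object you pass in has the tableNames() method that Spark 1.x's SQLContext provides; table_is_visible and StubContext are hypothetical names, and the stub is only there so the snippet runs without Spark.

```python
def table_is_visible(sql_context, table_name):
    """Return True if table_name is known to the given context."""
    # tableNames() returns the table names registered in this context.
    return table_name in sql_context.tableNames()


class StubContext:
    """Hypothetical stand-in for a SQLContext, used only to run this sketch."""

    def __init__(self, names):
        self._names = list(names)

    def tableNames(self):
        return self._names


context_with_table = StubContext(["table1"])
context_without_table = StubContext([])

visible = table_is_visible(context_with_table, "table1")
missing = table_is_visible(context_without_table, "table1")
print(visible, missing)
```

In a %pyspark paragraph you would call table_is_visible(sqlContext, "table1") and table_is_visible(sqlc, "table1") instead of using the stub. If the two answers differ, the names point at different contexts. Whether both names exist depends on your Zeppelin version.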




Answer 2:


This is also related to the different contexts created by Spark. Check the following setting in the Spark interpreter:

zeppelin.spark.useHiveContext = false

Set it to 'false'.




Answer 3:


You didn't say which interpreter group you were using. If it's livy then you can't access tables registered in %livy.pyspark from %livy.sql. I got this from here:

for now %livy.sql can only access tables registered %livy.spark, but not %livy.pyspark and %livy.sparkr.

If you switch to the standard spark interpreter group, it should work. I can confirm it works for me with Spark 1.6.3 and Zeppelin 0.7.0. Hopefully the people working on the livy interpreter will remove this restriction...




Answer 4:


Correct syntax would be:

sqlContext.registerDataFrameAsTable(spark_clean_df, 'table1')
sqlContext.sql("select * from table1 where ...")


Source: https://stackoverflow.com/questions/37576042/zeppelin-cannot-query-with-sql-a-table-i-registered-with-pyspark
