How to test a Spark SQL Query without Scala

Submitted by 一曲冷凌霜 on 2020-01-02 20:43:32

Question


I am trying to figure out how to test Spark SQL queries against a Cassandra database, much as you would in SQL Server Management Studio. Currently I have to open the Spark console and type Scala commands, which is really tedious and error-prone.

Something like:

scala> val query = csc.sql("select * from users")
scala> query.collect().foreach(println)

Especially with longer queries this can be a real pain.

This seems like a terribly inefficient way to check whether your query is correct and what data it will return. The other issue is that when your query is wrong, you get back a mile-long error message and have to scroll up the console to find the cause. How do I test my Spark queries without using the console or writing my own application?


Answer 1:


You can use bin/spark-sql to avoid writing a Scala program and just write SQL.

To use bin/spark-sql you may need to rebuild Spark with the -Phive and -Phive-thriftserver profiles.

More information is available under Building Spark in the Spark documentation. Note: do not build against Scala 2.11; the Thrift server dependencies do not seem to be ready for it at the moment.
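Once bin/spark-sql is available, a query kept in a file can be run non-interactively, so it can be edited in any editor rather than typed at a prompt. The following is a minimal Python sketch, assuming the CLI accepts the Hive-style `-f <file>` option and `--conf key=value` settings. The helper names `spark_sql_command` and `run_sql_file` are hypothetical, not part of Spark.

```python
import os
import subprocess


def spark_sql_command(sql_file, spark_home, conf=None):
    """Build the argv list for running a SQL file through bin/spark-sql."""
    cmd = [os.path.join(spark_home, "bin", "spark-sql")]
    # Each setting becomes a "--conf key=value" pair, as on the command line.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    # -f asks the CLI to execute the statements in the given file
    # (assumed to be supported by the installed spark-sql).
    cmd += ["-f", sql_file]
    return cmd


def run_sql_file(sql_file, spark_home, conf=None):
    """Run the file and return (exit code, stdout, stderr)."""
    result = subprocess.run(
        spark_sql_command(sql_file, spark_home, conf),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr
```

For example, `run_sql_file("query.sql", os.environ["SPARK_HOME"])` would run a hypothetical query.sql and hand back the output and the error stream separately, which makes a long error message easier to inspect than scrolling a console.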




Answer 2:


You can write the SQL in a file, read it into a variable in your test script, and pass it to ssc.sql(file.read()) [the Python way].

But it seems you are looking for something else. A testing approach, perhaps?
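The file-based idea in this answer can be sketched as follows. It assumes `session` is whatever object exposes a `sql(text)` method in your script (the `ssc` of the answer, or a `SparkSession`) and that the file holds a single statement. The names `load_query`, `run_query_file` and `users.sql` are hypothetical.

```python
from pathlib import Path


def load_query(path):
    """Read one SQL statement from a file, dropping a trailing semicolon."""
    text = Path(path).read_text(encoding="utf-8").strip()
    # The file is assumed to hold a single statement, so a terminating ';'
    # is removed before the text is handed to sql().
    return text.rstrip(";").strip()


def run_query_file(session, path, limit=20):
    """Run the statement in `path` and return at most `limit` rows."""
    df = session.sql(load_query(path))
    # take() fetches only the first rows instead of collecting the whole result.
    return df.take(limit)
```

In a PySpark script this would be called as `run_query_file(ssc, "users.sql")`, and the query can then be edited in the file without retyping it at a prompt.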




Answer 3:


Here is one example:

[donghua@vmxdb01 ~]$ $SPARK_HOME/bin/spark-sql --packages datastax:spark-cassandra-connector:2.0.0-M2-s_2.11 --conf spark.cassandra.connection.host=127.0.0.1

spark-sql> select * from kv where value > 2;

Error in query: Table or view not found: kv; line 1 pos 14

spark-sql> create TEMPORARY TABLE kv USING org.apache.spark.sql.cassandra OPTIONS (table "kv",keyspace "mykeyspace", cluster "Test Cluster",pushdown "true");

16/10/12 08:28:09 WARN SparkStrategies$DDLStrategy: CREATE TEMPORARY TABLE kv USING... is deprecated, please use CREATE TEMPORARY VIEW viewName USING... instead
Time taken: 4.008 seconds

spark-sql> select * from kv;
key1 1
key4 4
key3 3
key2 2
Time taken: 2.253 seconds, Fetched 4 row(s)

spark-sql> select substring(key,1,3) from kv;
key
key
key
key
Time taken: 1.328 seconds, Fetched 4 row(s)

spark-sql> select substring(key,1,3),count(*) from kv group by substring(key,1,3);
key 4
Time taken: 3.518 seconds, Fetched 1 row(s)

spark-sql>
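The deprecation warning in the session above suggests CREATE TEMPORARY VIEW ... USING in place of CREATE TEMPORARY TABLE. The following small sketch generates such a statement for a Cassandra table, so that it can be pasted into spark-sql or placed at the top of a script file. The helper name `cassandra_view_ddl` is hypothetical, and the option names are simply the ones used in this answer.

```python
def cassandra_view_ddl(view, table, keyspace, **options):
    """Return a CREATE TEMPORARY VIEW statement backed by a Cassandra table."""
    opts = {"table": table, "keyspace": keyspace}
    opts.update(options)
    # Render each option as: name "value". No escaping is attempted, so
    # values containing double quotes are not handled by this sketch.
    rendered = ", ".join(f'{name} "{value}"' for name, value in opts.items())
    return (
        f"CREATE TEMPORARY VIEW {view} "
        f"USING org.apache.spark.sql.cassandra OPTIONS ({rendered});"
    )


print(cassandra_view_ddl("kv", "kv", "mykeyspace",
                         cluster="Test Cluster", pushdown="true"))
```

The printed statement mirrors the CREATE TEMPORARY TABLE command used above, with only the keyword changed.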



Source: https://stackoverflow.com/questions/30293070/how-to-test-a-spark-sql-query-without-scala
