How to test a Spark SQL Query without Scala


You could use bin/spark-sql to avoid writing a Scala program and just write SQL.

In order to use bin/spark-sql you may need to rebuild your Spark distribution with the -Phive and -Phive-thriftserver profiles.

More information is available in the Building Spark documentation. Note: do not build against Scala 2.11; the Thrift server dependencies do not seem to be ready for it at the moment.

You can write the SQL in a file, read it into a variable in your test script, and run ssc.sql(file.read()) (the Python way). A sketch of that idea is shown below.
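As a minimal sketch of that approach (an assumption, not part of the original answer): it uses the newer SparkSession API in place of the ssc SQLContext mentioned above, and query.sql is a placeholder file name.

    # Sketch: run SQL stored in a file, assuming PySpark is installed
    # and a hypothetical file query.sql holds the statement to test.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sql-file-test").getOrCreate()

    with open("query.sql") as f:   # placeholder path
        query = f.read()

    result = spark.sql(query)      # equivalent of ssc.sql(file.read())
    result.show()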

But it seems you are looking for something else, perhaps a testing approach?

Here is one example:

[donghua@vmxdb01 ~]$ $SPARK_HOME/bin/spark-sql --packages datastax:spark-cassandra-connector:2.0.0-M2-s_2.11 --conf spark.cassandra.connection.host=127.0.0.1

spark-sql> select * from kv where value > 2;

Error in query: Table or view not found: kv; line 1 pos 14

spark-sql> create TEMPORARY TABLE kv USING org.apache.spark.sql.cassandra OPTIONS (table "kv",keyspace "mykeyspace", cluster "Test Cluster",pushdown "true");

16/10/12 08:28:09 WARN SparkStrategies$DDLStrategy: CREATE TEMPORARY TABLE kv USING... is deprecated, please use CREATE TEMPORARY VIEW viewName USING... instead
Time taken: 4.008 seconds

spark-sql> select * from kv;
key1	1
key4	4
key3	3
key2	2
Time taken: 2.253 seconds, Fetched 4 row(s)

spark-sql> select substring(key,1,3) from kv;
key
key
key
key
Time taken: 1.328 seconds, Fetched 4 row(s)

spark-sql> select substring(key,1,3),count(*) from kv group by substring(key,1,3);
key	4
Time taken: 3.518 seconds, Fetched 1 row(s)
spark-sql>
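If you want an automated test rather than an interactive session, a small PySpark unit test can assert query results directly. The sketch below is an assumption, not part of the original answer: it swaps the Cassandra-backed kv table for an in-memory temporary view and assumes a local pyspark installation; the table contents and expected values are made up to mirror the session above.

    # Hypothetical unittest-style check of a Spark SQL query, no Scala needed.
    import unittest
    from pyspark.sql import SparkSession

    class KvQueryTest(unittest.TestCase):
        @classmethod
        def setUpClass(cls):
            cls.spark = (SparkSession.builder
                         .master("local[1]")
                         .appName("kv-test")
                         .getOrCreate())
            # Build a small in-memory table instead of reading from Cassandra.
            cls.spark.createDataFrame(
                [("key1", 1), ("key2", 2), ("key3", 3), ("key4", 4)],
                ["key", "value"],
            ).createOrReplaceTempView("kv")

        @classmethod
        def tearDownClass(cls):
            cls.spark.stop()

        def test_filter_on_value(self):
            rows = self.spark.sql("select key from kv where value > 2").collect()
            self.assertEqual({r.key for r in rows}, {"key3", "key4"})

    if __name__ == "__main__":
        unittest.main()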
