apache-spark-2.0

Spark2 session for Cassandra, SQL queries

南楼画角 submitted on 2021-02-08 03:50:12
Question: In Spark 2.0, what is the best way to create a Spark session? In both Spark 2.0 and Cassandra the APIs have been reworked, essentially deprecating SqlContext (and also CassandraSqlContext). So for executing SQL, I can either create a Cassandra Session (com.datastax.driver.core.Session) and use execute(" "), or create a SparkSession (org.apache.spark.sql.SparkSession) and call its sql(String sqlText) method. I don't know the SQL limitations of either - can someone explain?
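For what it's worth, a minimal sketch of the SparkSession route, assuming the spark-cassandra-connector is on the classpath; the host, keyspace, table, and query below are placeholders, not taken from the question:

    import org.apache.spark.sql.SparkSession

    // Build one SparkSession and point it at the cluster
    // (spark.cassandra.connection.host is the connector's setting; 127.0.0.1 is a placeholder).
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("cassandra-sql-sketch")
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .getOrCreate()

    // Expose a Cassandra table to Spark SQL as a temporary view, then query it.
    spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
      .load()
      .createOrReplaceTempView("my_table")

    val result = spark.sql("SELECT count(*) FROM my_table")
    result.show()

The practical difference is that sparkSession.sql runs Spark SQL over data pulled into Spark (joins, aggregations, UDFs all work), while the driver's Session.execute sends CQL straight to Cassandra, which is efficient for key lookups but limited to what CQL allows (no joins, restricted WHERE clauses).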

Why does SparkSQL require two literal escape backslashes in the SQL query?

旧巷老猫 submitted on 2021-02-04 07:13:39
Question: When I run the Scala code below from the Spark 2.0 REPL (spark-shell), it runs as I intended, splitting the string with a simple regular expression.

    import org.apache.spark.sql.SparkSession

    // Create session
    val sparkSession = SparkSession.builder.master("local").getOrCreate()

    // Use SparkSQL to split a string
    val query = "SELECT split('What is this? A string I think', '\\\\?') AS result"
    println("The query is: " + query)
    val dataframe = sparkSession.sql(query)

    // Show the result
    dataframe.show()
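A short sketch of where each backslash is consumed, assuming Spark 2.0's default handling of SQL string literals; the DataFrame variant at the end is included only for contrast:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{lit, split}

    val spark = SparkSession.builder.master("local").getOrCreate()

    // Level 1: the Scala compiler turns "\\\\?" into the three characters \\? ,
    // so the SQL text handed to Spark contains '\\?'.
    val sqlText = "SELECT split('What is this? A string I think', '\\\\?') AS result"
    println(sqlText)  // SELECT split('What is this? A string I think', '\\?') AS result

    // Level 2: Spark SQL's string-literal parser unescapes '\\?' to \? ,
    // which the regex engine reads as "a literal question mark".
    spark.sql(sqlText).show(false)

    // In the DataFrame API there is no SQL literal in between, so one escaped
    // backslash in the Scala source is enough.
    spark.range(1)
      .select(split(lit("What is this? A string I think"), "\\?").as("result"))
      .show(false)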

Task is running on only one executor in Spark [duplicate]

烈酒焚心 submitted on 2020-12-30 03:04:46
Question: This question already has answers here:
Partitioning in spark while reading from RDBMS via JDBC (1 answer)
What is the meaning of partitionColumn, lowerBound, upperBound, numPartitions parameters? (4 answers)
Spark 2.1 Hangs while reading a huge datasets (1 answer)
Closed 2 years ago.

I am running the code below in Spark using Java.

Code: Test.java

    package com.sample;

    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkContext;
    import org.apache.spark.sql.Dataset;
    import org.apache
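Since the linked duplicates all point at JDBC partitioning, here is a minimal sketch of a partitioned JDBC read (written in Scala to match the rest of this page); the URL, credentials, table, and column names are placeholders, and partitionColumn has to be a numeric column whose bounds are known:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder
      .master("local[*]")
      .appName("jdbc-partition-sketch")
      .getOrCreate()

    // Without partitionColumn/lowerBound/upperBound/numPartitions the JDBC source
    // reads the whole table as a single partition, so one task on one executor does all the work.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/mydb")   // placeholder URL; JDBC driver jar must be on the classpath
      .option("dbtable", "my_table")                     // placeholder table
      .option("user", "user")
      .option("password", "password")
      .option("partitionColumn", "id")                   // placeholder numeric column
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "8")
      .load()

    println(df.rdd.getNumPartitions)  // should print 8, with the read spread across executors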

Getting null pointer exception when running saveAsNewAPIHadoopDataset in Scala Spark2 to HBase

两盒软妹~` submitted on 2020-07-30 08:01:26
Question: I am saving an RDD of Puts to HBase using saveAsNewAPIHadoopDataset. Below is my job creation and submission.

    val outputTableName = "test3"
    val conf2 = HBaseConfiguration.create()
    conf2.set("hbase.zookeeper.quorum", "xx.xx.xx.xx")
    conf2.set("hbase.mapred.outputtable", outputTableName)
    conf2.set("mapreduce.outputformat.class", "org.apache.hadoop.hbase.mapreduce.TableOutputFormat")
    val job = createJob(outputTableName, conf2)
    val outputTable = sc.broadcast(outputTableName)
    val hbasePuts = simpleRdd
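Not a diagnosis of the NPE, but for reference a sketch of how a Put-writing job is commonly wired up for saveAsNewAPIHadoopDataset, with the output format set on a Job object rather than only as a string property. It assumes the HBase client jars are on the classpath; the table name, quorum address, RDD contents, and column family are placeholders:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.hadoop.mapreduce.Job

    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", "xx.xx.xx.xx")        // placeholder quorum
    conf.set(TableOutputFormat.OUTPUT_TABLE, "test3")        // placeholder table name

    val job = Job.getInstance(conf)
    job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])
    job.setOutputKeyClass(classOf[ImmutableBytesWritable])
    job.setOutputValueClass(classOf[Put])

    // sc is the SparkContext from the shell/app; the RDD and column family here are made up.
    val puts = sc.parallelize(Seq(("row1", "value1"))).map { case (rowKey, value) =>
      val put = new Put(Bytes.toBytes(rowKey))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
      (new ImmutableBytesWritable(Bytes.toBytes(rowKey)), put)
    }

    puts.saveAsNewAPIHadoopDataset(job.getConfiguration)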
