Querying on multiple Hive stores using Apache Spark

前端 未结 2 1560
闹比i
闹比i 2020-12-24 11:46

I have a spark application which will successfully connect to hive and query on hive tables using spark engine.

To build this, I just added hive-site.xml

2条回答
  •  失恋的感觉
    2020-12-24 12:33

    I think this is possible by making use of Spark SQL capability of connecting and reading data from remote databases using JDBC.

    After an exhaustive R & D, I was successfully able to connect to two different hive environments using JDBC and load the hive tables as DataFrames into Spark for further processing.

    Environment details

    hadoop-2.6.0

    apache-hive-2.0.0-bin

    spark-1.3.1-bin-hadoop2.6

    Code Sample HiveMultiEnvironment.scala

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.SparkContext
    object HiveMultiEnvironment {
      def main(args: Array[String]) {
        var conf = new SparkConf().setAppName("JDBC").setMaster("local")
        var sc = new SparkContext(conf)
        var sqlContext = new SQLContext(sc)
    
        // load hive table (or) sub-query from Environment 1
    
        val jdbcDF1 = sqlContext.load("jdbc", Map(
          "url" -> "jdbc:hive2://:10000/",
          "dbtable" -> "",
          "driver" -> "org.apache.hive.jdbc.HiveDriver",
          "user" -> "",
          "password" -> ""))
        jdbcDF1.foreach { println }
          
        // load hive table (or) sub-query from Environment 2
    
        val jdbcDF2 = sqlContext.load("jdbc", Map(
          "url" -> "jdbc:hive2://:10000/",
          "dbtable" -> " or ",
          "driver" -> "org.apache.hive.jdbc.HiveDriver",
          "user" -> "",
          "password" -> ""))
        jdbcDF2.foreach { println }
      }
      // todo: business logic
    }
    

    Other parameters can also be set during load using SqlContext such as setting partitionColumn. Details found under 'JDBC To Other Databases' section in Spark reference doc: https://spark.apache.org/docs/1.3.0/sql-programming-guide.html

    Build path from Eclipse:

    What I Haven't Tried

    Use of HiveContext for Environment 1 and SqlContext for environment 2

    Hope this will be useful.

提交回复
热议问题