spark createOrReplaceTempView vs createGlobalTempView

日久生厌 2020-12-02 19:04

The Spark 2.0 Dataset API provides two functions, createOrReplaceTempView and createGlobalTempView. I am not able to understand the basic difference between...

3 Answers
  • 2020-12-02 19:38

    The answer to your question comes down to understanding the difference between a Spark application and a Spark session.

    A Spark application can be used for:

    • a single batch job
    • an interactive session with multiple jobs
    • a long-lived server continually satisfying requests

    Note also that:

    • A Spark job can consist of more than just a single map and reduce.
    • A Spark application can consist of more than one session.

    A SparkSession, on the other hand, is associated with a Spark application:

    • Generally, a session is an interaction between two or more entities.
    • SparkSession is available from Spark 2.0 onward.
    • A SparkSession can be created without first creating a SparkConf, SparkContext or SQLContext (they are encapsulated within the SparkSession).
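
    To make the application/session relationship concrete, here is a minimal sketch (not from the original answer) that opens a second session with newSession() and checks that both sessions sit on top of the same SparkContext, that is, the same application. The object name SessionsShareContext, the helper check and the app name are hypothetical; it assumes Spark 2.x or later on the classpath and a local master.

    ```scala
    import org.apache.spark.sql.SparkSession

    // Hypothetical demo object; the name is illustrative only.
    object SessionsShareContext {

      // Returns (sessions are distinct objects, both use the same SparkContext).
      def check(spark: SparkSession): (Boolean, Boolean) = {
        val other = spark.newSession() // second session in the same application
        (other ne spark, other.sparkContext eq spark.sparkContext)
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder
          .appName("SessionsShareContext") // assumed app name
          .master("local[*]")              // assumed local run
          .getOrCreate()
        try println(check(spark))
        finally spark.stop()
      }
    }
    ```

    Both tuple elements being true is what "one application, several sessions" means in practice: the sessions are separate objects, but the underlying context is shared.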

    Global temporary views were introduced in the Spark 2.1.0 release. This feature is useful when you want to share data among different sessions and keep it alive until your application ends. Here is a short sample I wrote to illustrate the use of createTempView and createGlobalTempView:

    import org.apache.spark.sql.SparkSession

    object NewSessionApp {
    
      def main(args: Array[String]): Unit = {
    
        val logFile = "data/README.md" // Should be some file on your system
        val spark = SparkSession.
          builder.
          appName("Simple Application").
          master("local").
          getOrCreate()
    
        val logData = spark.read.textFile(logFile).cache()
        // Global temp view: tied to the application, visible to every session
        logData.createGlobalTempView("logdata")
        // Local temp view: tied to this session only
        spark.range(1).createTempView("foo")
    
        // Within the same session, the foo view exists
        println("""spark.catalog.tableExists("foo") = """ + spark.catalog.tableExists("foo"))
        // spark.catalog.tableExists("foo") = true
    
        // In a new session, the foo view does not exist
        val newSpark = spark.newSession()
        println("""newSpark.catalog.tableExists("foo") = """ + newSpark.catalog.tableExists("foo"))
        // newSpark.catalog.tableExists("foo") = false
    
        // Both sessions can access the logdata view through the global_temp database
        spark.sql("SELECT * FROM global_temp.logdata").show()
        newSpark.sql("SELECT * FROM global_temp.logdata").show()
    
        spark.stop()
      }
    }
    
  • 2020-12-02 19:48
    df.createOrReplaceTempView("tempViewName")
    df.createGlobalTempView("tempViewName")
    

    createOrReplaceTempView() creates or replaces a local temporary view with this DataFrame df. The lifetime of this view is tied to the SparkSession that created it. If you want to drop the view:

    spark.catalog.dropTempView("tempViewName")
    

    or call stop() to shut down the session:

    self.ss = SparkSession(sc)
    ...
    self.ss.stop()
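
    The "create or replace" part can be seen in a small sketch, written in Scala to match the first answer's sample. It is an illustration under assumptions, not this answer's own code: the object name LocalTempViewDemo and the helper run are hypothetical, and spark.catalog.tableExists assumes Spark 2.1 or later.

    ```scala
    import org.apache.spark.sql.SparkSession
    import scala.util.Try

    // Hypothetical demo object; the name is illustrative only.
    object LocalTempViewDemo {

      // Returns (createTempView failed on an existing name,
      //          row count after replacing the view,
      //          view still exists after dropTempView).
      def run(spark: SparkSession): (Boolean, Long, Boolean) = {
        val viewName = "tempViewName"
        spark.range(3).createOrReplaceTempView(viewName)

        // createTempView refuses to overwrite a view that already exists
        val createFailed = Try(spark.range(5).createTempView(viewName)).isFailure

        // createOrReplaceTempView rebinds the same name to the new data
        spark.range(5).createOrReplaceTempView(viewName)
        val rows = spark.table(viewName).count()

        // Explicit drop; otherwise the view lives as long as the session
        spark.catalog.dropTempView(viewName)
        (createFailed, rows, spark.catalog.tableExists(viewName))
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder
          .appName("LocalTempViewDemo") // assumed app name
          .master("local[*]")           // assumed local run
          .getOrCreate()
        try println(run(spark))
        finally spark.stop()
      }
    }
    ```

    The row count comes from the second registration (five rows), which shows that the name was rebound rather than duplicated.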
    

    createGlobalTempView() creates a global temporary view with this DataFrame df. The lifetime of this view is tied to the Spark application itself. If you want to drop it:

    spark.catalog.dropGlobalTempView("tempViewName")
    

    or stop the SparkContext, which shuts down the application:

    ss =  SparkContext(conf=conf, ......)
    ...
    ss.stop()
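
    A detail that is easy to miss is that a global temporary view lives in a system database and has to be referenced by its qualified name. The following Scala sketch (same language as the first answer's sample) illustrates this. It assumes the default database name global_temp, meaning spark.sql.globalTempDatabase has not been changed, and Spark 2.1 or later; the object name GlobalTempViewDemo and the helper run are hypothetical.

    ```scala
    import org.apache.spark.sql.SparkSession
    import scala.util.Try

    // Hypothetical demo object; the name is illustrative only.
    object GlobalTempViewDemo {

      // Returns (rows read from a second session,
      //          unqualified lookup failed,
      //          view gone in the first session after the second one dropped it).
      def run(spark: SparkSession): (Long, Boolean, Boolean) = {
        val viewName = "tempViewName"
        spark.range(4).createGlobalTempView(viewName)

        val other = spark.newSession()

        // The qualified name resolves from a different session
        val rowsFromOther =
          other.sql(s"SELECT * FROM global_temp.$viewName").count()

        // Without the global_temp prefix the view is not found
        val unqualifiedFails =
          Try(other.sql(s"SELECT * FROM $viewName").count()).isFailure

        // Dropping it from one session removes it for the whole application
        other.catalog.dropGlobalTempView(viewName)
        val goneInOriginal =
          Try(spark.sql(s"SELECT * FROM global_temp.$viewName").count()).isFailure

        (rowsFromOther, unqualifiedFails, goneInOriginal)
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder
          .appName("GlobalTempViewDemo") // assumed app name
          .master("local[*]")            // assumed local run
          .getOrCreate()
        try println(run(spark))
        finally spark.stop()
      }
    }
    ```

    Because the view belongs to the application rather than to a session, the drop issued from the second session is also visible in the first.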
    
  • 2020-12-02 19:49

    createOrReplaceTempView was introduced in Spark 2.0 to replace registerTempTable. A temp view created this way is an in-memory reference to the DataFrame in use, and its lifetime depends on the Spark session in which the DataFrame was created. createGlobalTempView, on the other hand, lets you create references that can be used across Spark sessions. So choose between the two methods depending on whether you need to share data across sessions. By default, notebooks attached to the same cluster share the same Spark session, but there is an option to set up clusters where each notebook has its own session. It all boils down to where you create the DataFrame and where you want to access it.
