How does createOrReplaceTempView work in Spark?

轻奢々 2020-12-02 14:16

I am new to Spark and Spark SQL.

How does createOrReplaceTempView work in Spark?

If we register an RDD of objects as a table, will Spark keep all the data in memory?

  •  鱼传尺愫
    2020-12-02 14:50

    createOrReplaceTempView creates (or replaces, if a view with that name already exists) a lazily evaluated "view" that you can then query like a Hive table in Spark SQL. It does not persist anything to memory unless you cache the Dataset that underpins the view.

    scala> val s = Seq(1,2,3).toDF("num")
    s: org.apache.spark.sql.DataFrame = [num: int]
    
    scala> s.createOrReplaceTempView("nums")
    
    scala> spark.table("nums")
    res22: org.apache.spark.sql.DataFrame = [num: int]
    
    scala> spark.table("nums").cache
    res23: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [num: int]
    
    scala> spark.table("nums").count
    res24: Long = 3
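    Since the question asks about registering an RDD of objects, here is a minimal sketch of that path, assuming Spark 2.2 or later on the classpath and a local master. The Person case class, the people view and the TempViewSketch object are hypothetical names for illustration. It shows that registering the view only names a query plan, and contrasts createOrReplaceTempView with createTempView, which refuses to overwrite an existing view.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

// Hypothetical record type, used only for this sketch.
case class Person(name: String, age: Int)

object TempViewSketch {
  // Returns (adults in the first view, rows after the replace, whether createTempView threw).
  def run(spark: SparkSession): (Long, Long, Boolean) = {
    import spark.implicits._

    // An RDD of plain Scala objects; toDF() derives the schema from the case class fields.
    val rdd = spark.sparkContext.parallelize(Seq(Person("Ann", 34), Person("Bob", 19)))
    rdd.toDF().createOrReplaceTempView("people")

    // Registering the view runs no job; this query executes only when count() is called.
    val adults = spark.sql("SELECT name FROM people WHERE age >= 21").count()

    // Registering again under the same name silently replaces the previous definition.
    Seq(Person("Cy", 50)).toDF().createOrReplaceTempView("people")
    val afterReplace = spark.table("people").count()

    // createTempView, by contrast, throws if the view name is already taken.
    val threw =
      try { Seq(Person("Di", 7)).toDF().createTempView("people"); false }
      catch { case _: AnalysisException => true }

    (adults, afterReplace, threw)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("temp-view-sketch").getOrCreate()
    // Prints (1,1,true)
    try { println(run(spark)) } finally { spark.stop() }
  }
}
```

    Nothing in this sketch keeps the RDD's rows in memory; each query over the view recomputes from the underlying plan unless the data is cached.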
    

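    The data is fully cached only after the .count call, because .cache is lazy and .count is the first action that actually computes the view. Here's proof it's been cached: (the screenshot from the original answer is not preserved)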
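    A programmatic way to watch the same behavior is sketched below, assuming Spark 2.2 or later and a local master; CacheSketch is a hypothetical name. It uses the real Catalog methods isCached and uncacheTable. Note that isCached reports whether the view's plan is registered with the cache manager, not whether the blocks have been filled; the Spark UI Storage tab is where the materialized fraction shows up.

```scala
import org.apache.spark.sql.SparkSession

object CacheSketch {
  // Returns (isCached before cache(), isCached after cache(), row count).
  def run(spark: SparkSession): (Boolean, Boolean, Long) = {
    import spark.implicits._
    Seq(1, 2, 3).toDF("num").createOrReplaceTempView("nums")

    // A fresh view is just a named query plan; nothing is stored in memory.
    val before = spark.catalog.isCached("nums")

    // cache() is lazy: it only registers the plan with Spark's cache manager.
    spark.table("nums").cache()
    // isCached reflects that registration, not whether any blocks are materialized yet.
    val after = spark.catalog.isCached("nums")

    // The first action computes every partition and fills the in-memory cache.
    val n = spark.table("nums").count()

    // Release the cached data once it is no longer needed.
    spark.catalog.uncacheTable("nums")
    (before, after, n)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("cache-sketch").getOrCreate()
    // Prints (false,true,3)
    try { println(run(spark)) } finally { spark.stop() }
  }
}
```

    If you want caching to happen immediately instead, the SQL statement CACHE TABLE nums is eager by default, while CACHE LAZY TABLE nums keeps the deferred behavior shown above.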

    Related Stack Overflow question: "spark createOrReplaceTempView vs createGlobalTempView"
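    The difference between the two kinds of view is mainly scope. A minimal sketch, assuming Spark 2.2 or later (for createOrReplaceGlobalTempView) and the default global temp database name global_temp; ViewScopeSketch and the view names are hypothetical. A view from createOrReplaceTempView is tied to the SparkSession that created it, while a global temp view is shared across sessions of the same application.

```scala
import org.apache.spark.sql.SparkSession

object ViewScopeSketch {
  // Returns (temp view visible in its own session, temp view visible in a new session,
  // row count of the global view read from the new session).
  def run(spark: SparkSession): (Boolean, Boolean, Long) = {
    import spark.implicits._
    val df = Seq(1, 2, 3).toDF("num")

    // Session-scoped: resolvable only through the SparkSession that created it.
    df.createOrReplaceTempView("nums")
    // Application-scoped: registered in the system-preserved global_temp database.
    df.createOrReplaceGlobalTempView("nums_global")

    // newSession() shares the SparkContext but gets its own set of temporary views.
    val other = spark.newSession()

    (
      spark.catalog.tableExists("nums"),
      other.catalog.tableExists("nums"),
      // Global temp views must be qualified with the global_temp database name.
      other.table("global_temp.nums_global").count()
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("view-scope-sketch").getOrCreate()
    // Prints (true,false,3)
    try { println(run(spark)) } finally { spark.stop() }
  }
}
```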

    Relevant quote from the Spark SQL programming guide, comparing temporary views to persistent tables: "Unlike the createOrReplaceTempView command, saveAsTable will materialize the contents of the DataFrame and create a pointer to the data in the Hive metastore." Source: https://spark.apache.org/docs/latest/sql-programming-guide.html#saving-to-persistent-tables
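    To see that contrast in the catalog, here is a minimal sketch assuming Spark 2.2 or later, a local master, and the default catalog (without Hive support, saveAsTable writes files under the local spark-warehouse directory). PersistentVsTempSketch, nums_view and nums_table are hypothetical names. The real Catalog.listTables method reports both objects, and its isTemporary flag tells them apart.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object PersistentVsTempSketch {
  // Returns a map from object name to Table.isTemporary for the two objects created here.
  def run(spark: SparkSession): Map[String, Boolean] = {
    import spark.implicits._
    val df = Seq(1, 2, 3).toDF("num")

    // Temporary view: a name plus a query plan in the session catalog; no data is written.
    df.createOrReplaceTempView("nums_view")
    // Persistent (managed) table: rows are written as files under spark.sql.warehouse.dir
    // and the definition is recorded in the catalog (the Hive metastore if Hive support is on).
    df.write.mode(SaveMode.Overwrite).saveAsTable("nums_table")

    val flags = spark.catalog.listTables()
      .collect()
      .filter(t => Set("nums_view", "nums_table").contains(t.name))
      .map(t => t.name -> t.isTemporary)
      .toMap

    // Dropping a managed table also deletes its data files;
    // dropping a temporary view removes only its name.
    spark.sql("DROP TABLE IF EXISTS nums_table")
    spark.catalog.dropTempView("nums_view")
    flags
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("persistent-vs-temp").getOrCreate()
    try { println(run(spark)) } finally { spark.stop() }
  }
}
```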

    Note: createOrReplaceTempView replaces registerTempTable, which was deprecated in Spark 2.0.
