spark.sql vs SqlContext

Posted by 笑着哭i on 2019-12-02 02:14:08

Question


I have used SQL in Spark, in this example:

results = spark.sql("select * from ventas")

where ventas is a DataFrame, previously registered as a temporary view:

df.createOrReplaceTempView('ventas')

but I have seen other ways of working with SQL in Spark, using the SQLContext class:

df = sqlContext.sql("SELECT * FROM table")

What is the difference between the two?

Thanks in advance


Answer 1:


SparkSession is now the preferred entry point for working with Spark. The functionality of both HiveContext and SQLContext is available through this single SparkSession object.

You are already using the current syntax by creating a view with df.createOrReplaceTempView('ventas').




Answer 2:


From a user's perspective (not a contributor), I can only rehash what the developers provided in the upgrade notes:

Upgrading From Spark SQL 1.6 to 2.0

  • SparkSession is now the new entry point of Spark that replaces the old SQLContext and HiveContext. Note that the old SQLContext and HiveContext are kept for backward compatibility. A new catalog interface is accessible from SparkSession - existing API on databases and tables access such as listTables, createExternalTable, dropTempView, cacheTable are moved here.

Before 2.0, a SQLContext had to be constructed separately on top of the SparkContext that backs it. With SparkSession, they made things a lot more convenient.

If you take a look at the source code, you'll notice that the SQLContext class is mostly marked @deprecated. Closer inspection shows that its most commonly used methods simply delegate to the underlying sparkSession.

For more info, take a look at the developer notes, Jira issues, conference talks on spark 2.0, and Databricks blog.




Answer 3:


  • First, create df1 as a Java object, using SQLContext:

    df1 = sqlcontext.sql("select col1, col2, col3 from table")

  • Next, create df2 as a DataFrame, using SparkSession:

    df2 = spark.sql("select col1, col2, col3 from table")
    

Check the difference using type(df1) and type(df2).




Answer 4:


Before Spark 2.x, SQLContext was built with the help of SparkContext, but in Spark 2.x SparkSession was introduced, which has the functionality of both HiveContext and SQLContext. So there is no need to create SQLContext separately.

   # before Spark 2.x
   sCont = SparkContext()
   sqlCont = SQLContext(sCont)

   # after Spark 2.x
   spark = SparkSession.builder.getOrCreate()



Source: https://stackoverflow.com/questions/51813274/spark-sql-vs-sqlcontext
