Cross Join for calculation in Spark SQL

Asked 2021-01-27 19:37 by 离开以前

I have a temporary view with only one record/value, and I want to use that value to calculate the ages of the customers present in another big table (with 100

3 Answers
  • 2021-01-27 20:24

    The view holds a constant value, so you can simply inline that same value in the query below and skip the cross join altogether.

    select 
    a.custid, 
    a.birthdt, 
    cast((datediff(to_date('10-05-2020', 'dd-MM-yyyy'), a.birthdt)/365.25) as int) as age
    from cust a;
    
    scala> spark.sql("select * from cust").show(false)
    +------+----------+
    |custid|birthdt   |
    +------+----------+
    |A1234 |1980-03-20|
    |B3456 |1985-05-09|
    |C2356 |1990-12-15|
    +------+----------+
    
    scala> spark.sql("select a.custid, a.birthdt, cast((datediff(to_date('10-05-2020', 'dd-MM-yyyy'), a.birthdt)/365.25) as int) as age from cust a").show(false)
    +------+----------+---+
    |custid|birthdt   |age|
    +------+----------+---+
    |A1234 |1980-03-20|40 |
    |B3456 |1985-05-09|35 |
    |C2356 |1990-12-15|29 |
    +------+----------+---+
    
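    For contrast, the cross-join formulation the question describes would look roughly like this. This is a sketch: the one-row view name `params` and its column `asofdt` are assumptions, not taken from the question.

    ```scala
    // hypothetical one-row view holding the as-of date
    spark.sql("select to_date('10-05-2020', 'dd-MM-yyyy') as asofdt")
      .createOrReplaceTempView("params")

    // every row of cust is paired with the single params row
    spark.sql("""
      select a.custid, a.birthdt,
             cast(datediff(p.asofdt, a.birthdt) / 365.25 as int) as age
      from cust a
      cross join params p
    """).show(false)
    ```

    Because `params` has exactly one row, the cross join does not blow up the row count; it just attaches the constant date to every customer row, which is why inlining the literal gives the same result.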
  • 2021-01-27 20:37

    Hard to work out exactly what you mean, but if you cannot use Scala or PySpark with DataFrames and .cache etc., then instead of using a temporary view, just create a single-row table. My impression is that you are using Spark %sql in a notebook, for example on Databricks.

    That is my suspicion, at any rate.

    That said, a broadcast join hint may well mean the optimizer only ships a single row to the executors. See https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-hint-framework.html#specifying-query-hints
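    As a sketch, the hint can be embedded directly in the SQL. The one-row view name `params` and its column `asofdt` are assumptions for illustration:

    ```scala
    // BROADCAST hint tells the optimizer to replicate the tiny params
    // view to all executors instead of shuffling the big cust table
    spark.sql("""
      select /*+ BROADCAST(p) */
             a.custid, a.birthdt,
             cast(datediff(p.asofdt, a.birthdt) / 365.25 as int) as age
      from cust a
      cross join params p
    """).show(false)
    ```

    With a one-row broadcast side, the cross join degenerates into a cheap broadcast nested loop join rather than a full shuffle.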

  • 2021-01-27 20:43

    Simply use withColumn!

    df.withColumn("new_col", to_date(lit("10-05-2020"), "dd-MM-yyyy")) // cast("date") expects yyyy-MM-dd, so parse with an explicit pattern
    
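    Continuing that line, the age the question asks for can then be derived from the new column. A sketch only; `df` and the column name `birthdt` are assumed from the first answer's example:

    ```scala
    import org.apache.spark.sql.functions._

    // note: cast("date") on a raw string expects yyyy-MM-dd,
    // so use to_date with an explicit dd-MM-yyyy pattern instead
    val withAge = df
      .withColumn("new_col", to_date(lit("10-05-2020"), "dd-MM-yyyy"))
      .withColumn("age", (datediff(col("new_col"), col("birthdt")) / 365.25).cast("int"))
    ```

    This is the DataFrame-API equivalent of inlining the constant in SQL: no join is needed at all.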