Usage of Spark DataFrame "as" method

暖寄归人 2020-12-09 05:37

I am looking at spark.sql.DataFrame documentation.

There is

def as(alias: String): DataFrame
    Returns a new DataFrame with an alias set.
    Since         


        
1 Answer
  • 2020-12-09 05:59

    Spark <= 1.5

    It is more or less equivalent to SQL table aliases:

    SELECT *
    FROM table AS alias;
    

    Example usage adapted from PySpark alias documentation:

    import org.apache.spark.sql.functions.col
    case class Person(name: String, age: Int)
    
    val df = sqlContext.createDataFrame(
        Person("Alice", 2) :: Person("Bob", 5) :: Nil)
    
    val df_as1 = df.as("df1")
    val df_as2 = df.as("df2")
    val joined_df = df_as1.join(
        df_as2, col("df1.name") === col("df2.name"), "inner")
    joined_df.select(
        col("df1.name"), col("df2.name"), col("df2.age")).show
    

    Output:

    +-----+-----+---+
    | name| name|age|
    +-----+-----+---+
    |Alice|Alice|  2|
    |  Bob|  Bob|  5|
    +-----+-----+---+
    

    Same thing using SQL query:

    df.registerTempTable("df")
    sqlContext.sql("""SELECT df1.name, df2.name, df2.age
                      FROM df AS df1 JOIN df AS df2
                      ON df1.name == df2.name""")
    

    What is the purpose of this method?

    Pretty much avoiding ambiguous column references, which otherwise arise when a DataFrame is joined with itself.
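
    To make the failure mode concrete, here is a minimal sketch (using the `df` defined above) of what happens without aliases:

    ```scala
    import org.apache.spark.sql.functions.col

    // Without aliases, both sides of the self-join expose a column
    // named "name", so a bare reference to it cannot be resolved:
    df.join(df).select(col("name"))
    // fails with an AnalysisException along the lines of
    // "Reference 'name' is ambiguous"
    ```

    With the `df1`/`df2` aliases from the example above, `col("df1.name")` and `col("df2.name")` disambiguate the two sides.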

    Spark 1.6+

    There is also a new as[U](implicit arg0: Encoder[U]): Dataset[U] which is used to convert a DataFrame to a DataSet of a given type. For example:

    df.as[Person]
    
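
    A short sketch of how the typed conversion is used, assuming the `Person` case class and `sqlContext` from the example above (the implicit `Encoder` comes from `sqlContext.implicits._`):

    ```scala
    import sqlContext.implicits._  // provides the implicit Encoder[Person]

    val ds = df.as[Person]  // Dataset[Person]

    // Fields are now accessed through the case class, checked at compile time:
    ds.filter(_.age > 3).map(_.name).show()
    ```

    The typed API catches mistakes like a misspelled field name at compile time, whereas an untyped expression such as `col("agee") > 3` would only fail at runtime.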