How to convert column of arrays of strings to strings?

死守一世寂寞 2020-12-13 16:23

I have a column of type array<string> in Spark tables, and I query these tables with SQL. I wanted to convert the array < s

4 Answers
  • 2020-12-13 17:06

    In Spark 2.1+, you can concatenate the values of a single array column in any of the following ways:

    1. concat_ws standard function
    2. map operator
    3. a user-defined function (UDF)

    concat_ws Standard Function

    Use the concat_ws function:

    concat_ws(sep: String, exprs: Column*): Column

    Concatenates multiple input string columns together into a single string column, using the given separator. It also accepts array<string> columns, which is why it works here.

    // the input DataFrame `words` has a single array<string> column named "words"
    val solution = words.withColumn("codes", concat_ws(" ", $"words"))
    scala> solution.show
    +--------------+-----------+
    |         words|      codes|
    +--------------+-----------+
    |[hello, world]|hello world|
    +--------------+-----------+
    

    map Operator

    Use the map operator when you want full control over what is transformed and how.

    map[U](func: (T) ⇒ U): Dataset[U]

    Returns a new Dataset that contains the result of applying func to each element.

    scala> codes.show(false)
    +---+---------------------------+
    |id |rate_plan_code             |
    +---+---------------------------+
    |0  |[AAA, RACK, SMOBIX, SMOBPX]|
    +---+---------------------------+
    
    val codesAsSingleString = codes.as[(Long, Array[String])]
      .map { case (id, codes) => (id, codes.mkString(", ")) }
      .toDF("id", "codes")
    
    scala> codesAsSingleString.show(false)
    +---+-------------------------+
    |id |codes                    |
    +---+-------------------------+
    |0  |AAA, RACK, SMOBIX, SMOBPX|
    +---+-------------------------+
    
    scala> codesAsSingleString.printSchema
    root
     |-- id: long (nullable = false)
     |-- codes: string (nullable = true)
    
  • 2020-12-13 17:06

    In Spark 2.1+, you can use concat_ws directly to convert a string or array<string> column into a single string, joined with a separator.

    select concat_ws(',',rate_plan_code) as new_rate_plan  from
    customer_activity_searches group by rate_plan_code
    

    This gives output like:

    AAA,RACK,SMOBIX,SMOBPX 
    LPCT,RACK
    LFTIN,RACK,SMOBIX,SMOBPX
    LTGD,RACK 
    RACK,LEARLI,NHDP,LADV,LADV2
    

    PS: concat_ws doesn't work with types like array<Long>; for those, a UDF or map would be the only option, as Jacek said.
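
    For arrays whose elements are not strings, a UDF along the following lines could do the join. This is only a minimal PySpark sketch, not something taken from the answers here: the names join_values and register_join_values are hypothetical, and it assumes an existing SparkSession is passed in as `spark`.

```python
from typing import Optional, Sequence


def join_values(values: Optional[Sequence[object]], sep: str = ",") -> Optional[str]:
    """Join the elements of one array cell into a single string.

    Elements are converted with str(), so numeric arrays work too.
    A None cell stays None, and None elements are skipped
    (a choice made for this sketch).
    """
    if values is None:
        return None
    return sep.join(str(v) for v in values if v is not None)


def register_join_values(spark, name: str = "join_values") -> None:
    """Register join_values as a SQL function (hypothetical helper).

    `spark` is assumed to be an existing SparkSession.
    """
    # Imported lazily so the plain-Python helper above can be tried without Spark installed.
    from pyspark.sql.types import StringType

    spark.udf.register(name, join_values, StringType())


# Plain-Python check of the joining logic, no Spark needed:
print(join_values([101, 202, 303]))  # prints 101,202,303
```

    After calling register_join_values(spark), the function should be usable from SQL in the same way as concat_ws above, for example select join_values(rate_plan_code) from customer_activity_searches. Python UDFs are generally slower than built-in functions, so concat_ws remains the better choice when the column is already array<string>.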

  • 2020-12-13 17:10

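    You can cast the array to a string when you create the DataFrame, rather than at output time:

    from pyspark.sql import functions as F

    # collect 'bbb' per group and cast the resulting array to a string right away
    newdf = (df.groupBy('aaa')
      .agg(F.collect_list('bbb').cast("string").alias('ccc')))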
    
    outputdf = newdf.select(
      F.concat_ws(', ' , newdf.aaa, F.format_string('xxxxx(%s)', newdf.ccc)))
    
  • 2020-12-13 17:14

    In SQL, you can do this with the built-in function string(), which casts its argument to a string:

    select string(rate_plan_code) as new_rate_plan  from
    customer_activity_searches group by rate_plan_code
    