I have a column of type array<string>
in Spark tables. I am using SQL to query these Spark tables. I wanted to convert the array < s
In Spark 2.1+ to do the concatenation of the values in a single Array column you can use the following:
- concat_ws standard function
- map operator

Use concat_ws function.
concat_ws(sep: String, exprs: Column*): Column Concatenates multiple input string columns together into a single string column, using the given separator.
val solution = words.withColumn("codes", concat_ws(" ", $"words"))
scala> solution.show
+--------------+-----------+
| words| codes|
+--------------+-----------+
|[hello, world]|hello world|
+--------------+-----------+
Use map operator to have full control of what and how should be transformed.
map[U](func: (T) ⇒ U): Dataset[U] Returns a new Dataset that contains the result of applying func to each element.
scala> codes.show(false)
+---+---------------------------+
|id |rate_plan_code |
+---+---------------------------+
|0 |[AAA, RACK, SMOBIX, SMOBPX]|
+---+---------------------------+
val codesAsSingleString = codes.as[(Long, Array[String])]
  .map { case (id, codes) => (id, codes.mkString(", ")) }
  .toDF("id", "codes")
scala> codesAsSingleString.show(false)
+---+-------------------------+
|id |codes |
+---+-------------------------+
|0 |AAA, RACK, SMOBIX, SMOBPX|
+---+-------------------------+
scala> codesAsSingleString.printSchema
root
|-- id: long (nullable = false)
|-- codes: string (nullable = true)
In Spark 2.1+, you can use concat_ws directly to convert (concatenate with a separator) a string or array<string> column into a string.
select concat_ws(',',rate_plan_code) as new_rate_plan from
customer_activity_searches group by rate_plan_code
This will give you a response like:
AAA,RACK,SMOBIX,SMOBPX
LPCT,RACK
LFTIN,RACK,SMOBIX,SMOBPX
LTGD,RACK
RACK,LEARLI,NHDP,LADV,LADV2
PS: concat_ws doesn't work with types like array<Long>, for which a UDF or map would be the only option, as Jacek said.
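For the array<Long> case, here is a rough sketch of what such a UDF could look like in PySpark. The names join_longs and join_longs_udf are made up for illustration, and skipping None elements is my assumption about the desired behavior, not something stated above. The joining logic is plain Python, so it can be tried without a cluster; the UDF wrapper is only created if pyspark is installed.

```python
from typing import Optional, Sequence


def join_longs(values: Optional[Sequence[Optional[int]]], sep: str = ",") -> Optional[str]:
    """Join the numbers of one array value into a single string.

    A missing array (SQL NULL) stays None. None elements are skipped,
    which is an assumption made for this sketch.
    """
    if values is None:
        return None
    return sep.join(str(v) for v in values if v is not None)


# Hypothetical wrapper: only defined when pyspark is available.
try:
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    join_longs_udf = udf(join_longs, StringType())
except ImportError:
    join_longs_udf = None
```

With a DataFrame df that has an array<bigint> column (say, nums, also a made-up name), something like df.withColumn("joined", join_longs_udf("nums")) should then give a string column.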
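You can cast the array to a string when you create the DataFrame, rather than at output time:
from pyspark.sql import functions as F

# Collect the values per group, then cast the resulting array to a string
newdf = (df.groupBy('aaa')
    .agg(F.collect_list('bbb').cast("string").alias('ccc')))
outputdf = newdf.select(
    F.concat_ws(', ', newdf.aaa, F.format_string('xxxxx(%s)', newdf.ccc)))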
The way to do what you want in SQL is to use the built-in SQL function string():
select string(rate_plan_code) as new_rate_plan from
customer_activity_searches group by rate_plan_code