Selecting a range of elements in an array in Spark SQL

闹比i 2020-12-14 23:16

I used spark-shell to do the operations below.

I recently loaded a table with an array column in spark-sql.

Here is the DDL for the table:
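
(The DDL itself is missing from this post. Below is a minimal sketch, inferred from the schema printed in the answer; the table name raw_data is borrowed from the temp view used there:)

    create table raw_data (
        dept_id     bigint,
        dept_nm     string,
        emp_details array<string>
    );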

8 Answers
  •  情话喂你
    2020-12-15 00:11

    Use nested split. concat_ws joins the array into one comma-separated string; splitting that string on ',' followed by the value of emp_details[3] drops everything from the fourth element onward, and the final split turns the remainder back into an array:

    split(split(concat_ws(',',emp_details),concat(',',emp_details[3]))[0],',')
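
    Taking the second row (emp_details = [Ned, is, no, more]) as a worked example:

    concat_ws(',', emp_details)           --> 'Ned,is,no,more'
    concat(',', emp_details[3])           --> ',more'
    split('Ned,is,no,more', ',more')[0]   --> 'Ned,is,no'
    split('Ned,is,no', ',')               --> [Ned, is, no]

    Since the second argument of split is a regex, this trick can misbehave if the boundary element contains regex metacharacters or if its text also appears earlier in the joined string.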

    scala> import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.SparkSession
    
    scala> val spark=SparkSession.builder().getOrCreate()
    spark: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@1d637673
    
    scala> val df = spark.read.json("file:///Users/gengmei/Desktop/test/test.json")
    18/12/11 10:09:32 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
    df: org.apache.spark.sql.DataFrame = [dept_id: bigint, dept_nm: string ... 1 more field]
    
    scala> df.createOrReplaceTempView("raw_data")
    
    scala> df.show()
    +-------+-------+--------------------+
    |dept_id|dept_nm|         emp_details|
    +-------+-------+--------------------+
    |     10|Finance|[Jon, Snow, Castl...|
    |     20|     IT| [Ned, is, no, more]|
    +-------+-------+--------------------+
    
    
    scala> val df2 = spark.sql(
         | """
         | |select dept_id,dept_nm,split(split(concat_ws(',',emp_details),concat(',',emp_details[3]))[0],',') as emp_details from raw_data
         | """.stripMargin)
    df2: org.apache.spark.sql.DataFrame = [dept_id: bigint, dept_nm: string ... 1 more field]
    
    scala> df2.show()
    +-------+-------+-------------------+
    |dept_id|dept_nm|        emp_details|
    +-------+-------+-------------------+
    |     10|Finance|[Jon, Snow, Castle]|
    |     20|     IT|      [Ned, is, no]|
    +-------+-------+-------------------+
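
    For Spark 2.4+, the built-in slice function expresses the same range selection directly (the start index is 1-based). A sketch against the same temp view:

    scala> spark.sql("select dept_id, dept_nm, slice(emp_details, 1, 3) as emp_details from raw_data").show()

    This returns the same result as the nested-split query above.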
    
