selecting a range of elements in an array spark sql

Asked by 闹比i · 2020-12-14 23:16

I use spark-shell to do the below operations.

I recently loaded a table with an array column in spark-sql.

Here is the DDL for the same:

8 Answers
  •  感情败类
    2020-12-15 00:13

    Here is a solution using a user-defined function (UDF), which has the advantage of working for any slice size you want. It simply builds a UDF around Scala's built-in slice method:

    import sqlContext.implicits._
    import org.apache.spark.sql.functions._
    
    val slice = udf((array: Seq[String], from: Int, to: Int) => array.slice(from, to))
    

    Example with a sample of your data:

    val df = sqlContext.sql("select array('Jon', 'Snow', 'Castle', 'Black', 'Ned') as emp_details")
    df.withColumn("slice", slice($"emp_details", lit(0), lit(3))).show
    

    This produces the expected output:

    +--------------------+-------------------+
    |         emp_details|              slice|
    +--------------------+-------------------+
    |[Jon, Snow, Castl...|[Jon, Snow, Castle]|
    +--------------------+-------------------+
    

    You can also register the UDF in your sqlContext and use it like this:

    sqlContext.udf.register("slice", (array: Seq[String], from: Int, to: Int) => array.slice(from, to))
    sqlContext.sql("select array('Jon','Snow','Castle','Black','Ned'), slice(array('Jon','Snow','Castle','Black','Ned'), 0, 3)")
    

    You won't need lit anymore with this solution.
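
    The UDF above delegates to Scala's built-in Seq.slice, so its semantics follow that method: from is inclusive, to is exclusive, and out-of-range indices are clamped rather than throwing. A minimal Spark-free sketch checking this (object and value names are illustrative):

    ```scala
    // Standalone check of Seq.slice, the method the UDF wraps.
    object SliceDemo {
      def main(args: Array[String]): Unit = {
        val emps = Seq("Jon", "Snow", "Castle", "Black", "Ned")
        // `from` inclusive, `to` exclusive:
        println(emps.slice(0, 3))  // List(Jon, Snow, Castle)
        // Indices past the end are clamped, not an error:
        println(emps.slice(3, 10)) // List(Black, Ned)
      }
    }
    ```

    This clamping behavior is why the UDF never fails on rows whose arrays are shorter than the requested slice; it just returns the elements that exist.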
