How do I get the last item from a list using pyspark?

感情败类 2021-01-12 03:04

Why does column 1st_from_end contain null:

from pyspark.sql.functions import split
df = sqlContext.createDataFrame([('a b c d',)], ['s',])
d         


        
4 answers
  •  萌比男神i
    2021-01-12 03:55

    For Spark 2.4+, use pyspark.sql.functions.element_at. From the documentation:

    element_at(array, index) - Returns element of array at given (1-based) index. If index < 0, accesses elements from the last to the first. Returns NULL if the index exceeds the length of the array.

    from pyspark.sql.functions import element_at, split, col
    
    df = spark.createDataFrame([('a b c d',)], ['s',])
    
    # Split the string into an array, then index it from both ends.
    df.withColumn('arr', split(df.s, ' ')).select(
        col('arr')[0].alias('0th'),
        col('arr')[3].alias('3rd'),
        element_at(col('arr'), -1).alias('1st_from_end'),  # negative index counts from the end
    ).show()
    
    +---+---+------------+
    |0th|3rd|1st_from_end|
    +---+---+------------+
    |  a|  d|           d|
    +---+---+------------+
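
The indexing rule quoted from the documentation above (1-based, negative indexes count from the end, NULL when the index exceeds the array length) can be modelled in plain Python. This is only an illustrative sketch, not Spark's implementation: `element_at_model` is a hypothetical helper name, and rejecting index 0 with a `ValueError` is an assumption made here because the rule is stated as 1-based.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def element_at_model(arr: Sequence[T], index: int) -> Optional[T]:
    """Plain-Python model of the documented element_at indexing rule.

    Hypothetical helper for illustration only; it does not call Spark.
    """
    if index == 0:
        # Assumption: index 0 is treated as invalid because indexing is 1-based.
        raise ValueError("index must be non-zero (indexing is 1-based)")
    if abs(index) > len(arr):
        # Index exceeds the array length: model SQL NULL with None.
        return None
    if index > 0:
        # Positive indexes are 1-based, so shift by one for Python.
        return arr[index - 1]
    # Negative indexes count from the end, same as Python's own convention.
    return arr[index]


parts = "a b c d".split(" ")
first = element_at_model(parts, 1)
last = element_at_model(parts, -1)
out_of_range = element_at_model(parts, 5)
print(first, last, out_of_range)  # prints: a d None
```

With the same input string as the answer, index -1 picks `d`, matching the `1st_from_end` column shown in the output table.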
    
