Why does column 1st_from_end
contain null:
from pyspark.sql.functions import split
df = sqlContext.createDataFrame([(\'a b c d\',)], [\'s\',])
d
For Spark 2.4+, use pyspark.sql.functions.element_at, see below from the documentation:
element_at(array, index) - Returns element of array at given (1-based) index. If index < 0, accesses elements from the last to the first. Returns NULL if the index exceeds the length of the array.
from pyspark.sql.functions import element_at, split, col
df = spark.createDataFrame([('a b c d',)], ['s',])
df.withColumn('arr', split(df.s, ' ')) \
.select( col('arr')[0].alias('0th')
, col('arr')[3].alias('3rd')
, element_at(col('arr'), -1).alias('1st_from_end')
).show()
+---+---+------------+
|0th|3rd|1st_from_end|
+---+---+------------+
| a| d| d|
+---+---+------------+