Remove the last few characters in a PySpark DataFrame column

一向 2020-12-06 12:03

I have a PySpark DataFrame. How can I remove the last 5 characters from each value in the name column below?

from pyspark.sql.functions import subs         


        
4 Answers
  •  清歌不尽
    2020-12-06 12:30

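    You can use the expr function. Inside a SQL expression, the length argument of substring can be computed from the column itself, so each row keeps everything except its last 5 characters: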

    >>> from pyspark.sql.functions import expr
    >>> df = df.withColumn("flower",expr("substring(name, 1, length(name)-5)"))
    >>> df.show()
    +--------------+----+---------+
    |          name|year|   flower|
    +--------------+----+---------+
    |     rose_2012|2012|     rose|
    |  jasmine_2013|2013|  jasmine|
    |     lily_2014|2014|     lily|
    | daffodil_2017|2017| daffodil|
    |sunflower_2016|2016|sunflower|
    +--------------+----+---------+
    
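To make the arithmetic in substring(name, 1, length(name)-5) concrete, here is a minimal plain-Python model of what that expression computes for one value. The helper drop_last_chars is hypothetical (it is not a PySpark function), and the handling of values shorter than 5 characters, where the computed length is zero or negative, is an assumption about how Spark's substring behaves; check it against your Spark version before relying on it.

```python
def drop_last_chars(value, n=5):
    """Plain-Python model of the SQL expression substring(value, 1, length(value) - n).

    Hypothetical helper for illustration only; it does not call Spark.
    """
    if value is None:
        # Assumption: a NULL input stays NULL, as is usual for SQL string functions.
        return None
    # SQL substring positions are 1-based, so position 1 maps to Python index 0.
    keep = len(value) - n
    # Assumption: a zero or negative length yields an empty string.
    return value[:keep] if keep > 0 else ""


# The sample values from the name column shown in the answer above.
names = ["rose_2012", "jasmine_2013", "lily_2014", "daffodil_2017", "sunflower_2016"]
flowers = [drop_last_chars(name) for name in names]
print(flowers)
```

For the sample rows this reproduces the flower column from the answer. Note that the approach removes a fixed number of characters, so it only works when every value ends in a suffix of exactly that length, such as _2012.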
