Remove the last few characters from a PySpark DataFrame column

一向 2020-12-06 12:03

I have a PySpark DataFrame. How can I remove the last 5 characters from each value in the "name" column shown below?

from pyspark.sql.functions import subs         


        
4 Answers
  •  小蘑菇 (OP)
    2020-12-06 12:49

    You can use the split function to break each value at the underscore. This code does what you want:

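    import pyspark.sql.functions as f
    
    # df is the existing DataFrame with a string column "name", e.g. "rose_2012".
    # Split on "_" (a plain '_' is enough; '\_' is an invalid Python escape)
    parts = f.split(df['name'], '_')
    newDF = (df.withColumn("year", parts[1])       # part after the "_"
               .withColumn("flower", parts[0]))    # part before the "_"
    
    newDF.show()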
    
    +--------------+----+---------+
    |          name|year|   flower|
    +--------------+----+---------+
    |     rose_2012|2012|     rose|
    |  jasmine_2013|2013|  jasmine|
    |     lily_2014|2014|     lily|
    | daffodil_2017|2017| daffodil|
    |sunflower_2016|2016|sunflower|
    +--------------+----+---------+
    
