I have a PySpark DataFrame. How can I remove the last 5 characters from the values in the name column
below?
from pyspark.sql.functions import subs
You can use the split function. This code does what you want:
import pyspark.sql.functions as f
# Split "name" on the underscore: part 0 is the flower, part 1 is the year.
parts = f.split(df['name'], '_')
newDF = (df.withColumn("year", parts[1])
           .withColumn("flower", parts[0]))
newDF.show()
+--------------+----+---------+
|          name|year|   flower|
+--------------+----+---------+
|     rose_2012|2012|     rose|
|  jasmine_2013|2013|  jasmine|
|     lily_2014|2014|     lily|
| daffodil_2017|2017| daffodil|
|sunflower_2016|2016|sunflower|
+--------------+----+---------+