Pyspark n00b... How do I replace a column with a substring of itself? I\'m trying to remove a select number of characters from the start and end of string.
f
pyspark.sql.functions.substring(str, pos, len)
Substring starts at pos and is of length len when str is String type or returns the slice of byte array that starts at pos in byte and is of length len when str is Binary type
In your code,
df.withColumn('COLUMN_NAME_fix', substring('COLUMN_NAME', 1, -1))
1 is pos and -1 becomes len, length can't be -1 and so it returns null
Try this, (with fixed syntax)
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
udf1 = udf(lambda x:x[1:-1],StringType())
df.withColumn('COLUMN_NAME_fix',udf1('COLUMN_NAME')).show()