PySpark alter column with substring

耶瑟儿~ 2021-01-04 08:20

PySpark n00b... How do I replace a column with a substring of itself? I'm trying to remove a set number of characters from the start and end of the string.

4 answers
  •  庸人自扰
    2021-01-04 09:07

    pyspark.sql.functions.substring(str, pos, len)

    The substring starts at pos (1-based) and is of length len when str is String type. When str is Binary type, it returns the slice of the byte array that starts at pos (in bytes) and is of length len.

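    In your code,

    df.withColumn('COLUMN_NAME_fix', substring('COLUMN_NAME', 1, -1))

    1 is pos and -1 becomes len. A length can't be negative, so you get a null or empty result rather than the trimmed string. Also, pos is 1-based, so starting at 1 would keep the first character.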
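
    The index arithmetic can be sketched in plain Python, without Spark. Because substring takes a 1-based start and a length (not an end index), dropping characters from both ends means deriving pos and len from the string's length. The helper name substring_args is hypothetical and used only for illustration; it is not part of PySpark.

```python
def substring_args(total_len, n_start, n_end):
    """Return the 1-based (pos, len) that drops n_start leading and
    n_end trailing characters from a string of total_len characters.

    Hypothetical helper for illustration; not a PySpark function.
    """
    pos = n_start + 1                              # SQL-style positions start at 1
    length = max(total_len - n_start - n_end, 0)   # clamp so it is never negative
    return pos, length


s = "[hello]"
pos, length = substring_args(len(s), 1, 1)
# Convert back to 0-based slicing to check against the Python slice s[1:-1]
trimmed = s[pos - 1 : pos - 1 + length]
```

    For a 7-character value with one character removed from each end, this gives pos 2 and len 5, the same characters that s[1:-1] selects.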
    

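    Try this (with fixed syntax):

    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # Drop the first and last character; pass nulls through unchanged
    udf1 = udf(lambda x: x[1:-1] if x is not None else None, StringType())
    # withColumn returns a new DataFrame; df itself is not modified
    df.withColumn('COLUMN_NAME_fix', udf1('COLUMN_NAME')).show()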
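
    As an alternative to the UDF, the same trim can probably be expressed with built-in column functions, which usually avoids the cost of running Python code per row. The following is a minimal sketch, not the answer author's method: trim_ends and with_trimmed_column are hypothetical names, and the column names are the placeholders used above. It relies on Column.substr and pyspark.sql.functions.length.

```python
def trim_ends(value, n_start=1, n_end=1):
    """Null-safe plain-Python version of x[1:-1] with configurable counts.

    Hypothetical helper; could serve as the body of a UDF.
    """
    if value is None:
        return None
    # Compute the end index explicitly so that n_end=0 keeps the tail
    end = max(len(value) - n_end, n_start)
    return value[n_start:end]


def with_trimmed_column(df, source, target, n_start=1, n_end=1):
    """Add `target` as `source` minus leading/trailing characters,
    using built-in column functions instead of a Python UDF."""
    # Imported here so trim_ends stays usable without a Spark installation
    from pyspark.sql import functions as F

    col = F.col(source)
    # Column.substr is 1-based; both arguments must be of the same kind
    # (two ints or two Columns), hence F.lit for the start position
    return df.withColumn(
        target,
        col.substr(F.lit(n_start + 1), F.length(col) - F.lit(n_start + n_end)),
    )
```

    Usage would look like with_trimmed_column(df, 'COLUMN_NAME', 'COLUMN_NAME_fix').show(). Null values in the source column stay null with the built-in functions, so no explicit null check is needed there.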
    
