Remove blank space from data frame column values in Spark

既然无缘 2021-01-01 04:39

I have a data frame (business_df) of schema:

|-- business_id: string (nullable = true)
|-- categories: array (nullable = true)
|    |-- element         


        
5 Answers
  •  醉酒成梦
    2021-01-01 05:39

    As @zero323 said, you have probably overwritten (shadowed) the replace function somewhere in your session. I tested your code and it works as expected:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # Create the contexts explicitly (the pyspark shell predefines sc and sqlContext)
    sc = SparkContext.getOrCreate()
    sqlContext = SQLContext(sc)

    df = sqlContext.createDataFrame([("aaa 111",), ("bbb 222",), ("ccc 333",)], ["names"])
    # UDF that removes every space character from a string value
    spaceDeleteUDF = udf(lambda s: s.replace(" ", ""), StringType())
    df.withColumn("names", spaceDeleteUDF("names")).show()
    
    #+------+
    #| names|
    #+------+
    #|aaa111|
    #|bbb222|
    #|ccc333|
    #+------+
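
One caveat: the lambda in the test above raises an error on null values, and the categories column in the question is an array rather than a plain string. Below is a minimal sketch of a null-safe variant that handles both cases. It assumes the array elements are strings (the element type is cut off in the schema above). The names strip_spaces and strip_spaces_in_column are hypothetical helpers made up for this illustration, not part of PySpark.

```python
def strip_spaces(value):
    """Remove spaces from a string, or from every string in a list.

    None is passed through unchanged so that null rows do not raise.
    """
    if value is None:
        return None
    if isinstance(value, (list, tuple)):
        # Array columns arrive in a Python UDF as lists; clean each element.
        return [strip_spaces(v) for v in value]
    return value.replace(" ", "")


def strip_spaces_in_column(df, column, is_array=False):
    """Hypothetical helper: return df with spaces removed from `column`.

    Assumption: the column holds strings, or arrays of strings when
    is_array=True. pyspark is imported here so that strip_spaces can be
    checked without a Spark installation.
    """
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, StringType

    return_type = ArrayType(StringType()) if is_array else StringType()
    cleaner = udf(strip_spaces, return_type)
    return df.withColumn(column, cleaner(column))
```

With the data frame from the question, this would be called as strip_spaces_in_column(business_df, "categories", is_array=True). For a plain string column, the built-in regexp_replace from pyspark.sql.functions does the same job without the overhead of a Python UDF.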
    
