Remove blank space from data frame column values in Spark

既然无缘 2021-01-01 04:39

I have a data frame (business_df) with the following schema:

|-- business_id: string (nullable = true)
|-- categories: array (nullable = true)
|    |-- element         


        
5 answers
  •  清酒与你
    2021-01-01 05:40

    Here's a function that removes all whitespace from a string column:

    import pyspark.sql.functions as F
    
    def remove_all_whitespace(col):
        # Replace every run of whitespace characters (regex \s+) with an empty string.
        return F.regexp_replace(col, "\\s+", "")
    

    You can use the function like this:

    actual_df = source_df.withColumn(
        "words_without_whitespace",
        remove_all_whitespace(F.col("words"))
    )
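Spark's regexp_replace uses Java regular expressions, so it can be handy to preview a pattern on a few strings before running a job. The sketch below is a plain-Python stand-in, not Spark code: remove_all_whitespace_preview and the sample strings are hypothetical. Python's `\s` and Java's `\s` agree on the usual whitespace characters (space, tab, newline, carriage return, form feed, vertical tab) but can differ on less common and non-ASCII characters, so treat the preview as an approximation.

```python
import re

# Same regex the answer passes to F.regexp_replace ("\\s+" in Python source is \s+).
ALL_WHITESPACE = re.compile(r"\s+")


def remove_all_whitespace_preview(value: str) -> str:
    """Hypothetical plain-Python approximation of F.regexp_replace(col, "\\s+", "")."""
    return ALL_WHITESPACE.sub("", value)


# Hypothetical sample values covering spaces, a tab, and a newline.
samples = ["  Fast Food ", "Bars\tand\nPubs", "NoSpaces"]
for s in samples:
    print(repr(s), "->", repr(remove_all_whitespace_preview(s)))
```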
    

    A ready-made remove_all_whitespace function is also available in the quinn library, which additionally provides single_space and anti_trim for managing whitespace. PySpark itself provides ltrim, rtrim, and trim in pyspark.sql.functions, which remove leading, trailing, or both leading and trailing spaces.
