Remove blank space from data frame column values in Spark

既然无缘 2021-01-01 04:39

I have a data frame (business_df) of schema:

|-- business_id: string (nullable = true)
|-- categories: array (nullable = true)
|    |-- element         


        
5 Answers
  •  青春惊慌失措
    2021-01-01 05:20

    As shown by @Powers, a package called quinn provides a concise, easy-to-read function for removing whitespace. You can find it here: https://github.com/MrPowers/quinn If you are working in a Databricks workspace, the instructions for installing it are here: https://docs.databricks.com/libraries.html
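
To make concrete what "removing all whitespace" does to each value, here is a minimal sketch in plain Python. It assumes the operation amounts to deleting every run of whitespace matched by the pattern `\s+`; the helper name below is only illustrative and is not quinn's implementation, and Spark's regex engine (Java) may treat some unusual Unicode whitespace differently from Python's `re`.

```python
import re


def remove_all_whitespace(value):
    """Illustrative helper (not quinn's code): delete every whitespace run.

    None is passed through unchanged, mimicking how a SQL NULL stays NULL.
    """
    if value is None:
        return None
    return re.sub(r"\s+", "", value)


# The same sample values as the example data frame in this answer.
rows = [(1, "foo bar"), (2, "foobar "), (3, "   ")]
cleaned = [(k, remove_all_whitespace(v)) for k, v in rows]
print(cleaned)  # [(1, 'foobar'), (2, 'foobar'), (3, '')]
```

If installing an extra package is not an option, the built-in `pyspark.sql.functions.regexp_replace(col("v"), "\\s+", "")` should give a comparable column-level result.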

    Here again an illustration of how it works:

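    # import the library, plus the col helper used in the function call below
    import quinn
    from pyspark.sql.functions import col
    
    # create an example dataframe
    # (assumes an active SparkContext named sc, as provided in a Databricks notebook)
    df = sc.parallelize([
        (1, "foo bar"), (2, "foobar "), (3, "   ")
    ]).toDF(["k", "v"])
    
    # function call to remove whitespace; note that withColumn replaces column v if it already exists
    df = df.withColumn(
        "v",
        quinn.remove_all_whitespace(col("v"))
    )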
    

    The output:
