I have a data frame (business_df) of schema:
|-- business_id: string (nullable = true)
|-- categories: array (nullable = true)
| |-- element
As shown by @Powers, a package called quinn provides a simple, easy-to-read function for removing whitespace. You can find it here: https://github.com/MrPowers/quinn
If you are working in a Databricks workspace, the instructions for installing a library are here: https://docs.databricks.com/libraries.html
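To make the expected effect concrete before involving Spark, here is a minimal plain-Python sketch of what "remove all whitespace" does to the sample values used in this answer. It assumes the quinn function behaves like a regex replacement of every whitespace run with an empty string; I have not checked that against the quinn source. The helper name `strip_all_whitespace` is made up for this illustration and is not part of quinn.

```python
import re

def strip_all_whitespace(value):
    """Remove every whitespace character (spaces, tabs, newlines) from a string.

    Hypothetical helper for illustration only; it is not quinn's API.
    """
    return re.sub(r"\s+", "", value)

# Same sample values as the DataFrame rows in this answer
samples = ["foo bar", "foobar ", " "]
cleaned = [strip_all_whitespace(s) for s in samples]
print(cleaned)
```

Unlike a trim, this also removes whitespace inside the string, so a value that is only whitespace becomes an empty string rather than null.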
Here again is an illustration of how it works:
# import the library, plus col, which is used below
import quinn
from pyspark.sql.functions import col

# create an example DataFrame (sc is the SparkContext, already defined in a Databricks notebook)
df = sc.parallelize([
    (1, "foo bar"), (2, "foobar "), (3, " ")
]).toDF(["k", "v"])

# remove all whitespace; note that withColumn replaces column v if it already exists
df = df.withColumn(
    "v",
    quinn.remove_all_whitespace(col("v"))
)
The output: