I want to do something like this:
df.replace(\'empty-value\', None, \'NAME\')
Basically, I want to replace some value with NULL. but it doe
The best alternative is the use of a when combined with a NULL. Example:
from pyspark.sql.functions import when, lit, col
df= df.withColumn('foo', when(col('foo') != 'empty-value',col('foo)))
If you want to replace several values to null you can either use | inside the when condition or the powerfull create_map function.
Important to note is that the worst way to solve it with the use of a UDF. This is so because udfs provide great versatility to your code but come with a huge penalty on performance.