How do I replace a string value with a NULL in PySpark?

难免孤独 2020-12-09 04:49

I want to do something like this:

df.replace('empty-value', None, 'NAME')

Basically, I want to replace some value with NULL. but it doe

4 answers
  •  再見小時候
    2020-12-09 05:17

    The best alternative is to use when without an otherwise clause, which returns NULL for every row that does not match the condition. Example:

    from pyspark.sql.functions import when, col
    
    # Keep 'foo' unless it equals 'empty-value'; with no otherwise(), all other rows become NULL
    df = df.withColumn('foo', when(col('foo') != 'empty-value', col('foo')))
    

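    If you want to replace several values with null, you can either use | inside the when condition to match any of them (negating the combined match, since the when above keeps only the rows that do not match) or use the powerful create_map function.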

    Note that the worst way to solve this is with a UDF. UDFs give your code great versatility, but they come with a huge performance penalty.
