PySpark: replace strings in a Spark DataFrame column

轮回少年 2020-12-02 20:23

I'd like to perform some basic stemming on a Spark DataFrame column by replacing substrings. What's the quickest way to do this?

In my current use case, I have a

2 Answers
  •  半阙折子戏
    2020-12-02 20:49

    For Spark 1.5 or later, you can use regexp_replace from the pyspark.sql.functions module:

    from pyspark.sql.functions import regexp_replace
    # Replace every match of the regex 'lane' in the 'address' column with 'ln'
    newDf = df.withColumn('address', regexp_replace('address', 'lane', 'ln'))
    

    Quick explanation:

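    • withColumn adds a column to the DataFrame, or replaces the existing one if a column with that name is already present.
    • regexp_replace returns a new column in which every substring that matches the regular-expression pattern is replaced. Because the pattern is a regex and not a whole-word match, 'lane' also matches inside longer words.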
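
One thing to watch with substring patterns like 'lane' is that they also hit longer words. The sketch below uses Python's built-in re module as a stand-in for Spark's regex engine (an assumption made so it runs without a cluster), and the sample addresses are made up for illustration. It compares the plain pattern with a word-boundary version and a case-insensitive version.

```python
import re

# Hypothetical sample addresses, not taken from the original question.
addresses = ["12 maple lane", "7 planet road", "3 Lane end"]

# Plain substring pattern: also rewrites the "lane" inside "planet".
naive = [re.sub(r"lane", "ln", a) for a in addresses]

# Word-boundary pattern: only the standalone word "lane" is replaced.
bounded = [re.sub(r"\blane\b", "ln", a) for a in addresses]

# Inline (?i) flag: same as above, but ignoring case.
bounded_ci = [re.sub(r"(?i)\blane\b", "ln", a) for a in addresses]

print(naive)       # ['12 maple ln', '7 plnt road', '3 Lane end']
print(bounded)     # ['12 maple ln', '7 planet road', '3 Lane end']
print(bounded_ci)  # ['12 maple ln', '7 planet road', '3 ln end']
```

Spark's regexp_replace uses Java regular expressions, which also support \b and the inline (?i) flag, so the same pattern strings should carry over, e.g. regexp_replace('address', r'\blane\b', 'ln').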
