Pyspark: Convert column to lowercase

Frontend · open · 2 answers · 1621 views

抹茶落季 · asked 2020-12-16 10:45

I want to convert the values inside a column to lowercase. Currently, when I use the lower() method, it complains that Column objects are not callable.

2 Answers
  •  猫巷女王i
    answered 2020-12-16 11:20

    Import lower alongside col:

    from pyspark.sql.functions import lower, col
    

    Combine them as lower(col("bla")). In a complete query:

    spark.table('bla').select(lower(col('bla')).alias('bla'))
    

    which is equivalent to the SQL query

    SELECT lower(bla) AS bla FROM bla
    

    To lowercase one column while keeping all the others, use withColumn:

    spark.table('foo').withColumn('bar', lower(col('bar')))
    

    This approach is faster than a UDF: a Python UDF forces Spark to serialize each row and ship it to a Python worker process, whereas lower() runs as a built-in JVM expression. It is also more composable than embedding the logic in a SQL string.
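    To see it end to end, here is a minimal runnable sketch; the DataFrame contents, the column name, and the app name are invented for illustration, and it assumes a local PySpark installation:

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import lower, col

    # Hypothetical local session and toy data, just to demonstrate lower(col(...)).
    spark = (SparkSession.builder
             .master("local[1]")
             .appName("lowercase-demo")
             .getOrCreate())

    df = spark.createDataFrame([("Alice",), ("BOB",)], ["name"])

    # Replace the column in place with its lowercased value.
    result = df.withColumn("name", lower(col("name")))
    rows = [r["name"] for r in result.collect()]
    print(rows)

    spark.stop()
    ```

    Calling df.name.lower() instead would raise the "Column is not callable" error from the question, because lowercasing is a function applied to a Column, not a method on it.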
