I'm looking for a way to get the last character from a string in a dataframe column and place it into another column.
I have a Spark dataframe that looks like this:
animal
======
cat
mouse
snake
I want something like this:
lastchar
========
t
e
e
Right now I can do this with a UDF that looks like:
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def get_last_letter(animal):
    return animal[-1]

get_last_letter_udf = udf(get_last_letter, StringType())
df.select(get_last_letter_udf("animal").alias("lastchar")).show()
I'm mainly curious if there's a better way to do this without a UDF. Thanks!
Just use the substring function:
from pyspark.sql.functions import col, substring

df.withColumn("b", substring(col("columnName"), -1, 1))
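Applied to the data in the question, a minimal sketch would look like this (assuming a SparkSession is already available as spark; the example dataframe is built here just for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, substring

spark = SparkSession.builder.getOrCreate()

# Example dataframe matching the question
df = spark.createDataFrame([("cat",), ("mouse",), ("snake",)], ["animal"])

# A negative start position in substring counts back from the end of the string
df.select(substring(col("animal"), -1, 1).alias("lastchar")).show()
# +--------+
# |lastchar|
# +--------+
# |       t|
# |       e|
# |       e|
# +--------+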
One way is by using the Column substr() function:
df = df.withColumn("lastchar", df.animal.substr(-1, 1))
See documentation: https://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.Column.substr
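As a quick sketch of how this looks with the question's dataframe (assuming df holds the animal column from the question), substr() keeps the original column and adds the new one:

# substr(startPos, length): a negative startPos counts back from the end
df = df.withColumn("lastchar", df.animal.substr(-1, 1))
df.show()
# +------+--------+
# |animal|lastchar|
# +------+--------+
# |   cat|       t|
# | mouse|       e|
# | snake|       e|
# +------+--------+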
Source: https://stackoverflow.com/questions/45512884/spark-dataframe-column-with-last-character-of-other-column