Python spark extract characters from dataframe

Question

I have a dataframe in spark, something like this:

ID     | Column
------ | ----
1      | STRINGOFLETTERS
2      | SOMEOTHERCHARACTERS
3      | ANOTHERSTRING
4      | EXAMPLEEXAMPLE

What I would like to do is extract the first 5 characters from the column plus the 8th character and create a new column, something like this:

ID     | New Column
------ | ------
1      | STRIN_F
2      | SOMEO_E
3      | ANOTH_S
4      | EXAMP_E

I can't use the following code, because the values in the column differ and I don't want to split on a specific character, but at the 6th character:

import pyspark
split_col = pyspark.sql.functions.split(DF['column'], ' ')
newDF = DF.withColumn('new_column', split_col.getItem(0))

Thanks all!

Answer 1:

Use something like this:

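from pyspark.sql.functions import concat, lit

# substr(startPos, length) is 1-based: first 5 characters, '_', then the 8th character
new_df = df.withColumn('new_column', concat(df.Column.substr(1, 5),
                                            lit('_'),
                                            df.Column.substr(8, 1)))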

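This uses the Column method substr(startPos, length), which counts positions from 1, together with the functions concat and lit. substr picks out the two pieces, and concat joins them around the literal '_'.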
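
To sanity-check the positions without a Spark session, the same extraction can be mirrored in plain Python. This is only a sketch: extract_chars is a hypothetical helper name, not part of PySpark, and it assumes the values are ordinary strings. It shows how the 1-based substr arguments map onto 0-based slices.

```python
def extract_chars(value):
    """Hypothetical plain-Python mirror of concat(substr(1, 5), lit('_'), substr(8, 1))."""
    # Spark's substr positions start at 1; Python slices start at 0 and exclude the end index.
    first_five = value[0:5]  # corresponds to substr(1, 5)
    eighth = value[7:8]      # corresponds to substr(8, 1); empty if value has fewer than 8 characters
    return first_five + '_' + eighth


# Sample values taken from the question's table.
samples = ['STRINGOFLETTERS', 'SOMEOTHERCHARACTERS', 'ANOTHERSTRING', 'EXAMPLEEXAMPLE']
results = [extract_chars(s) for s in samples]
print(results)
```

For the four sample rows this yields STRIN_F, SOMEO_E, ANOTH_S and EXAMP_E, matching the desired output in the question.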

Source: https://stackoverflow.com/questions/40916482/python-spark-extract-characters-from-dataframe

Tags

python-2.7

apache-spark

pyspark
