Python spark extract characters from dataframe

Posted by 假如想象 on 2019-12-22 03:46:03

Question


I have a DataFrame in Spark, something like this:

ID     | Column
------ | ----
1      | STRINGOFLETTERS
2      | SOMEOTHERCHARACTERS
3      | ANOTHERSTRING
4      | EXAMPLEEXAMPLE

What I would like to do is extract the first 5 characters of Column plus its 8th character, join them with an underscore, and put the result in a new column, something like this:

ID     | New Column
------ | ------
1      | STRIN_F
2      | SOMEO_E
3      | ANOTH_S
4      | EXAMP_E

I can't use the following code, because the values in the column differ and I don't want to split on a specific character, but at a fixed position (the 6th character):

from pyspark.sql.functions import split

# split() cuts at a delimiter (here a space), not at a fixed position
split_col = split(DF['column'], ' ')
newDF = DF.withColumn('new_column', split_col.getItem(0))
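A quick plain-Python analogue of why the split approach does not help here: splitting on a delimiter that never occurs leaves each value whole, so item 0 is simply the entire string. This uses Python's str.split as a stand-in for Spark's split, assuming both return a single-element result when the delimiter is absent.

```python
# Plain-Python analogue of split(DF['column'], ' ').getItem(0):
# none of the sample values contains a space, so nothing is cut off.
values = ["STRINGOFLETTERS", "SOMEOTHERCHARACTERS", "ANOTHERSTRING", "EXAMPLEEXAMPLE"]
first_items = [v.split(" ")[0] for v in values]
print(first_items == values)  # True: every value comes back unchanged
```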

Thanks all!


Answer 1:


Use something like this:

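from pyspark.sql.functions import concat, lit

# substr is 1-based: (1, 5) = first five characters, (8, 1) = the 8th character
newDF = df.withColumn('new_column', concat(df.Column.substr(1, 5),
                                           lit('_'),
                                           df.Column.substr(8, 1)))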

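This uses the Column.substr method (1-based start position, then length) and the concat function, with lit supplying the literal underscore. Together they give the result you want.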
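For intuition about edge cases, here is a hypothetical plain-Python model of the two pieces used above: substr(pos, length) with a 1-based start, and concat returning null when any input is null. The helpers substr, concat and new_column are illustrative stand-ins, not Spark APIs. The model also assumes that Spark returns an empty string when the start position lies past the end of the value, which is my understanding of its behavior; it only covers pos >= 1.

```python
from typing import Optional


def substr(s: Optional[str], pos: int, length: int) -> Optional[str]:
    """Hypothetical model of Column.substr(pos, length) for pos >= 1."""
    if s is None:
        return None  # null in, null out
    start = pos - 1  # convert 1-based position to 0-based index
    return s[start:start + length]  # '' if start is past the end


def concat(*parts: Optional[str]) -> Optional[str]:
    """Hypothetical model of concat(): null if any input is null."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)


def new_column(s: Optional[str]) -> Optional[str]:
    # Same shape as the answer: first five chars, '_', then the 8th char
    return concat(substr(s, 1, 5), "_", substr(s, 8, 1))


print(new_column("STRINGOFLETTERS"))  # STRIN_F
print(new_column("SHORT"))            # SHORT_
print(new_column(None))               # None
```

So under these assumptions a value shorter than 8 characters still produces a result (just without the trailing character), while a null value yields null for the whole new column.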



Source: https://stackoverflow.com/questions/40916482/python-spark-extract-characters-from-dataframe
