substring multiple characters from the last index of a pyspark string column using negative indexing

Posted by ぐ巨炮叔叔 on 2019-12-20 03:14:36

Question


Closely related to: Spark Dataframe column with last character of other column, but I want to extract multiple characters counting back from the -1 index.


I have the following PySpark dataframe df:

+----------+----------+
|    number|event_type|
+----------+----------+
|0342224022|        11|
|0112964715|        11|
+----------+----------+

I want to extract 3 characters from the last index of the number column.

I tried the following:

from pyspark.sql.functions import substring 
df.select(substring(df['number'], -1, 3), 'event_type').show(2)

# which returns:

+----------------------+----------+
|substring(number,-1,3)|event_type|
+----------------------+----------+
|                     2|        11|
|                     5|        11|
+----------------------+----------+

Below is the expected output (I'm not sure what the output above represents):

+----------------------+----------+
|substring(number,-1,3)|event_type|
+----------------------+----------+
|                   022|        11|
|                   715|        11|
+----------------------+----------+

What am I doing wrong?

Note: Spark version 1.6.0


Answer 1:


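substring takes a start position and a length. A negative position counts back from the end of the string, so substring(df['number'], -1, 3) starts at the last character and has only one character left to return, which is the output you are seeing. To get the last three characters, the position must be -3 and the length 3.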

pyspark.sql.functions.substring(str, pos, len)

You need to change your substring function call to:

from pyspark.sql.functions import substring
df.select(substring(df['number'], -3, 3), 'event_type').show(2)
#+------------------------+----------+
#|substring(number, -3, 3)|event_type|
#+------------------------+----------+
#|                     022|        11|
#|                     715|        11|
#+------------------------+----------+
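
To see why -1 yields a single character while -3 yields three, here is a minimal pure-Python sketch of how the position and length arguments interact. The helper sql_substring is hypothetical and written only for illustration; it is not part of PySpark, and it is a model of the behaviour shown above rather than Spark's actual implementation.

```python
def sql_substring(s: str, pos: int, length: int) -> str:
    """Illustrative model (an assumption, not Spark code) of substring(str, pos, len)."""
    n = len(s)
    if pos > 0:
        start = pos - 1   # positive positions are 1-based
    elif pos < 0:
        start = n + pos   # negative positions count back from the end
    else:
        start = 0         # assumed: position 0 behaves like the first character
    end = start + length
    if end <= start:
        return ""
    # Slicing stops at the end of the string, so fewer than `length`
    # characters come back when the start is too close to the end.
    return s[max(start, 0):max(end, 0)]


for number in ["0342224022", "0112964715"]:
    print(number, repr(sql_substring(number, -1, 3)), repr(sql_substring(number, -3, 3)))
```

With pos = -1 the slice begins at the last character, so only one character is available no matter what length is requested; with pos = -3 exactly three characters remain.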


Source: https://stackoverflow.com/questions/49793479/substring-multiple-characters-from-the-last-index-of-a-pyspark-string-column-usi
