Pyspark Unsupported literal type class java.util.ArrayList [duplicate]


Question


I am using Python 3 on Spark 2.2.0. I want to apply my UDF to a specified list of strings.

df = ['Apps A','Chrome', 'BBM', 'Apps B', 'Skype']

def calc_app(app, app_list):
    browser_list = ['Chrome', 'Firefox', 'Opera']
    chat_list = ['WhatsApp', 'BBM', 'Skype']
    sum = 0
    for data in app:
        name = data['name']
        if name in app_list:
            sum += 1
    return sum

calc_appUDF = udf(calc_app)
df = df.withColumn('app_browser', calc_appUDF(df['apps'], browser_list))
df = df.withColumn('app_chat', calc_appUDF(df['apps'], chat_list))

But it fails with: 'Unsupported literal type class java.util.ArrayList'


Answer 1:


If I understood your requirement correctly, then you should try this. The error occurs because a plain Python list passed as a UDF argument is treated as a column literal, and Spark cannot build a literal column from a java.util.ArrayList; wrapping the list in a closure keeps it out of the column arguments entirely:

from pyspark.sql.functions import udf, col
from pyspark.sql.types import IntegerType

# sample data
df_list = ['Apps A', 'Chrome', 'BBM', 'Apps B', 'Skype']
df = sqlContext.createDataFrame([(l,) for l in df_list], ['apps'])
df.show()

# the lists to check against
browser_list = ['Chrome', 'Firefox', 'Opera']
chat_list = ['WhatsApp', 'BBM', 'Skype']

# UDF logic: 1 if the app is in the given list, else 0
def calc_app(app, app_list):
    if app in app_list:
        return 1
    else:
        return 0

# wrap the list in a closure instead of passing it as a column argument;
# declare IntegerType so the result is an integer column (udf defaults to StringType)
def calc_appUDF(app_list):
    return udf(lambda l: calc_app(l, app_list), IntegerType())

# add the new columns
df = df.withColumn('app_browser', calc_appUDF(browser_list)(col('apps')))
df = df.withColumn('app_chat', calc_appUDF(chat_list)(col('apps')))
df.show()

Sample input:

+------+
|  apps|
+------+
|Apps A|
|Chrome|
|   BBM|
|Apps B|
| Skype|
+------+

Output is:

+------+-----------+--------+
|  apps|app_browser|app_chat|
+------+-----------+--------+
|Apps A|          0|       0|
|Chrome|          1|       0|
|   BBM|          0|       1|
|Apps B|          0|       0|
| Skype|          0|       1|
+------+-----------+--------+
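
As an alternative, you can keep the two-argument UDF shape from the question and pass the list as an array column of literals instead. Below is a minimal sketch under the same df, browser_list, and chat_list as above (the name contains_udf is just for illustration): wrapping each element with lit and combining them with array gives Spark a real column, so it never tries to build a literal from a Python list.

from pyspark.sql.functions import udf, lit, array, col
from pyspark.sql.types import IntegerType

# two-argument UDF, as in the question (contains_udf is an illustrative name);
# the array column arrives inside the UDF as a plain Python list
contains_udf = udf(lambda app, app_list: int(app in app_list), IntegerType())

# lit() each element and combine with array(), so the list is passed
# as a proper array column instead of an unsupported Python literal
df = df.withColumn('app_browser',
                   contains_udf(col('apps'), array(*[lit(a) for a in browser_list])))
df = df.withColumn('app_chat',
                   contains_udf(col('apps'), array(*[lit(a) for a in chat_list])))

Both versions should produce the same output; which one to use is mostly a matter of taste.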


Source: https://stackoverflow.com/questions/48242212/pyspark-unsupported-literal-type-class-java-util-arraylist
