Combine PySpark DataFrame ArrayType fields into single ArrayType field

既然无缘 2020-12-05 14:34

I have a PySpark DataFrame with 2 ArrayType fields:

>>> df
DataFrame[id: string, tokens: array, bigrams: array]
2 Answers
  •  旧巷少年郎
    2020-12-05 15:11

    Since Spark 2.4.0 (2.3 on the Databricks platform), you can do this natively in the DataFrame API with the concat function, which accepts array columns as well as strings. For your example:

    from pyspark.sql.functions import col, concat

    # withColumn returns a new DataFrame, so keep the result
    df = df.withColumn('tokens_bigrams', concat(col('tokens'), col('bigrams')))
    

    Here is the related jira.
