Add list as column to Dataframe in pyspark

帅比萌擦擦* submitted on 2019-12-23 01:01:09

Question


I have a list of integers and a DataFrame (created through SQLContext) whose number of rows equals the length of the list. I want to add the list as a column to this DataFrame while maintaining the order. I feel like this should be really simple, but I can't find an elegant solution.


Answer 1:


You cannot simply add a list as a DataFrame column, because a list is a local object and a DataFrame is distributed. You can try one of the following approaches:

  • convert the DataFrame to a local collection with collect() or toLocalIterator() and add the corresponding list value to each row, OR
  • convert the list to a DataFrame with an extra key column (matching keys from the original DataFrame) and then join the two


Source: https://stackoverflow.com/questions/40222872/add-list-as-column-to-dataframe-in-pyspark
