Question
I have a list of integers and a SQLContext DataFrame whose number of rows equals the length of the list. I want to add the list as a column to this DataFrame while preserving the order. I feel like this should be really simple, but I can't find an elegant solution.
Answer 1:
You cannot simply add a list as a DataFrame column, because the list is a local object while the DataFrame is distributed. You can try one of the following approaches:
- convert the DataFrame to local data with collect() or toLocalIterator(), and for each row add the corresponding value from the list; OR
- convert the list to a DataFrame with an extra key column whose keys match the DataFrame's rows, and then join the two.
Source: https://stackoverflow.com/questions/40222872/add-list-as-column-to-dataframe-in-pyspark