List to DataFrame in pyspark

Submitted by 烂漫一生 on 2019-11-30 20:29:16

Question


Can someone tell me how to convert a list containing strings to a DataFrame in PySpark? I am using Python 3.6 with Spark 2.2.1. I have just started learning the Spark environment, and my data looks like this:

my_data = [['apple', 'ball', 'ballon'], ['cat', 'camel', 'james'], ['none', 'focus', 'cake']]

Now I want to create a DataFrame as follows:

+----+---------------------------+
| ID | words                     |
+----+---------------------------+
| 1  | ['apple','ball','ballon'] |
| 2  | ['cat','camel','james']   |
+----+---------------------------+

I also want to add an ID column, which is not part of the data itself.


Answer 1:


You can convert the list to a list of Row objects, then use spark.createDataFrame which will infer the schema from your data:

from pyspark.sql import Row
R = Row('ID', 'words')

# use enumerate to add the ID column
spark.createDataFrame([R(i, x) for i, x in enumerate(my_data)]).show() 
+---+--------------------+
| ID|               words|
+---+--------------------+
|  0|[apple, ball, bal...|
|  1| [cat, camel, james]|
|  2| [none, focus, cake]|
+---+--------------------+
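
The table in the question numbers rows from 1, while enumerate counts from 0 unless told otherwise. The following is a minimal sketch of the same idea with 1-based IDs and an explicit schema instead of schema inference. The helper names number_rows and to_dataframe are hypothetical, and spark is assumed to be an already active SparkSession (as in the pyspark shell).

```python
def number_rows(data, start=1):
    """Pair each inner list with a sequential ID, counting from `start`."""
    return [(i, words) for i, words in enumerate(data, start=start)]


def to_dataframe(spark, data):
    """Build a DataFrame with an explicit (ID, words) schema.

    `spark` is assumed to be an active SparkSession.
    """
    # Imported here so the pure-Python helper above works without Spark.
    from pyspark.sql.types import (ArrayType, IntegerType, StringType,
                                   StructField, StructType)

    schema = StructType([
        StructField("ID", IntegerType(), nullable=False),
        StructField("words", ArrayType(StringType()), nullable=True),
    ])
    return spark.createDataFrame(number_rows(data), schema)


my_data = [['apple', 'ball', 'ballon'],
           ['cat', 'camel', 'james'],
           ['none', 'focus', 'cake']]

rows = number_rows(my_data)
print(rows[0])  # (1, ['apple', 'ball', 'ballon'])
```

With a session available, to_dataframe(spark, my_data).show() should list the three rows with IDs 1 to 3. Spelling out the schema is optional here, but it avoids relying on inference when a list might be empty or contain None.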



Answer 2:


Try this:

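data_array = []
for i in range(len(my_data)):
    # pair each inner list with its index, which becomes the ID
    data_array.append((i, my_data[i]))

# the method is spelled createDataFrame (capital F)
df = spark.createDataFrame(data=data_array, schema=["ID", "words"])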

df.show()



Answer 3:


Try this, the simplest approach:

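from datetime import datetime, timezone
from pyspark.sql import Row

# utc was left undefined in the original snippet; the current UTC time is assumed here
utc = datetime.now(timezone.utc)
x = Row(utc_timestamp=utc, routine='routine name', message='your message')
data = [x]
df = sqlContext.createDataFrame(data)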
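
Applied to the question's my_data, the keyword-argument form of Row used in this answer could look like the sketch below. The helper names as_records and to_rows are hypothetical, and spark is assumed to be an active SparkSession. One caveat: in Spark 2.x, a Row built from keyword arguments has its fields sorted alphabetically by name, so the column order may differ from the order written. Here ID happens to sort before words either way.

```python
def as_records(data, start=1):
    """Turn each inner list into a dict holding an ID and the word list."""
    return [{"ID": i, "words": words}
            for i, words in enumerate(data, start=start)]


def to_rows(records):
    """Convert the dicts into pyspark Row objects via keyword arguments."""
    # Imported lazily so as_records can be used without pyspark installed.
    from pyspark.sql import Row
    return [Row(**record) for record in records]


my_data = [['apple', 'ball', 'ballon'],
           ['cat', 'camel', 'james'],
           ['none', 'focus', 'cake']]

records = as_records(my_data)
print(records[1])  # {'ID': 2, 'words': ['cat', 'camel', 'james']}

# With an active SparkSession named spark (an assumption here):
# df = spark.createDataFrame(to_rows(records))
# df.show()
```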


Source: https://stackoverflow.com/questions/48290759/list-to-dataframe-in-pyspark
