Cannot create Dataframe in PySpark

Submitted by 夙愿已清 on 2020-01-02 09:40:12

Question


I want to create a DataFrame in PySpark with the following code:

from pyspark.sql import *
from pyspark.sql.types import *

temp = Row("DESC", "ID")
temp1 = temp('Description1323', 123)

print temp1

schema = StructType([StructField("DESC", StringType(), False),
                     StructField("ID", IntegerType(), False)])

df = spark.createDataFrame(temp1, schema)

But I am receiving the following error:

TypeError: StructType can not accept object 'Description1323' in type <type 'str'>

What's wrong with my code?


Answer 1:


The problem is that you are passing a single Row where createDataFrame expects a list of Rows. Try this:

from pyspark.sql import *
from pyspark.sql.types import *

temp = Row("DESC", "ID")
temp1 = temp('Description1323', 123)

print temp1

schema = StructType([StructField("DESC", StringType(), False),
                     StructField("ID", IntegerType(), False)])

# Wrap the single Row in a list so it is treated as one record
df = spark.createDataFrame([temp1], schema)

df.show()

And the result:

+---------------+---+
|           DESC| ID|
+---------------+---+
|Description1323|123|
+---------------+---+
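
To see why the bare Row fails, here is a rough illustration that needs only the standard library. It assumes, as the error message suggests, that createDataFrame walks over its first argument and treats each item as one record. Since Row is a tuple subclass, walking over a single Row yields its fields one by one, so the string 'Description1323' ends up being checked against the whole StructType. The names Record and records_seen are hypothetical stand-ins for this sketch and are not part of PySpark.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row, which is also a tuple subclass.
Record = namedtuple("Record", ["DESC", "ID"])
temp1 = Record("Description1323", 123)


def records_seen(data):
    """Return the items a consumer gets by iterating over `data` once."""
    return list(data)


# Passing the bare record: each field is treated as a separate "row".
bare = records_seen(temp1)

# Wrapping it in a list: the single item is the whole record.
wrapped = records_seen([temp1])

print(bare)     # ['Description1323', 123]
print(wrapped)  # [Record(DESC='Description1323', ID=123)]
```

In the first case the first "row" is a plain str, which matches the object named in the TypeError; in the second case the only row is the full two-field record, which is what the schema describes.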


Source: https://stackoverflow.com/questions/52586199/cannot-create-dataframe-in-pyspark
