Question
I want to create a DataFrame in PySpark with the following code:
from pyspark.sql import *
from pyspark.sql.types import *
temp = Row("DESC", "ID")
temp1 = temp('Description1323', 123)
print(temp1)
schema = StructType([StructField("DESC", StringType(), False),
                     StructField("ID", IntegerType(), False)])
df = spark.createDataFrame(temp1, schema)
But I am receiving the following error:
TypeError: StructType can not accept object 'Description1323' in type <type 'str'>
What's wrong with my code?
Answer 1:
The problem is that you are passing a single Row where you should be passing a list of Rows. Try this:
from pyspark.sql import *
from pyspark.sql.types import *
temp = Row("DESC", "ID")
temp1 = temp('Description1323', 123)
print(temp1)
schema = StructType([StructField("DESC", StringType(), False),
                     StructField("ID", IntegerType(), False)])
df = spark.createDataFrame([temp1], schema)
df.show()
And the result:
+---------------+---+
| DESC| ID|
+---------------+---+
|Description1323|123|
+---------------+---+
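For reference, here is a minimal sketch of the same idea with more than one record. It assumes only that a SparkSession is available; the builder call, the app name, and the second row are illustrative additions, not part of the original question. createDataFrame accepts an iterable of row-like records, so plain tuples that match the schema work just as well as Row objects.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Illustrative session setup; in the original snippet `spark` already exists.
spark = SparkSession.builder.appName("row-list-sketch").getOrCreate()

schema = StructType([StructField("DESC", StringType(), False),
                     StructField("ID", IntegerType(), False)])

# createDataFrame expects a collection of records, so wrap them in a list.
# The second tuple is made up purely for the example.
rows = [("Description1323", 123), ("Description1324", 124)]
df = spark.createDataFrame(rows, schema)
df.show()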
Source: https://stackoverflow.com/questions/52586199/cannot-create-dataframe-in-pyspark