How to create a sample Spark DataFrame in Python?

为君一笑 posted on 2020-01-22 14:18:38

Question


I want to create a sample DataFrame but the following code is not working:

df = spark.createDataFrame(["10","11","13"], ("age"))

## ValueError
## ...
## ValueError: Could not parse datatype: age

The expected result is:

age
10
11
13

Answer 1:


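The code is not working because ("age") is just the string "age", not a tuple, so Spark tries to parse it as a data type.

With single (non-tuple) elements, you need to pass the schema as a type string and name the column afterwards: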

spark.createDataFrame(["10","11","13"], "string").toDF("age")

or as a DataType:

from pyspark.sql.types import StringType

spark.createDataFrame(["10","11","13"], StringType()).toDF("age")

To pass column names instead, the elements should be tuples and the schema a sequence of names:

spark.createDataFrame([("10",), ("11",), ("13",)], ["age"])
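The tuple point above comes down to plain Python syntax, so it can be checked without Spark. The following is only a minimal sketch; the variable names are made up for illustration:

```python
# Parentheses alone do not make a tuple; the trailing comma does.
schema_without_comma = ("age")   # this is just the string "age"
schema_with_comma = ("age",)     # this is a one-element tuple

print(type(schema_without_comma).__name__)
print(type(schema_with_comma).__name__)

# Wrapping each value in a one-element tuple turns a flat list of values
# into a list of single-column rows, the shape used with ["age"] above.
values = ["10", "11", "13"]
rows = [(value,) for value in values]
print(rows)
```

This is why the original call ended up treating "age" as a data type string rather than as a column name.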



Answer 2:


There is a fairly easy way to create a sample DataFrame in PySpark (sc is the SparkContext, which the pyspark shell provides by default):

>>> df = sc.parallelize([[1,2,3], [2,3,4]]).toDF()
>>> df.show()
+---+---+---+
| _1| _2| _3|
+---+---+---+
|  1|  2|  3|
|  2|  3|  4|
+---+---+---+

To create it with column names:

>>> df1 = sc.parallelize([[1,2,3], [2,3,4]]).toDF(("a", "b", "c"))
>>> df1.show()
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  2|  3|
|  2|  3|  4|
+---+---+---+

This way there is no need to define a schema either. I hope this is the simplest way.




Answer 3:


I just used spark.read to create a DataFrame in Python. As stated in the documentation, save your data as JSON, for example, and load it like this:

df = spark.read.json("examples/src/main/resources/people.json")
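If there is no JSON file at hand, one can be written first. By default spark.read.json expects JSON Lines, that is, one JSON object per line. The sketch below uses only the standard library; the file name sample_ages.json and the records are made up for illustration, and the Spark call is left as a comment because it assumes an existing SparkSession named spark:

```python
import json
import os
import tempfile

# Hypothetical sample records matching the "age" column from the question.
records = [{"age": "10"}, {"age": "11"}, {"age": "13"}]

# Write one JSON object per line (JSON Lines) into a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample_ages.json")
with open(path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# With a SparkSession available as `spark`, the file could then be loaded with:
#     df = spark.read.json(path)

# Read the file back to confirm its layout.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines)
```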

Hope this helps!




Answer 4:


from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([{"a": "x", "b": "y", "c": "3"}])
df.show()

Output (no need to define a schema):

+---+---+---+
|  a|  b|  c|
+---+---+---+
|  x|  y|  3|
+---+---+---+


Source: https://stackoverflow.com/questions/47674311/how-to-create-a-sample-spark-dataframe-in-python
