Get list of data types from schema in Apache Spark

Submitted by 倖福魔咒の on 2019-12-12 07:44:04

Question


I have the following code in Spark-Python to get the list of names from the schema of a DataFrame, which works fine, but how can I get the list of the data types?

columnNames = df.schema.names

For example, something like:

columnTypes = df.schema.types

Is there any way to get a separate list of the data types contained in a DataFrame schema?


Answer 1:


Here's a suggestion:

df = sqlContext.createDataFrame([('a', 1)])

# Extract the DataType of every field in the schema
types = [f.dataType for f in df.schema.fields]

types
> [StringType, LongType]

Reference:

  • pyspark.sql.types.StructType
  • pyspark.sql.types.StructField
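If you also want the column names alongside the types, or the types as plain strings rather than DataType objects, a small variation on the same idea works (a sketch using the df above; simpleString() is a method on pyspark.sql.types.DataType):

# Pair each column name with its DataType object
[(f.name, f.dataType) for f in df.schema.fields]
> [('_1', StringType), ('_2', LongType)]

# Or get the types as short strings via simpleString()
[f.dataType.simpleString() for f in df.schema.fields]
> ['string', 'bigint']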



Answer 2:


Since the question title is not Python-specific, I'll add a Scala version here:

val types = df.schema.fields.map(f => f.dataType)

This results in an Array of org.apache.spark.sql.types.DataType.




Answer 3:


Use df.dtypes:

scala> val df = Seq(("ABC",10,20.4)).toDF("a","b","c")
df: org.apache.spark.sql.DataFrame = [a: string, b: int ... 1 more field]

scala> df.printSchema
root
 |-- a: string (nullable = true)
 |-- b: integer (nullable = false)
 |-- c: double (nullable = false)

scala> df.dtypes
res2: Array[(String, String)] = Array((a,StringType), (b,IntegerType), (c,DoubleType))

scala> df.dtypes.map(_._2).toSet
res3: scala.collection.immutable.Set[String] = Set(StringType, IntegerType, DoubleType)

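For reference, PySpark's DataFrame.dtypes behaves the same way, returning (name, type-string) tuples. A minimal sketch, assuming a running SparkSession named spark (note that Python ints infer to LongType, i.e. 'bigint', where the Scala example above infers Int):

df = spark.createDataFrame([("ABC", 10, 20.4)], ["a", "b", "c"])

df.dtypes
> [('a', 'string'), ('b', 'bigint'), ('c', 'double')]

# Unique type names, analogous to .toSet above (set ordering may vary)
set(t for _, t in df.dtypes)
> {'string', 'bigint', 'double'}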


Source: https://stackoverflow.com/questions/37335307/get-list-of-data-types-from-schema-in-apache-spark
