How to create a DataFrame schema from a JSON Schema file

Submitted by 百般思念 on 2019-12-23 05:19:06

Question


My use case is to read an existing JSON Schema file, parse it, and build a Spark DataFrame schema from it. To start, I followed the steps mentioned here.

Steps followed:
1. Imported the library from Maven
2. Restarted the cluster
3. Created a sample JSON schema file
4. Used this code to read the sample schema file:

val schema = SchemaConverter.convert("/FileStore/tables/schemaFile.json")

When I run the above command, I get: error: not found: value SchemaConverter

To make sure the library was loaded, I reattached the notebook to the cluster after restarting it.

Besides the method above, I also tried the following, replacing jsonString with the actual JSON schema.

import org.apache.spark.sql.types.{DataType, StructType}
val newSchema = DataType.fromJson(jsonString).asInstanceOf[StructType]

The schema I've been working with has 300+ fields; for simplicity, I used the sample schema from here.
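
A note on the second attempt: DataType.fromJson parses Spark's own JSON serialization of a schema (the string returned by StructType.json), not a JSON Schema document, so a draft-04 schema built from "type": "object" and "properties" is unlikely to parse there. A minimal sketch of the round trip that fromJson does support; the object name and field names are made up for illustration:

```scala
import org.apache.spark.sql.types.{DataType, LongType, StringType, StructField, StructType}

object SparkSchemaJsonRoundTrip {
  // Illustrative schema; these field names are not from the original post.
  val original: StructType = StructType(Seq(
    StructField("id", LongType, nullable = false),
    StructField("name", StringType, nullable = true)
  ))

  // Spark's own JSON form of the schema, a struct with a list of fields.
  val sparkJson: String = original.json

  // fromJson accepts that form and returns a DataType, hence the cast.
  val restored: StructType = DataType.fromJson(sparkJson).asInstanceOf[StructType]

  def main(args: Array[String]): Unit = {
    println(sparkJson)
    println(restored.treeString)
  }
}
```

Printing sparkJson shows the shape fromJson expects, which can be compared against the JSON Schema file to see why the two formats are not interchangeable.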


Answer 1:


SchemaConverter works for me. I tested in spark-shell, loading the required package with spark-shell --packages "org.zalando:spark-json-schema_2.11:0.6.1".

scala> import org.zalando.spark.jsonschema.SchemaConverter

scala> val schema = SchemaConverter.convertContent("""
 | {
 |   "$schema": "http://json-schema.org/draft-04/schema#",
 |   "title": "Product",
 |   "description": "A product from Acme's catalog",
 |   "type": "object",
 |   "properties": {
 |     "id": {
 |       "description": "The unique identifier for a product",
 |       "type": "integer"
 |     },
 |     "name": {
 |       "description": "Name of the product",
 |       "type": "string"
 |     },
 |     "price": {
 |       "type": "number",
 |       "minimum": 0,
 |       "exclusiveMinimum": true
 |     }
 |   },
 |   "required": [
 |     "id",
 |     "name",
 |     "price"
 |   ]
 | }
 | """)

schema: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false), StructField(name,StringType,false), StructField(price,DoubleType,false))

scala> schema.toString
res1: String = StructType(StructField(id,LongType,false), StructField(name,StringType,false), StructField(price,DoubleType,false))
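
For the use case in the question (a schema stored in a file), one option is to read the file's text and pass it to convertContent, as in the session above. A minimal sketch, assuming the spark-json-schema package is on the classpath and that SchemaConverter is imported from org.zalando.spark.jsonschema; the object name, the helper name and the example path are hypothetical:

```scala
import scala.io.Source
import org.apache.spark.sql.types.StructType
import org.zalando.spark.jsonschema.SchemaConverter

object SchemaFromFile {
  // Hypothetical helper: read a JSON Schema file from the local filesystem
  // and convert its contents to a Spark StructType.
  def load(path: String): StructType = {
    val source = Source.fromFile(path, "UTF-8")
    // Always close the file handle, even if reading fails.
    val content = try source.mkString finally source.close()
    SchemaConverter.convertContent(content)
  }
}
```

Usage would look like SchemaFromFile.load("/dbfs/FileStore/tables/schemaFile.json"). The /dbfs prefix is an assumption about how a Databricks driver exposes DBFS files to local file APIs, so adjust the path to your environment. A missing import is also one common cause of "not found: value SchemaConverter", the error reported in the question.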

Do you need to specify the schema explicitly when reading JSON data? If you read JSON data with Spark, it infers the schema from the data automatically, e.g.:

val df = spark.read.json("json-file")
df.printSchema() // Gives schema of json data
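
If inference is not wanted (for example, to pin column types or to avoid the extra pass over the data that inference needs), a StructType can be passed to the reader with schema(...). A minimal sketch, assuming a SparkSession is available; the hand-built schema stands in for the converter's result shown above, and the object and method names are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{DoubleType, LongType, StringType, StructField, StructType}

object ExplicitSchemaRead {
  // Hand-built stand-in for the StructType printed by SchemaConverter above.
  val productSchema: StructType = StructType(Seq(
    StructField("id", LongType, nullable = false),
    StructField("name", StringType, nullable = false),
    StructField("price", DoubleType, nullable = false)
  ))

  // Read JSON (one object per line) with the given schema instead of inferring it.
  def read(spark: SparkSession, path: String): DataFrame =
    spark.read.schema(productSchema).json(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("explicit-schema").getOrCreate()
    // "json-file" is the same placeholder path used in the answer.
    read(spark, "json-file").printSchema()
    spark.stop()
  }
}
```

In spark-shell the existing spark value can be passed straight to ExplicitSchemaRead.read. The nullability that printSchema reports may differ from the supplied schema, since Spark can relax it for file-based sources.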


Source: https://stackoverflow.com/questions/57417472/how-to-create-dataframe-schema-from-json-schem-file
