Question
My use case is to read an existing json-schema file, parse it, and build a Spark DataFrame schema from it. To start off, I followed the steps mentioned here.
Steps followed
1. Imported the library from Maven
2. Restarted the cluster
3. Created a sample JSON schema file
4. Used this code to read the sample schema file:
val schema = SchemaConverter.convert("/FileStore/tables/schemaFile.json")
When I run the above command I get the error: not found: value SchemaConverter
To ensure that the library was loaded, I reattached the notebook to the cluster after restarting the cluster.
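A likely cause of the "not found: value SchemaConverter" error is simply a missing import, since installing the library does not bring the class into scope. A minimal sketch, assuming the Zalando spark-json-schema library (verify the package name against the version you installed):

```scala
// SchemaConverter is not in scope by default; it must be imported
// from the library's package before it can be referenced.
import org.zalando.spark.jsonschema.SchemaConverter

// Hypothetical file path; adjust to your environment.
val schema = SchemaConverter.convert("/FileStore/tables/schemaFile.json")
```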
In addition to the above method, I also tried the following, replacing jsonString with the actual JSON schema.
import org.apache.spark.sql.types.{DataType, StructType}
val newSchema = DataType.fromJson(jsonString).asInstanceOf[StructType]
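Note that DataType.fromJson expects Spark's own schema JSON representation (the format produced by StructType#json), not a draft-04 json-schema document, so passing the json-schema file's contents here will fail to parse. A minimal round-trip sketch of the format it does accept:

```scala
import org.apache.spark.sql.types.{DataType, LongType, StringType, StructField, StructType}

// Build a schema and serialize it to Spark's schema JSON format.
val original = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("name", StringType, nullable = true)
))
val jsonString = original.json // Spark's schema JSON, not json-schema

// fromJson parses that same format back into a StructType.
val roundTripped = DataType.fromJson(jsonString).asInstanceOf[StructType]
```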
The sample schema I've been playing with has 300+ fields; for simplicity, I used the sample schema from here.
Answer 1:
SchemaConverter works for me. I used spark-shell to test and installed the required package with:
spark-shell --packages "org.zalando:spark-json-schema_2.11:0.6.1"
scala> val schema = SchemaConverter.convertContent("""
| {
| "$schema": "http://json-schema.org/draft-04/schema#",
| "title": "Product",
| "description": "A product from Acme's catalog",
| "type": "object",
| "properties": {
| "id": {
| "description": "The unique identifier for a product",
| "type": "integer"
| },
| "name": {
| "description": "Name of the product",
| "type": "string"
| },
| "price": {
| "type": "number",
| "minimum": 0,
| "exclusiveMinimum": true
| }
| },
| "required": [
| "id",
| "name",
| "price"
| ]
| }
| """)
schema: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false), StructField(name,StringType,false), StructField(price,DoubleType,false))
scala> schema.toString
res1: String = StructType(StructField(id,LongType,false), StructField(name,StringType,false), StructField(price,DoubleType,false))
Do you actually need to specify the schema explicitly while reading JSON data? If you read JSON data using Spark, it automatically infers the schema from the data. e.g.
val df = spark.read.json("json-file")
df.printSchema() // Gives schema of json data
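If the goal is instead to enforce the converted schema rather than rely on inference, it can be passed to the reader explicitly. A sketch, where schema is the StructType produced by SchemaConverter above and the file path is hypothetical:

```scala
// Apply an explicit schema instead of letting Spark infer one;
// fields present in the schema but absent from a record come back as null.
val df = spark.read.schema(schema).json("/path/to/data.json")
df.printSchema()
```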
Source: https://stackoverflow.com/questions/57417472/how-to-create-dataframe-schema-from-json-schem-file