How to let Spark parse a JSON-escaped String field as a JSON Object to infer the proper structure in DataFrames?

后端未结

关注

 1  1370

I have as input a set of files formatted as a single JSON object per line. The problem, however, is that one field on these JSON objects is a JSON-escaped String. Example

相关标签:

1条回答

花落未央

2020-12-21 23:44

Would that be acceptable solution?

val sc: SparkContext = ...
val sqlContext = new SQLContext(sc)

val escapedJsons: RDD[String] = sc.parallelize(Seq("""{"id":1,"name":"some name","problem_field":"{\"height\":180,\"weight\":80}"}"""))
val unescapedJsons: RDD[String] = escapedJsons.map(_.replace("\"{", "{").replace("\"}", "}").replace("\\\"", "\""))
val dfJsons: DataFrame = sqlContext.read.json(unescapedJsons)

dfJsons.printSchema()

// Output
root
|-- id: long (nullable = true)
|-- name: string (nullable = true)
|-- problem_field: struct (nullable = true)
|    |-- height: long (nullable = true)
|    |-- weight: long (nullable = true)

0 讨论(0)