How to let Spark parse a JSON-escaped String field as a JSON Object to infer the proper structure in DataFrames?

不思量自难忘° 2020-12-21 23:05

I have as input a set of files formatted as a single JSON object per line. The problem, however, is that one field in these JSON objects is a JSON-escaped string. Example:
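For instance (the same record used in the answer below):

    {"id":1,"name":"some name","problem_field":"{\"height\":180,\"weight\":80}"}

Here problem_field holds the object {"height":180,"weight":80} as an escaped string, so Spark's schema inference reads it as a plain string instead of a struct.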

1 Answer
  • 2020-12-21 23:44

    Would this be an acceptable solution?

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{DataFrame, SQLContext}

    val sc: SparkContext = ...
    val sqlContext = new SQLContext(sc)

    val escapedJsons: RDD[String] = sc.parallelize(Seq("""{"id":1,"name":"some name","problem_field":"{\"height\":180,\"weight\":80}"}"""))

    // Strip the quotes around the embedded object and unescape its inner quotes,
    // so each line becomes ordinary nested JSON before schema inference runs.
    val unescapedJsons: RDD[String] = escapedJsons.map(
      _.replace("\"{", "{").replace("\"}", "}").replace("\\\"", "\""))

    val dfJsons: DataFrame = sqlContext.read.json(unescapedJsons)
    dfJsons.printSchema()

    // Output
    root
     |-- id: long (nullable = true)
     |-- name: string (nullable = true)
     |-- problem_field: struct (nullable = true)
     |    |-- height: long (nullable = true)
     |    |-- weight: long (nullable = true)
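
    The string-replace trick works for this flat example, but it can mangle records whose string values legitimately contain "{, "} or escaped quotes. A sturdier sketch, assuming Spark 2.2+ (from_json plus a SparkSession; problemSchema and the app name are illustrative):

        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.functions.from_json
        import org.apache.spark.sql.types.{LongType, StructField, StructType}

        val spark = SparkSession.builder().appName("unescape-json").getOrCreate()
        import spark.implicits._

        // Read the outer JSON untouched; problem_field arrives as a plain string.
        val raw = spark.read.json(Seq(
          """{"id":1,"name":"some name","problem_field":"{\"height\":180,\"weight\":80}"}"""
        ).toDS())

        // Schema of the embedded document, declared explicitly (illustrative names).
        val problemSchema = StructType(Seq(
          StructField("height", LongType),
          StructField("weight", LongType)))

        // Decode the escaped string column into a proper struct.
        val parsed = raw.withColumn("problem_field", from_json($"problem_field", problemSchema))
        parsed.printSchema() // problem_field is now a struct with height and weight

    Unlike the replace-based version, this never rewrites the raw text, so it cannot corrupt unrelated fields; the trade-off is that the embedded schema must be known up front instead of inferred.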
    