How can missing columns be added as null while reading nested JSON using PySpark and a predefined struct schema?
Question

Python = 3.6, Spark = 2.4

My sample JSON data:

```
{"data":{"header":"someheader","body":{"name":"somename","value":"somevalue","books":[{"name":"somename"},{"value":"somevalue"},{"author":"someauthor"}]}}}
{"data":{"header":"someheader1","body":{"name":"somename1","value":"somevalue1","books":[{"name":"somename1"},{"value":"somevalue1"},{"author":"someauthor1"}]}}}
...
```

My struct schema (truncated here):

```python
Schema = StructType([
    StructField('header', StringType(), True),
    StructField('body', StructType([
        StructField('name1'
```