I am loading a file of JSON objects as a PySpark SchemaRDD. I want to change the "shape" of the objects (basically, I'm flattening them) and then insert into a Hive table.
The solution is applySchema:
mapped = log_json.map(flatten_function)
hive_context.applySchema(mapped, flat_schema).insertInto(name)
Where flat_schema is a StructType representing the schema in the same way as you would obtain it from log_json.schema() (but flattened, obviously).
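For reference, here is a minimal end-to-end sketch. It assumes the Spark 1.x (SchemaRDD-era) API, where the struct types live directly in pyspark.sql, and it invents a nested layout of the form {"user": {"id": ..., "name": ...}, "ts": ...}; flatten_function, the field names, the logs.json path, and the flat_logs Hive table (which must already exist) are illustrative placeholders, not taken from the question.

from pyspark import SparkContext
from pyspark.sql import HiveContext, StructType, StructField, StringType, LongType

sc = SparkContext(appName="flatten-json")
hive_context = HiveContext(sc)

# Load the nested JSON records as a SchemaRDD (Spark 1.x API).
log_json = hive_context.jsonFile("logs.json")

def flatten_function(record):
    # Pull the nested fields up to the top level. applySchema expects an
    # RDD of tuples/lists, so return the values in the same order as the
    # fields declared in flat_schema below.
    return (record.user.id, record.user.name, record.ts)

# Flattened version of the schema you would see from log_json.schema().
flat_schema = StructType([
    StructField("user_id", LongType(), True),
    StructField("user_name", StringType(), True),
    StructField("ts", LongType(), True),
])

mapped = log_json.map(flatten_function)
hive_context.applySchema(mapped, flat_schema).insertInto("flat_logs")

Note that in Spark 1.3+ applySchema was deprecated in favour of hive_context.createDataFrame(mapped, flat_schema), and the insert is then done through the DataFrame's own insertInto / write.insertInto.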