PySpark: Map a SchemaRDD into a SchemaRDD

Asked by 猫巷女王i, 2021-01-07 10:21

I am loading a file of JSON objects as a PySpark SchemaRDD. I want to change the "shape" of the objects (basically, I'm flattening them) and then insert them into a Hive table.
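
For context, here is a minimal sketch of the kind of reshaping described, assuming Spark 1.x (where jsonFile returns a SchemaRDD); the file path, the field names, and the body of flatten_function are hypothetical:

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="flatten-json")        # hypothetical app name
    hive_context = HiveContext(sc)

    # Load the JSON objects; in Spark 1.x this returns a SchemaRDD
    # whose (nested) schema is inferred from the JSON structure.
    log_json = hive_context.jsonFile("/data/logs/events.json")   # hypothetical path

    def flatten_function(row):
        # Hypothetical flattening: pull nested fields up to the top level.
        # Returning a plain tuple keeps the field order explicit, which
        # matters when a schema is applied positionally later on.
        return (row.user.id, row.user.name, row.event)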

4 Answers
  •  粉色の甜心 · 2021-01-07 10:57

    The solution is applySchema:

    mapped = log_json.map(flatten_function)
    hive_context.applySchema(mapped, flat_schema).insertInto(name)
    

    Where flat_schema is a StructType describing the schema, in the same form you would get from log_json.schema() (but flattened, obviously).
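
    As a runnable sketch of the whole approach, assuming Spark 1.x (SchemaRDD and applySchema) and the hypothetical flattened fields user_id, user_name and event from the sketch in the question, with "flat_logs" as a stand-in for the target Hive table name:

    from pyspark.sql import StructType, StructField, StringType

    # Flat schema matching the tuples produced by flatten_function;
    # field order must line up with the tuple order, since applySchema
    # matches fields by position.
    flat_schema = StructType([
        StructField("user_id",   StringType(), True),
        StructField("user_name", StringType(), True),
        StructField("event",     StringType(), True),
    ])

    mapped = log_json.map(flatten_function)              # plain RDD of tuples
    hive_context.applySchema(mapped, flat_schema) \
                .insertInto("flat_logs")                 # hypothetical table name

    Note that applySchema was deprecated in Spark 1.3, where SchemaRDD became DataFrame; on newer versions hive_context.createDataFrame(mapped, flat_schema) plays the same role.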
