PySpark: Map a SchemaRDD into a SchemaRDD

猫巷女王i 2021-01-07 10:21

I am loading a file of JSON objects as a PySpark SchemaRDD. I want to change the "shape" of the objects (basically, I'm flattening them) and then insert into

4 answers
  •  一个人的身影
    2021-01-07 11:18

    It looks like `select` is not available in Python, so you will have to call `registerTempTable` and write the transformation as a SQL statement, like

    `SELECT flatten(*) FROM TABLE`
    

    after registering the function for use in SQL:

    sqlCtx.registerFunction("flatten", lambda x: flatten_function(x))
    

    As @zero323 pointed out, calling a function on `*` is probably not supported, so you can instead create a function that takes each of your fields as an explicit argument and pass all of them in.
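A minimal sketch of that explicit-argument approach, under stated assumptions: the column names (`name`, `city`, `zip_code`), the table name `records`, and the variable `schema_rdd` are hypothetical and stand in for whatever your data actually contains. The function body here simply joins the values into one string; replace it with your own flattening logic. The Spark calls are left as comments because they need a live `sqlCtx`.

```python
# Hypothetical column names; substitute the fields of your own records.
COLUMNS = ["name", "city", "zip_code"]


def flatten_function(name, city, zip_code):
    """Combine explicitly passed column values into one comma-separated string.

    None values become empty strings so the field positions stay stable.
    """
    parts = [name, city, zip_code]
    return ",".join("" if p is None else str(p) for p in parts)


# List every column by name instead of passing `*` to the function.
query = "SELECT flatten({}) FROM records".format(", ".join(COLUMNS))

# With a live SQLContext, the calls named in the answer would look like this
# (`schema_rdd` and the table name "records" are assumptions):
#   sqlCtx.registerFunction("flatten", flatten_function)
#   schema_rdd.registerTempTable("records")
#   flattened = sqlCtx.sql(query)
```

Building the query from a column list keeps the SQL and the function signature in step: adding a field means adding one name to `COLUMNS` and one parameter to `flatten_function`.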
