PySpark: Map a SchemaRDD into a SchemaRDD

猫巷女王i 2021-01-07 10:21

I am loading a file of JSON objects as a PySpark SchemaRDD. I want to change the "shape" of the objects (basically, I'm flattening them) and then insert into

4 Answers

一个人的身影 (original poster)

2021-01-07 11:18
It looks like `select` is not available in Python, so you will have to call `registerTempTable` and write the query as a SQL statement, like
```
SELECT flatten(*) FROM TABLE
```
after registering the function for use in SQL:
```
sqlCtx.registerFunction("flatten", lambda x: flatten_function(x))
```
As @zero323 pointed out, calling a function on `*` is probably not supported, so you can instead create a function that takes your columns as explicit arguments and pass each of them in.
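For illustration, here is a minimal sketch of what a flattening helper might look like in plain Python, assuming each record arrives as a nested dict parsed from JSON. The name `flatten_record`, the `sep` parameter, and the sample record are hypothetical and not from the question; the sketch does not touch Spark at all.

```python
def flatten_record(record, parent_key="", sep="_"):
    """Flatten a nested dict into a single-level dict.

    Nested keys are joined with `sep`, e.g. {"a": {"b": 1}} -> {"a_b": 1}.
    """
    flat = {}
    for key, value in record.items():
        # Prefix the key with its parent's name when we are inside a nested dict.
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened fields.
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical sample record, standing in for one parsed JSON object.
sample = {"id": 1, "user": {"name": "Ann", "address": {"city": "Oslo"}}}
flattened = flatten_record(sample)
print(flattened)
```

A helper along these lines is the kind of logic that `flatten_function` in the snippet above would wrap. How the nested values actually reach the function depends on the Spark version, so treat the dict-based input here as an assumption.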