parse pyspark column values into new columns

谁说我不能喝 submitted on 2021-01-29 11:31:40

Question


I have a pyspark dataframe like the example df below. It has three columns: organization_id, id, and query_builder. The query_builder column contains a string that looks like a nested dict. I would like to parse the query_builder field into separate columns for the field, operator, and value. I've supplied an example of the desired output below. If need be, I could convert the pyspark dataframe to a pandas dataframe to make it easier. Does anyone have suggestions, or recognize the type of data in the query_builder column? For example, is it json?

code:

df[['organization_id','id','query_builder']].show(n=2,truncate=False)

output:

+---------------+---+--------------------------------------------+
|organization_id|id |query_builder                               |
+---------------+---+--------------------------------------------+
|16             |60 |---
- !ruby/hash:ActiveSupport::HashWithIndifferentAccess
  name: Price Unit
  field: priceunit
  type: number
  operator: between
  value:
  - '40'
  - '60'
- !ruby/hash:ActiveSupport::HashWithIndifferentAccess
  name: Product Type
  field: producttype
  type: string
  operator: in
  value:
  - FLOWER
     |
|11             |55 |---
- !ruby/hash:ActiveSupport::HashWithIndifferentAccess
  name: Price Unit
  field: priceunit
  type: number
  operator: between
  value:
  - 
  - 
- !ruby/hash:ActiveSupport::HashWithIndifferentAccess
  name: Product Type
  field: producttype
  type: string
  operator: in
  value:
  - CANDY
|
+---------------+---+--------------------------------------------+

desired output:

organization_id id field       operator value
16              60 priceunit   between  [40,60]
16              60 producttype in       ['FLOWER']
11              55 priceunit   between  []
11              55 producttype in       ['CANDY']
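
A minimal sketch of one possible approach. The leading `---` document marker and the `!ruby/hash:ActiveSupport::HashWithIndifferentAccess` tags suggest the strings are YAML serialized by Ruby/Rails, not JSON, so PyYAML can read them once the Ruby-specific tag is mapped to a plain dict. The loader class `RubyYamlLoader` and the UDF `parse_query_builder` below are names introduced here for illustration, and the sketch assumes the query_builder strings follow the structure shown above:

import yaml
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# Teach a safe loader to treat any "!ruby/hash:*" tag as an ordinary mapping.
class RubyYamlLoader(yaml.SafeLoader):
    pass

def _ruby_hash(loader, tag_suffix, node):
    return loader.construct_mapping(node)

RubyYamlLoader.add_multi_constructor('!ruby/hash:', _ruby_hash)

# One struct per rule found in the query_builder string.
rule_schema = ArrayType(StructType([
    StructField('field', StringType()),
    StructField('operator', StringType()),
    StructField('value', ArrayType(StringType())),
]))

@F.udf(rule_schema)
def parse_query_builder(s):
    if s is None:
        return []
    rules = yaml.load(s, Loader=RubyYamlLoader) or []
    out = []
    for r in rules:
        # Drop empty/blank list items (e.g. the record whose values are missing).
        values = [str(v) for v in (r.get('value') or []) if v is not None]
        out.append((r.get('field'), r.get('operator'), values))
    return out

# Hypothetical usage with the df from the question:
# parsed = (df
#           .withColumn('rule', F.explode(parse_query_builder('query_builder')))
#           .select('organization_id', 'id',
#                   F.col('rule.field').alias('field'),
#                   F.col('rule.operator').alias('operator'),
#                   F.col('rule.value').alias('value')))
# parsed.show(truncate=False)

Exploding the parsed array gives one row per rule, which matches the desired output of one (field, operator, value) row per entry in query_builder.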

Source: https://stackoverflow.com/questions/64267851/parse-pyspark-column-values-into-new-columns
