Question
I have a PySpark dataframe like the example df below. It has three columns: organization_id, id, and query_builder. The query_builder column contains a string that looks like a nested dict. I would like to parse the query_builder field into separate columns for the field, operator, and value; I've supplied an example of the desired output below. If need be, I could convert the PySpark dataframe to a pandas dataframe to make this easier. Does anyone have suggestions, or recognize the format of the data in the query_builder column? For example, is it JSON?
code:
df[['organization_id','id','query_builder']].show(n=2,truncate=False)
output:
+---------------+---+--------------------------------------------+
|organization_id|id |query_builder |
+---------------+---+--------------------------------------------+
|16 |60 |---
- !ruby/hash:ActiveSupport::HashWithIndifferentAccess
name: Price Unit
field: priceunit
type: number
operator: between
value:
- '40'
- '60'
- !ruby/hash:ActiveSupport::HashWithIndifferentAccess
name: Product Type
field: producttype
type: string
operator: in
value:
- FLOWER
|
|11 |55 |---
- !ruby/hash:ActiveSupport::HashWithIndifferentAccess
name: Price Unit
field: priceunit
type: number
operator: between
value:
-
-
- !ruby/hash:ActiveSupport::HashWithIndifferentAccess
name: Product Type
field: producttype
type: string
operator: in
value:
- CANDY
|
+---------------+---+--------------------------------------------+
desired output:
organization_id id field operator value
16 60 priceunit between ['40','60']
16 60 producttype in ['FLOWER']
11 55 priceunit between []
11 55 producttype in ['CANDY']
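(For what it's worth, the `!ruby/hash:ActiveSupport::HashWithIndifferentAccess` tags suggest this is not JSON but YAML serialized by Ruby/Rails. A minimal sketch of parsing one such string in plain Python with PyYAML, assuming the keys are indented as in a standard Ruby YAML dump; the variable names here are illustrative only:)

```python
import yaml

# One query_builder value, as serialized Ruby YAML (sample reconstructed
# from the question's output, with standard YAML indentation).
raw = """---
- !ruby/hash:ActiveSupport::HashWithIndifferentAccess
  name: Price Unit
  field: priceunit
  type: number
  operator: between
  value:
  - '40'
  - '60'
- !ruby/hash:ActiveSupport::HashWithIndifferentAccess
  name: Product Type
  field: producttype
  type: string
  operator: in
  value:
  - FLOWER
"""

# PyYAML does not know the Ruby-specific tag, so register a constructor
# that treats the tagged node as a plain mapping.
def ruby_hash(loader, node):
    return loader.construct_mapping(node)

yaml.SafeLoader.add_constructor(
    "!ruby/hash:ActiveSupport::HashWithIndifferentAccess", ruby_hash
)

rules = yaml.load(raw, Loader=yaml.SafeLoader)
rows = [(r["field"], r["operator"], r["value"]) for r in rules]
# rows is now a list of (field, operator, value) tuples that could be
# exploded into the desired columns, e.g. via a PySpark UDF or in pandas.
```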
Source: https://stackoverflow.com/questions/64267851/parse-pyspark-column-values-into-new-columns