Config file to define JSON Schema Structure in PySpark

盖世英雄少女心 2020-12-07 01:55

I have created a PySpark application that reads a JSON file into a DataFrame using a defined schema. Code sample below:

schema = StructType([
    StructFiel         


        
2 Answers
  •  执笔经年
    2020-12-07 02:32

    StructType provides the json and jsonValue methods, which return the schema as a JSON string and as a Python dict respectively, and fromJson, which converts a Python dictionary back into a StructType.

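    from pyspark.sql.types import StructType, StructField, StringType, LongType

    schema = StructType([
        StructField("domain", StringType(), True),
        StructField("timestamp", LongType(), True),
    ])

    # Round trip: jsonValue() returns a dict, fromJson() rebuilds the StructType
    StructType.fromJson(schema.jsonValue())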
    

    The only other thing you need is the built-in json module, to parse the config file's contents into a dict that StructType.fromJson can consume.

    For a Scala version, see "How to create a schema from CSV file and persist/save that schema to a file?"
