Create a Spark DataFrame schema from a JSON schema representation

不思量自难忘° 2020-12-04 16:40

Is there a way to serialize a dataframe schema to json and deserialize it later on?

The use case is simple: I have a json configuration file which contains the schema.

2 Answers
  • 2020-12-04 17:29

    I am posting a PySpark version of Assaf's answer:

    from pyspark.sql.types import StructType
    import json

    # Save the schema from the original DataFrame as a JSON string:
    schema_json = df.schema.json()

    # Restore the schema from the JSON string:
    new_schema = StructType.fromJson(json.loads(schema_json))
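    For reference, the JSON that `df.schema.json()` produces follows Spark's `StructType` layout: a `struct` node with a list of `fields`, one entry per column. A minimal sketch using only the standard library (no Spark session needed) to show the shape you would store in a configuration file; the column names here are made up for illustration:

    ```python
    import json

    # The JSON layout Spark's StructType serializes to: a "struct" node
    # whose "fields" list holds one entry per column. The column names
    # below are illustrative, not from the original question.
    schema_json = json.dumps({
        "type": "struct",
        "fields": [
            {"name": "id", "type": "long", "nullable": True, "metadata": {}},
            {"name": "name", "type": "string", "nullable": True, "metadata": {}},
        ],
    })

    # A config file storing this string can later be fed to
    # StructType.fromJson(json.loads(schema_json)) to rebuild the schema.
    parsed = json.loads(schema_json)
    print([f["name"] for f in parsed["fields"]])  # -> ['id', 'name']
    ```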
    
  • 2020-12-04 17:35

    There are two steps for this: creating the JSON string from an existing dataframe, and creating the schema from the previously saved JSON string.

    Creating the JSON string from an existing dataframe

        val schema = df.schema
        val jsonString = schema.json
    

    Creating the schema from the JSON string

        import org.apache.spark.sql.types.{DataType, StructType}
        val newSchema = DataType.fromJson(jsonString).asInstanceOf[StructType]
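    The question's use case, persisting the schema in a configuration file, then reduces to writing the JSON string to disk and reading it back later. A hedged Python sketch of that round trip, using a temporary file and plain stdlib I/O in place of a real config path (in practice the string would come from `df.schema.json()` and go back through `StructType.fromJson`):

    ```python
    import json
    import os
    import tempfile

    # Stand-in for df.schema.json(); in a real job this string comes from Spark.
    schema_json = (
        '{"type":"struct","fields":'
        '[{"name":"id","type":"long","nullable":true,"metadata":{}}]}'
    )

    # Write the schema to a config file...
    path = os.path.join(tempfile.mkdtemp(), "schema.json")
    with open(path, "w") as f:
        f.write(schema_json)

    # ...and restore it later. With Spark available you would then call
    # StructType.fromJson(restored) to get the StructType back.
    with open(path) as f:
        restored = json.load(f)

    print(restored["fields"][0]["name"])  # -> id
    ```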
    