Created a nested schema in Apache Spark SQL

Submitted by 。_饼干妹妹 on 2019-12-11 05:59:54

Question


I want to load a simple JSON file into my SparkSession. Each record is an employee with an array of addresses. A sample record is below:

{"firstName":"Neil","lastName":"Irani", "addresses" : [ {  "city" : "Brindavan", "state" : "NJ"  }, {  "city" : "Subala", "state" : "DT"  }]}

I'm trying to create the schema for loading this JSON, and I believe something is wrong with the way I build it below. Please advise. The code is in Java; I could not find a reasonable sample.

    List<StructField> employeeFields = new ArrayList<>();
    employeeFields.add(DataTypes.createStructField("firstName", DataTypes.StringType, true));
    employeeFields.add(DataTypes.createStructField("lastName", DataTypes.StringType, true));
    employeeFields.add(DataTypes.createStructField("email", DataTypes.StringType, true));

    List<StructField> addressFields = new ArrayList<>();
    addressFields.add(DataTypes.createStructField("city", DataTypes.StringType, true));
    addressFields.add(DataTypes.createStructField("state", DataTypes.StringType, true));
    addressFields.add(DataTypes.createStructField("zip", DataTypes.StringType, true));

    employeeFields.add(DataTypes.createStructField("addresses", DataTypes.createStructType(addressFields), true));

    StructType employeeSchema = DataTypes.createStructType(employeeFields);


    Dataset<Employee>  rowDataset = sparkSession.read()
            .option("inferSchema", "false")
            .schema(employeeSchema)
            .json("simple_employees.json").as(employeeEncoder);

Update

I was not creating the array type for the addresses field. The code below works fine:

    List<StructField> employeeFields = new ArrayList<>();
    employeeFields.add(DataTypes.createStructField("firstName", DataTypes.StringType, true));
    employeeFields.add(DataTypes.createStructField("lastName", DataTypes.StringType, true));
    employeeFields.add(DataTypes.createStructField("email", DataTypes.StringType, true));

    List<StructField> addressFields = new ArrayList<>();
    addressFields.add(DataTypes.createStructField("city", DataTypes.StringType, true));
    addressFields.add(DataTypes.createStructField("state", DataTypes.StringType, true));
    addressFields.add(DataTypes.createStructField("zip", DataTypes.StringType, true));
    ArrayType addressStruct = DataTypes.createArrayType( DataTypes.createStructType(addressFields));

    employeeFields.add(DataTypes.createStructField("addresses", addressStruct, true));
    StructType employeeSchema = DataTypes.createStructType(employeeFields);
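
The difference between the two versions is that the sample JSON holds "addresses" as an array of objects, so the field needs an ArrayType whose element type is the address struct rather than the struct itself. A minimal sketch for checking this without reading any data is below. It assumes the spark-sql library is on the classpath; the class name EmployeeSchemaCheck and the method buildEmployeeSchema are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.types.ArrayType;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper class, not part of the original question.
public class EmployeeSchemaCheck {

    // Builds the employee schema with "addresses" declared as an array of address structs.
    public static StructType buildEmployeeSchema() {
        List<StructField> addressFields = Arrays.asList(
                DataTypes.createStructField("city", DataTypes.StringType, true),
                DataTypes.createStructField("state", DataTypes.StringType, true),
                DataTypes.createStructField("zip", DataTypes.StringType, true));

        // Wrap the address struct in an array type to match the JSON array.
        ArrayType addressArray =
                DataTypes.createArrayType(DataTypes.createStructType(addressFields));

        List<StructField> employeeFields = Arrays.asList(
                DataTypes.createStructField("firstName", DataTypes.StringType, true),
                DataTypes.createStructField("lastName", DataTypes.StringType, true),
                DataTypes.createStructField("email", DataTypes.StringType, true),
                DataTypes.createStructField("addresses", addressArray, true));

        return DataTypes.createStructType(employeeFields);
    }

    public static void main(String[] args) {
        StructType schema = buildEmployeeSchema();

        // Look up the declared type of the "addresses" field.
        DataType addressesType = schema.fields()[schema.fieldIndex("addresses")].dataType();
        System.out.println("addresses is an array: " + (addressesType instanceof ArrayType));

        // Print the whole schema as a tree for a visual check.
        schema.printTreeString();
    }
}
```

Printing the schema tree before passing it to sparkSession.read().schema(...) is a quick way to confirm that the declared nesting matches the JSON.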

Source: https://stackoverflow.com/questions/45315784/created-a-nested-schema-in-apache-spark-sql
