Write POJOs to a Parquet file using reflection

感情败类 2021-01-03 05:23

Hi, I'm looking for APIs to write Parquet files from POJOs that I have. I was able to generate an Avro schema using reflection and then create a Parquet schema using AvroSchemaConverter.

3 answers
  •  情书的邮戳
    2021-01-03 06:01

    DISCLAIMER: I wrote the following code in a hurry. It is not efficient, and future versions of Parquet will probably support this more directly. That said, it is a lightweight, if inefficient, way to do what you need. The strategy is POJO -> AVRO -> PARQUET.

    1. POJO -> AVRO: Declare a schema via reflection. Declare a writer and a reader based on that schema. At conversion time, write the object to a byte stream and read it back as an Avro record.
    2. AVRO -> PARQUET: Use the AvroParquetWriter included in the parquet-mr project.

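    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.EncoderFactory;
    import org.apache.avro.reflect.ReflectData;
    import org.apache.avro.reflect.ReflectDatumWriter;

    // These members go inside YOURCLASS, the POJO being converted.
    private static final Schema avroSchema = ReflectData.AllowNull.get().getSchema(YOURCLASS.class);
    private static final ReflectDatumWriter<YOURCLASS> reflectDatumWriter = new ReflectDatumWriter<>(avroSchema);
    private static final GenericDatumReader<GenericRecord> genericRecordReader = new GenericDatumReader<>(avroSchema);

    // Serialize this object to Avro binary via reflection, then read the bytes back as a GenericRecord.
    public GenericRecord toAvroGenericRecord() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        reflectDatumWriter.write(this, EncoderFactory.get().directBinaryEncoder(bytes, null));
        return genericRecordReader.read(null, DecoderFactory.get().binaryDecoder(bytes.toByteArray(), null));
    }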
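For the second step (AVRO -> PARQUET), a minimal sketch might look like the following. It assumes parquet-avro and a Hadoop client library are on the classpath. The class name `AvroToParquetSketch`, the helper `writeRecords`, the stand-in `User` schema, and the output file name are all hypothetical; in the setup above you would pass `avroSchema` and the result of `toAvroGenericRecord()` instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

class AvroToParquetSketch {
    // Hypothetical helper: writes every record to a new Parquet file.
    // With the default write mode, this fails if the file already exists.
    static void writeRecords(List<GenericRecord> records, Schema schema, String file) {
        try (ParquetWriter<GenericRecord> writer =
                AvroParquetWriter.<GenericRecord>builder(new Path(file))
                        .withSchema(schema)
                        .build()) {
            for (GenericRecord record : records) {
                writer.write(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in schema for illustration; the reflection-derived avroSchema would go here.
        Schema schema = SchemaBuilder.record("User").fields()
                .requiredString("name")
                .requiredInt("age")
                .endRecord();
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Ada");
        user.put("age", 36);
        writeRecords(List.of(user), schema, "users.parquet");
    }
}
```

Newer parquet-avro releases also offer a builder overload that takes an `OutputFile` instead of a Hadoop `Path`; which one to use depends on the version you have.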
    
    
    

    One more thing: the Parquet writers currently seem to be very strict about null fields. Make sure none of your fields are null before attempting to write to Parquet.
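One way to catch null fields before writing is a small reflection check. This is only a sketch using the standard `java.lang.reflect` API: the names `NullFieldCheck`, `findNullFields`, and the `User` POJO are hypothetical, and it only inspects fields declared directly on the object's class (not inherited fields or nested objects).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class NullFieldCheck {
    // Returns the names of the non-static fields declared on obj's class that are currently null.
    static List<String> findNullFields(Object obj) {
        List<String> nullFields = new ArrayList<>();
        for (Field field : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // skip constants and compiler-generated fields
            }
            field.setAccessible(true);
            try {
                if (field.get(obj) == null) {
                    nullFields.add(field.getName());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return nullFields;
    }

    public static void main(String[] args) {
        System.out.println(findNullFields(new User("Ada", null))); // prints [age]
    }
}

// Hypothetical POJO used only for this illustration.
class User {
    String name;
    Integer age;

    User(String name, Integer age) {
        this.name = name;
        this.age = age;
    }
}
```

If the returned list is not empty, you could fill in defaults or skip the record before converting it.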
