Create Parquet files in Java

面向向阳花 2020-12-15 22:48

Is there a way to create Parquet files from Java?

I have data in memory (Java classes) and I want to write it to a Parquet file so that I can read it later with Apache Drill.

2 answers
  •  甜味超标
    2020-12-15 23:34

    ParquetWriter's constructors are deprecated (as of 1.8.1), but ParquetWriter itself is not. You can still create a ParquetWriter by extending the abstract Builder class nested inside it.

    Here is an example from the Parquet developers themselves, ExampleParquetWriter:

      public static class Builder extends ParquetWriter.Builder<Group, Builder> {
        private MessageType type = null;
        private Map<String, String> extraMetaData = new HashMap<String, String>();
    
        private Builder(Path file) {
          super(file);
        }
    
        public Builder withType(MessageType type) {
          this.type = type;
          return this;
        }
    
        public Builder withExtraMetaData(Map<String, String> extraMetaData) {
          this.extraMetaData = extraMetaData;
          return this;
        }
    
        // Returns the concrete builder so inherited setters can be chained.
        @Override
        protected Builder self() {
          return this;
        }
    
        // Tells ParquetWriter how to convert Group records into Parquet columns.
        @Override
        protected WriteSupport<Group> getWriteSupport(Configuration conf) {
          return new GroupWriteSupport(type, extraMetaData);
        }
    
      }
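
    The two overrides above (self() and getWriteSupport()) are the whole contract a custom builder has to fulfil. As a rough illustration of why self() exists, here is a minimal sketch of the same self-typed builder pattern using only the JDK. SelfTypedBuilderSketch, SketchWriter, SketchBuilder and RowBuilder are hypothetical stand-ins invented for this sketch, not Parquet classes, and a plain String takes the place of WriteSupport.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for ParquetWriter and ParquetWriter.Builder; this is NOT the Parquet API.
class SelfTypedBuilderSketch {

    /** Plays the role of ParquetWriter<T>; it only collects values in memory. */
    static class SketchWriter<T> {
        final String file;
        final int pageSize;
        final String support;
        final List<T> written = new ArrayList<>();

        SketchWriter(String file, int pageSize, String support) {
            this.file = file;
            this.pageSize = pageSize;
            this.support = support;
        }

        void write(T value) {
            written.add(value);
        }
    }

    /** Plays the role of ParquetWriter.Builder<T, SELF>. */
    abstract static class SketchBuilder<T, SELF extends SketchBuilder<T, SELF>> {
        private final String file;
        private int pageSize = 1024;

        protected SketchBuilder(String file) {
            this.file = file;
        }

        // Subclasses return `this`, so shared setters can return the concrete builder type.
        protected abstract SELF self();

        // Subclasses describe how values are converted; a String stands in for WriteSupport<T>.
        protected abstract String getWriteSupport();

        SELF withPageSize(int pageSize) {
            this.pageSize = pageSize;
            return self();
        }

        SketchWriter<T> build() {
            return new SketchWriter<>(file, pageSize, getWriteSupport());
        }
    }

    /** Concrete builder, analogous in shape to ExampleParquetWriter.Builder. */
    static class RowBuilder extends SketchBuilder<String, RowBuilder> {
        private String type = "unset";

        RowBuilder(String file) {
            super(file);
        }

        RowBuilder withType(String type) {
            this.type = type;
            return this;
        }

        @Override
        protected RowBuilder self() {
            return this;
        }

        @Override
        protected String getWriteSupport() {
            return "support for " + type;
        }
    }

    public static void main(String[] args) {
        SketchWriter<String> writer = new RowBuilder("demo.parquet")
                .withPageSize(4096)        // inherited setter, still returns RowBuilder
                .withType("message demo")  // so the subclass setter can be chained after it
                .build();
        writer.write("row-1");
        System.out.println(writer.support + ", pageSize=" + writer.pageSize
                + ", rows=" + writer.written.size());
        // prints "support for message demo, pageSize=4096, rows=1"
    }
}
```

    Without self(), withPageSize() would have to return the abstract base type, and withType() could no longer be chained after it.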
    

    If you don't want to use Group and GroupWriteSupport (bundled with Parquet, but intended only as an example of a data-model implementation), you can use the Avro, Protocol Buffers, or Thrift in-memory data models instead. Here is an example of writing Parquet using Avro:

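    // fileToWrite is a Hadoop Path, schema is an Avro Schema,
    // and recordsToWrite is a collection of GenericData.Record.
    try (ParquetWriter<GenericData.Record> writer = AvroParquetWriter
            .<GenericData.Record>builder(fileToWrite)
            .withSchema(schema)
            .withConf(new Configuration())
            .withCompressionCodec(CompressionCodecName.SNAPPY)
            .build()) {
        for (GenericData.Record record : recordsToWrite) {
            writer.write(record);
        }
    }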
    

    You will need these dependencies:

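    <dependency>
        <groupId>org.apache.parquet</groupId>
        <artifactId>parquet-avro</artifactId>
        <version>1.8.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.parquet</groupId>
        <artifactId>parquet-hadoop</artifactId>
        <version>1.8.1</version>
    </dependency>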

    Full example here.
