Question
The Parquet docs from Cloudera show examples of integration with Pig/Hive/Impala, but in many cases I want to read the Parquet file itself for debugging purposes.
Is there a straightforward Java reader API to read a Parquet file?
Thanks, Yang
Answer 1:
You can use AvroParquetReader from the parquet-avro library to read a Parquet file as a set of Avro GenericRecord objects.
Answer 2:
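Old method (deprecated):
// file is an org.apache.hadoop.fs.Path pointing at the Parquet file
AvroParquetReader<GenericRecord> reader = new AvroParquetReader<GenericRecord>(file);
GenericRecord nextRecord = reader.read(); // returns null once no records remain
New method:
// file is an org.apache.hadoop.fs.Path pointing at the Parquet file
ParquetReader<GenericRecord> reader = AvroParquetReader.<GenericRecord>builder(file).build();
GenericRecord nextRecord = reader.read(); // returns null once no records remain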
I got this from here and have used this in my test cases successfully.
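The snippets above read a single record. To read the whole file, read() is usually called in a loop until it returns null. Below is a minimal sketch of that loop. So that it compiles without the Parquet jars, it uses a hypothetical RecordSource interface (not part of parquet-avro) that stands in for the one method used here, ParquetReader.read(); the class name ReadAllRecords and the helper readAll are likewise made up for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReadAllRecords {

    /** Hypothetical stand-in for the single method used here, ParquetReader.read(). */
    @FunctionalInterface
    public interface RecordSource<T> {
        T read() throws IOException; // expected to return null when no records remain
    }

    /** Calls read() until it returns null and collects every record in order. */
    public static <T> List<T> readAll(RecordSource<T> source) throws IOException {
        List<T> records = new ArrayList<>();
        T record;
        while ((record = source.read()) != null) {
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Fake in-memory source standing in for a real Parquet reader.
        Iterator<String> rows = Arrays.asList("row-1", "row-2", "row-3").iterator();
        List<String> all = readAll(() -> rows.hasNext() ? rows.next() : null);
        System.out.println(all.size() + " records read"); // prints "3 records read"
    }
}
```

With parquet-avro on the classpath, the same helper could presumably be called as readAll(reader::read), where reader is the ParquetReader<GenericRecord> built above. Collecting everything into a list is only sensible for small debugging files; for larger ones, process each record inside the loop instead. ParquetReader is Closeable, so closing it (for example with try-with-resources) is left to the caller.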
Source: https://stackoverflow.com/questions/28615511/how-to-read-a-parquet-file-in-a-standalone-java-code