How to read a Parquet file in standalone Java code? [closed]

余生颓废 submitted on 2019-12-30 02:52:47

Question


The Parquet docs from Cloudera show examples of integration with Pig/Hive/Impala, but in many cases I want to read the Parquet file itself for debugging purposes.

Is there a straightforward Java reader API to read a Parquet file?

Thanks, Yang


Answer 1:


You can use AvroParquetReader from the parquet-avro library to read a Parquet file as a sequence of Avro GenericRecord objects.




Answer 2:


Old method: (deprecated)

// file is an org.apache.hadoop.fs.Path pointing at the Parquet file
AvroParquetReader<GenericRecord> reader = new AvroParquetReader<GenericRecord>(file);
GenericRecord nextRecord = reader.read();

New method:

ParquetReader<GenericRecord> reader = AvroParquetReader.<GenericRecord>builder(file).build();
GenericRecord nextRecord = reader.read();

I got this from here and have used this in my test cases successfully.
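
For debugging, the two lines of the new method can be wrapped into a small standalone program. The following is only a minimal sketch: it assumes parquet-avro and the Hadoop client libraries are on the classpath, the class name ReadParquet is made up for illustration, and the file path is taken from the first command-line argument. Later parquet-avro releases deprecate the Path overload of builder in favor of one taking an InputFile, so check the version you build against.

```java
import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

// Hypothetical class name; prints every record of one Parquet file.
public class ReadParquet {
    public static void main(String[] args) throws IOException {
        // Assumption: the path to the Parquet file is the first argument.
        Path file = new Path(args[0]);

        // ParquetReader is Closeable, so try-with-resources releases the file handle.
        try (ParquetReader<GenericRecord> reader =
                 AvroParquetReader.<GenericRecord>builder(file).build()) {
            GenericRecord record;
            // read() returns null once the end of the file is reached.
            while ((record = reader.read()) != null) {
                // GenericRecord.toString() renders the record in a JSON-like form.
                System.out.println(record);
            }
        }
    }
}
```

Individual fields can be pulled out with record.get("fieldName") instead of printing the whole record, where the field name depends on the schema stored in your file.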



Source: https://stackoverflow.com/questions/28615511/how-to-read-a-parquet-file-in-a-standalone-java-code
