Reading a simple Avro file from HDFS

Submitted by 拈花ヽ惹草 on 2019-12-03 14:37:49

Question


I am trying to do a simple read of an Avro file stored in HDFS. I found out how to read it when it is on the local file system:

FileReader<GenericRecord> fileReader = DataFileReader.openReader(new File(filename), new GenericDatumReader<GenericRecord>());

for (GenericRecord datum : fileReader) {
   // get(1) returns the field at schema position 1 (the second field)
   String value = datum.get(1).toString();
   System.out.println("value = " + value);
}

fileReader.close();

My file is in HDFS, however, and I cannot pass a Path or an FSDataInputStream to openReader. How can I simply read an Avro file in HDFS?

EDIT: I got this to work by creating a custom class (SeekableHadoopInput) that implements SeekableInput. I "stole" this from "Ganglion" on GitHub. Still, it seems like there should be a Hadoop/Avro integration path for this.
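For readers wondering what such an adapter involves, here is a minimal sketch of the idea, not the SeekableHadoopInput class itself. Avro's SeekableInput asks for seek, tell, length, read and close; the sketch declares a local stand-in interface (SeekableSource, a hypothetical name) with the same methods so that it compiles without Avro or Hadoop on the classpath, and it wraps a java.nio SeekableByteChannel instead of an HDFS stream. In a real HDFS version one would presumably implement org.apache.avro.file.SeekableInput and delegate to FSDataInputStream (seek, getPos, read), taking the length from the file's FileStatus.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SeekableAdapterSketch {

    // Stand-in mirroring the methods of Avro's SeekableInput, declared locally
    // (hypothetical name) so this sketch compiles without Avro on the classpath.
    interface SeekableSource extends Closeable {
        void seek(long position) throws IOException;
        long tell() throws IOException;
        long length() throws IOException;
        int read(byte[] buffer, int offset, int length) throws IOException;
    }

    // Hypothetical adapter: every call is delegated to the wrapped seekable channel.
    static final class ChannelSeekableSource implements SeekableSource {
        private final SeekableByteChannel channel;

        ChannelSeekableSource(SeekableByteChannel channel) {
            this.channel = channel;
        }

        @Override
        public void seek(long position) throws IOException {
            channel.position(position);
        }

        @Override
        public long tell() throws IOException {
            return channel.position();
        }

        @Override
        public long length() throws IOException {
            return channel.size();
        }

        @Override
        public int read(byte[] buffer, int offset, int length) throws IOException {
            // Wrap the target slice so the channel writes directly into the caller's array.
            return channel.read(ByteBuffer.wrap(buffer, offset, length));
        }

        @Override
        public void close() throws IOException {
            channel.close();
        }
    }

    // Writes a small temp file, seeks to the given offset, and returns the bytes read there.
    static String readFrom(long offset, int count) throws IOException {
        Path tmp = Files.createTempFile("seekable-sketch", ".bin");
        try {
            Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
            try (SeekableSource in = new ChannelSeekableSource(Files.newByteChannel(tmp))) {
                in.seek(offset);
                byte[] buf = new byte[count];
                int n = in.read(buf, 0, count);
                return new String(buf, 0, n, StandardCharsets.US_ASCII);
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readFrom(4, 3));
    }
}
```

The adapter holds no state of its own; position tracking is left to the underlying stream, which is why the same shape should carry over to an HDFS stream that already supports seeking.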

Thanks


Answer 1:


The FsInput class (in the avro-mapred submodule, since it depends on Hadoop) can do this. It provides the seekable input stream that is needed for Avro data files.

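import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.FileReader;
import org.apache.avro.file.SeekableInput;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

Path path = new Path("/path/on/hdfs");
Configuration config = new Configuration(); // make this your Hadoop env config
SeekableInput input = new FsInput(path, config);
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
FileReader<GenericRecord> fileReader = DataFileReader.openReader(input, reader);

for (GenericRecord datum : fileReader) {
    System.out.println("value = " + datum);
}

fileReader.close(); // also closes underlying FsInput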


Source: https://stackoverflow.com/questions/11632067/reading-a-simple-avro-file-from-hdfs
