Read parquet data from AWS s3 bucket

我的未来我决定 提交于 2019-11-29 09:34:33
String SCHEMA_TEMPLATE = "{" +
                        "\"type\": \"record\",\n" +
                        "    \"name\": \"schema\",\n" +
                        "    \"fields\": [\n" +
                        "        {\"name\": \"timeStamp\", \"type\": \"string\"},\n" +
                        "        {\"name\": \"temperature\", \"type\": \"double\"},\n" +
                        "        {\"name\": \"pressure\", \"type\": \"double\"}\n" +
                        "    ]" +
                        "}";
String PATH_SCHEMA = "s3a";
Path internalPath = new Path(PATH_SCHEMA, bucketName, folderName);
Schema schema = new Schema.Parser().parse(SCHEMA_TEMPLATE);
Configuration configuration = new Configuration();
AvroReadSupport.setRequestedProjection(configuration, schema);
ParquetReader<GenericRecord> = AvroParquetReader.GenericRecord>builder(internalPath).withConf(configuration).build();
GenericRecord genericRecord = parquetReader.read();

while(genericRecord != null) {
        Map<String, String> valuesMap = new HashMap<>();
        genericRecord.getSchema().getFields().forEach(field -> valuesMap.put(field.name(), genericRecord.get(field.name()).toString()));

        genericRecord = parquetReader.read();
}

Gradle dependencies

    compile 'com.amazonaws:aws-java-sdk:1.11.213'
    compile 'org.apache.parquet:parquet-avro:1.9.0'
    compile 'org.apache.parquet:parquet-hadoop:1.9.0'
    compile 'org.apache.hadoop:hadoop-common:2.8.1'
    compile 'org.apache.hadoop:hadoop-aws:2.8.1'
    compile 'org.apache.hadoop:hadoop-client:2.8.1'
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!