avro

Does binary encoding of AVRO compress data?

Submitted by 跟風遠走 on 2019-12-04 00:44:36
In one of our projects we are using Kafka with Avro to transfer data across applications. Data is added to an Avro object and the object is binary encoded before being written to Kafka. We use binary encoding because it is generally described as a minimal representation compared to other formats. The data is usually a JSON string, and when it is saved to a file it uses up to 10 MB of disk. However, when the file is compressed (.zip), it takes only a few KB. We are concerned about storing such data in Kafka, so we are trying to compress it before writing to a Kafka topic. When the length of the binary encoded message (i.e. the length of the byte …
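
Avro's binary encoding is a compact wire format, not a compression step, so repetitive JSON-like payloads still shrink dramatically once a codec is applied. A minimal sketch of one common approach, assuming a plain Java Kafka producer (the broker address and topic name below are placeholders), is to enable producer-side batch compression:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class CompressedAvroProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());
        // Compress whole record batches on the producer; Avro binary encoding alone does not compress.
        props.put("compression.type", "gzip");                          // "snappy" or "lz4" also work

        byte[] avroBytes = new byte[0]; // stand-in for your binary-encoded Avro payload
        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", avroBytes));  // "events" is a placeholder topic
        }
    }
}
```

With the broker's default compression.type of "producer", the batches are stored as the producer compressed them, which usually addresses the storage concern without changing the Avro encoding itself.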

Hive create table with inputs from nested sub-directories

Submitted by 邮差的信 on 2019-12-04 00:42:32
I have data in Avro format in HDFS, in file paths like /data/logs/[foldername]/[filename].avro . I want to create a Hive table over all of these log files, i.e. all files of the form /data/logs/*/* . (They are all based on the same Avro schema.) I am running the query below with the flag mapred.input.dir.recursive=true : CREATE EXTERNAL TABLE default.testtable ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro …
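
For reference, a hedged sketch of the usual recipe: alongside mapred.input.dir.recursive, Hive typically also needs hive.mapred.supports.subdirectories=true before the table scan will descend into per-folder subdirectories. The snippet below issues the statements over Hive JDBC from Java; the HiveServer2 URL and the avro.schema.url value are placeholders, and the DDL mirrors the (truncated) statement in the question.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateRecursiveAvroTable {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Placeholder HiveServer2 URL; adjust host, port and database for your cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
             Statement stmt = conn.createStatement()) {
            // Both flags are commonly required for Hive to read files in nested subdirectories.
            stmt.execute("SET mapred.input.dir.recursive=true");
            stmt.execute("SET hive.mapred.supports.subdirectories=true");
            stmt.execute("CREATE EXTERNAL TABLE default.testtable "
                    + "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' "
                    + "STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' "
                    + "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' "
                    + "LOCATION '/data/logs/' "
                    + "TBLPROPERTIES ('avro.schema.url'='hdfs:///path/to/log.avsc')"); // placeholder schema location
        }
    }
}
```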

Use schema to convert AVRO messages with Spark to DataFrame

Submitted by 你离开我真会死。 on 2019-12-03 17:24:52
Is there a way to use a schema to convert Avro messages from Kafka with Spark into a DataFrame? The schema file for user records: { "fields": [ { "name": "firstName", "type": "string" }, { "name": "lastName", "type": "string" } ], "name": "user", "type": "record" } And code snippets from the SqlNetworkWordCount example and "Kafka, Spark and Avro - Part 3, Producing and consuming Avro messages" to read in the messages: object Injection { val parser = new Schema.Parser() val schema = parser.parse(getClass.getResourceAsStream("/user_schema.json")) val injection: Injection[GenericRecord, Array[Byte]] = …
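
A minimal sketch of the decode-then-build-a-DataFrame step, written in Java rather than the question's Scala so the Avro calls are explicit; it assumes the two-field user schema above and fakes a single Kafka message as a byte array. In a real streaming job the decoding would run inside a map over the records pulled from Kafka.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class AvroBytesToDataFrame {
    private static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"user\",\"fields\":["
            + "{\"name\":\"firstName\",\"type\":\"string\"},"
            + "{\"name\":\"lastName\",\"type\":\"string\"}]}");

    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("avro-to-df").master("local[*]").getOrCreate();

        byte[] avroBytes = sampleMessage(); // stand-in for one value consumed from Kafka

        // Decode the Avro-encoded bytes back into a GenericRecord using the same schema.
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(SCHEMA);
        GenericRecord record = reader.read(null, DecoderFactory.get().binaryDecoder(avroBytes, null));

        // Map the record's fields onto a Spark Row and build a DataFrame with a matching StructType.
        StructType sparkSchema = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("firstName", DataTypes.StringType, false),
                DataTypes.createStructField("lastName", DataTypes.StringType, false)));
        List<Row> rows = Arrays.asList(
                RowFactory.create(record.get("firstName").toString(), record.get("lastName").toString()));
        Dataset<Row> df = spark.createDataFrame(rows, sparkSchema);
        df.show();
        spark.stop();
    }

    // Encodes one sample record so the example is self-contained.
    private static byte[] sampleMessage() throws Exception {
        GenericRecord user = new GenericData.Record(SCHEMA);
        user.put("firstName", "Jane");
        user.put("lastName", "Doe");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(user, encoder);
        encoder.flush();
        return out.toByteArray();
    }
}
```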

Reading a simple Avro file from HDFS

Submitted by 拈花ヽ惹草 on 2019-12-03 14:37:49
I am trying to do a simple read of an Avro file stored in HDFS. I found out how to read it when it is on the local file system: FileReader<GenericRecord> reader = DataFileReader.openReader(new File(filename), new GenericDatumReader<GenericRecord>()); for (GenericRecord datum : reader) { String value = datum.get(1).toString(); System.out.println("value = " + value); } reader.close(); My file is in HDFS, however, and I cannot give openReader a Path or an FSDataInputStream. How can I simply read an Avro file in HDFS? EDIT: I got this to work by creating a custom class (SeekableHadoopInput) that implements SeekableInput.
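
Apart from a hand-rolled SeekableInput, Avro ships org.apache.avro.mapred.FsInput, which wraps a Hadoop FileSystem stream and can be handed straight to DataFileReader. A minimal sketch (the HDFS path below is a placeholder):

```java
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ReadAvroFromHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml from the classpath
        Path path = new Path("hdfs:///tmp/example.avro"); // placeholder path
        // FsInput implements SeekableInput over a Hadoop FileSystem, which DataFileReader accepts.
        try (DataFileReader<GenericRecord> reader =
                     new DataFileReader<>(new FsInput(path, conf), new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord datum : reader) {
                System.out.println("value = " + datum.get(1));
            }
        }
    }
}
```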

How to avro binary encode my json string to a byte array?

Submitted by ≡放荡痞女 on 2019-12-03 13:22:38
I have an actual JSON string which I need to Avro binary encode to a byte array. After going through the Apache Avro specification, I came up with the code below. I am not sure whether this is the right way to do it or not. Can anyone take a look at whether the way I am trying to Avro binary encode my JSON string is correct? I am using Apache Avro version 1.7.7. public class AvroTest { private static final String json = "{" + "\"name\":\"Frank\"," + "\"age\":47" + "}"; private static final String schema = "{ \"type\":\"record\", \"namespace\":\"foo\", \"name\":\"Person\", \"fields\":[ { \ …
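
One route that stays entirely inside Avro 1.7.x is to let the JSON decoder map the text onto a GenericRecord and then re-serialize it with the binary encoder. A sketch under the assumption that the Person schema has exactly the two fields shown (name: string, age: int):

```java
import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonDecoder;

public class JsonToAvroBytes {
    public static void main(String[] args) throws Exception {
        String json = "{\"name\":\"Frank\",\"age\":47}";
        // Assumed schema matching the record described in the question.
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"namespace\":\"foo\",\"name\":\"Person\",\"fields\":["
                + "{\"name\":\"name\",\"type\":\"string\"},"
                + "{\"name\":\"age\",\"type\":\"int\"}]}");

        // 1. Parse the JSON text into a GenericRecord using Avro's JSON decoder.
        DatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
        JsonDecoder jsonDecoder = DecoderFactory.get().jsonDecoder(schema, json);
        GenericRecord record = reader.read(null, jsonDecoder);

        // 2. Re-encode the record with the binary encoder.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(record, encoder);
        encoder.flush();

        byte[] avroBytes = out.toByteArray();
        System.out.println("encoded length = " + avroBytes.length);
    }
}
```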

Generate Avro Schema from certain Java Object

Submitted by 戏子无情 on 2019-12-03 10:56:28
Apache Avro provides a compact, fast binary data format and rich data structures for serialization. However, it requires the user to define a schema (in JSON) for each object that needs to be serialized. In some cases this is not possible (e.g. the class of that Java object has some members whose types are external Java classes in external libraries). Hence, I wonder whether there is a tool that can get the information from an object's .class file and generate the Avro schema for that object (like Gson uses an object's …
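
Avro's reflection support covers part of this: org.apache.avro.reflect.ReflectData can derive a schema from a class at runtime, although it can still fail on member types it cannot map. A minimal sketch with a hypothetical POJO:

```java
import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;

public class SchemaFromClass {
    // Hypothetical POJO standing in for "a certain Java object".
    public static class Person {
        public String name;
        public int age;
    }

    public static void main(String[] args) {
        // Builds a schema from the class's fields via reflection, similar in spirit to how Gson inspects objects.
        Schema schema = ReflectData.get().getSchema(Person.class);
        System.out.println(schema.toString(true));
    }
}
```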

Polymorphism and inheritance in Avro schemas

Submitted by 久未见 on 2019-12-03 06:50:07
Is it possible to write an Avro schema/IDL that will generate a Java class that either extends a base class or implements an interface? It seems like the generated Java class extends org.apache.avro.specific.SpecificRecordBase, so implementing an interface might be the way to go, but I don't know if this is possible. I have seen examples with suggestions to define an explicit "type" field in each specific schema, with more of an association than inheritance semantics. I use my base class heavily …
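
The explicit "type" field workaround mentioned above usually looks roughly like the following hypothetical schema sketch (not generated from the question's classes); the generated class still extends SpecificRecordBase, so the field only records which concrete kind a record is rather than giving you Java inheritance:

```json
{
  "type": "record",
  "name": "Animal",
  "namespace": "example",
  "fields": [
    {"name": "type", "type": {"type": "enum", "name": "AnimalType", "symbols": ["DOG", "CAT"]}},
    {"name": "name", "type": "string"}
  ]
}
```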

Avro with Java 8 dates as logical type

Submitted by 社会主义新天地 on 2019-12-03 05:17:36
The latest Avro compiler (1.8.2) generates Java sources for date logical types with Joda-Time based implementations. How can I configure the Avro compiler to produce sources that use the Java 8 date-time API? Currently (Avro 1.8.2) this is not possible; it is hardcoded to generate Joda date/time classes. The current master branch has switched to Java 8, and there is an open issue (with a pull request) to add the ability to generate classes with java.time.* types. I have no idea about any kind of release schedule for whatever is currently in master, unfortunately. If you feel adventurous, you can apply the patch …
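
Until the java.time generation lands in a release, one workaround (a sketch, not an official Avro 1.8.2 option) is to read records with the generic API and convert the raw logical-type values yourself: the date logical type is stored as an int of days since the epoch, and timestamp-millis as a long of epoch milliseconds. The field names below are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;

import org.apache.avro.generic.GenericRecord;

public class LogicalTypesToJavaTime {
    // Assumes the record was read with GenericDatumReader and no logical-type conversions registered,
    // so the logical types arrive as their raw int/long encodings.
    static LocalDate birthDate(GenericRecord record) {
        return LocalDate.ofEpochDay((Integer) record.get("birthDate")); // "date": days since 1970-01-01
    }

    static Instant createdAt(GenericRecord record) {
        return Instant.ofEpochMilli((Long) record.get("createdAt"));    // "timestamp-millis": epoch milliseconds
    }
}
```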
