avro

Unable to compile and create .avro file from .avsc using Maven

烈酒焚心 submitted on 2019-12-08 08:18:01

Question: I'm new to Maven and have been looking through tutorials and web documentation on how to build .avro files from a schema file (.avsc). Based on the documentation on the apache.maven.org site, I have to add the following:

    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.7.5</version>
    </dependency>

    <plugin>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-maven-plugin</artifactId>
      <version>1.7.5</version>
      <executions>
        <execution>
          <phase>generate-sources</phase>
          <goals>
            <goal>schema</goal>
          </goals>
          <configuration>
            <sourceDirectory>${project.basedir}/src…
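For reference, the truncated `<plugin>` section above typically continues as in the sketch below, based on the avro-maven-plugin getting-started documentation. The 1.7.5 version is taken from the question; the `sourceDirectory` and `outputDirectory` paths are the documented defaults and are assumptions about this project's layout:

```xml
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.7.5</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <!-- where the .avsc files live -->
        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
        <!-- where generated Java classes are written -->
        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, running `mvn generate-sources` (or any later phase such as `mvn compile`) should generate a Java class per record in the schema.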

What is the value of an Avro Schema Registry?

拥有回忆 submitted on 2019-12-08 07:55:16

Question: I have many microservices reading/writing Avro messages in Kafka. Schemas are great. Avro is great. But is a schema registry really needed? It helps centralize schemas, yes, but do the microservices really need to query the registry? I don't think so. Each microservice has a copy of the schema, user.avsc, and an Avro-generated POJO: User extends SpecificRecord. I want a POJO for each schema for easy manipulation in the code. Write to Kafka:

    byte[] value = user.toByteBuffer().array();
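One concrete thing a registry buys, beyond compatibility checks, is a compact wire format: instead of shipping (or pre-sharing) the whole schema, producers prepend a tiny header carrying a schema ID that consumers resolve against the registry. A minimal JDK-only sketch of the Confluent-style framing (magic byte 0x0 followed by a 4-byte big-endian schema ID); the `schemaId` value and payload here are hypothetical:

```java
import java.nio.ByteBuffer;

public class WireFormat {
    // Confluent-style framing: magic byte 0x0, 4-byte big-endian schema ID,
    // then the raw Avro-encoded payload
    public static byte[] frame(int schemaId, byte[] avroPayload) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + avroPayload.length);
        buf.put((byte) 0x0);      // magic byte
        buf.putInt(schemaId);     // registry-assigned schema ID
        buf.put(avroPayload);     // e.g. user.toByteBuffer().array()
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] framed = frame(42, new byte[] {1, 2, 3});
        System.out.println(framed.length); // 8
        System.out.println(framed[0]);     // 0
    }
}
```

Without a registry, the copy-the-avsc-everywhere approach works until two services deploy incompatible schema versions at different times; the registry's compatibility checks are what catch that before it reaches production.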

Can Apache Avro framework handle parameterized types during serialization?

醉酒当歌 submitted on 2019-12-08 03:09:45

Question: Can Apache Avro handle parameterized types during serialization? I see this exception thrown from the Avro framework when I try to serialize an instance that uses generics:

    org.apache.avro.AvroTypeException: Unknown type: T
        at org.apache.avro.specific.SpecificData.createSchema(SpecificData.java:255)
        at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:514)
        at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:593)
        at org.apache.avro.reflect.ReflectData…
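The "Unknown type: T" error is a consequence of Java type erasure: at runtime, a reflected generic field only exposes the bare type variable, so ReflectData has no concrete class from which to build a schema. A small JDK-only sketch (the `Box` class is hypothetical, standing in for whatever generic class fails to serialize) showing what reflection actually sees:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

public class ErasureDemo {
    // A generic holder like the ones that trip up Avro's ReflectData
    static class Box<T> {
        T value;
    }

    public static void main(String[] args) throws Exception {
        Field f = Box.class.getDeclaredField("value");
        Type t = f.getGenericType();
        // The field's type is the unresolved variable "T", not a class --
        // there is nothing concrete for a schema generator to map it to
        System.out.println(t instanceof TypeVariable); // true
        System.out.println(t.getTypeName());           // T
    }
}
```

The usual workarounds are to serialize concrete subclasses that bind the type parameter (e.g. a StringBox extending Box<String>), or to drop reflect-based serialization in favor of GenericRecord with an explicit schema.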

How to get the start and end of each avro record in a compressed avro file?

蓝咒 submitted on 2019-12-08 02:20:08

Question: My problem is this. I have a snappy-compressed Avro file of 2 GB, with about 1000 Avro records, stored on HDFS. I know I can write code to open up this Avro file and print out each record. My question is: is there a way in Java to open up this Avro file, iterate through each record, and output into a text file the start position and end position of each record within the file, such that I could have a Java function call readRecord(startposition, endposition) that could take the startposition and endposition to quickly read out one specific Avro record without having to…
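One relevant detail: an Avro object container file is divided into blocks separated by 16-byte sync markers, and with a compressed codec like snappy the addressable unit is the block, not the individual record. DataFileReader exposes those block offsets, which is the usual way to get seekable positions. A sketch along those lines, assuming the Avro Java library on the classpath; the file path is hypothetical:

```java
import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class BlockPositions {
    public static void main(String[] args) throws Exception {
        File avroFile = new File("records.avro"); // hypothetical path
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(avroFile, new GenericDatumReader<>())) {
            while (reader.hasNext()) {
                // previousSync() is the byte offset of the block containing the
                // next record; record these offsets, then later call
                // reader.sync(offset) to jump back to that block
                long blockStart = reader.previousSync();
                GenericRecord record = reader.next();
                System.out.println(blockStart + "\t" + record);
            }
        }
    }
}
```

A readRecord(startposition, ...) along the lines the question asks for would then seek to the block offset with reader.sync(...) and iterate forward within the block, since individual records inside a compressed block have no independent byte positions.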

Why use Avro with Kafka - How to handle POJOs

会有一股神秘感。 submitted on 2019-12-07 12:56:21

Question: I have a Spring application that is my Kafka producer, and I was wondering why Avro is the best way to go. I read about it and all it has to offer, but why can't I just serialize the POJO I created myself with Jackson, for example, and send it to Kafka? I'm saying this because POJO generation from Avro is not so straightforward. On top of that, it requires the Maven plugin and an .avsc file. So, for example, I have a POJO on my Kafka producer, created myself, called User:

    public class User {…
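Part of the answer is on the wire: a self-describing format like JSON repeats every field name in every message, while a schema-driven encoding writes only the values, because producer and consumer both know the layout. A rough JDK-only analogy below — DataOutputStream stands in for Avro's binary encoder here (Avro's actual encoding, with varint lengths, is even more compact), and the field names/values are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class SizeDemo {
    // JSON: field names travel with every single message
    static byte[] asJson(String name, int age) {
        return ("{\"name\":\"" + name + "\",\"age\":" + age + "}")
                .getBytes(StandardCharsets.UTF_8);
    }

    // Schema-style: only the values are written; reader and writer
    // share the layout out of band, so no names are needed
    static byte[] asBinary(String name, int age) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeUTF(name); // 2-byte length prefix + UTF-8 bytes
        out.writeInt(age);  // fixed 4 bytes
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(asJson("alice", 30).length);   // 25
        System.out.println(asBinary("alice", 30).length); // 11
    }
}
```

The other half of the argument is schema evolution: an .avsc file with explicit defaults gives compatibility rules that a hand-written Jackson POJO does not enforce.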

Spark - write Avro file

Deadly submitted on 2019-12-07 07:03:33

Question: What are the common practices for writing Avro files with Spark (using the Scala API) in a flow like this: parse some log files from HDFS; for each log file, apply some business logic and generate an Avro file (or maybe merge multiple files); write the Avro files to HDFS. I tried to use spark-avro, but it doesn't help much.

    val someLogs = sc.textFile(inputPath)
    val rowRDD = someLogs.map { line => createRow(...) }
    val sqlContext = new SQLContext(sc)
    val dataFrame = sqlContext.createDataFrame(rowRDD, schema)
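With the spark-avro package, the usual continuation of the snippet above is to go through the DataFrame writer rather than hand-rolling Avro files. A sketch (shown in Java; the question uses Scala, but the DataFrame API has the same shape), assuming the com.databricks:spark-avro package matching the Spark version is on the classpath and that sqlContext, rowRDD, and schema are as in the question; the output path is hypothetical:

```java
import org.apache.spark.sql.DataFrame;

// continuing from the question's setup
DataFrame dataFrame = sqlContext.createDataFrame(rowRDD, schema);

// spark-avro registers itself as a data source; write one Avro
// dataset (one file per partition) straight to HDFS
dataFrame.write()
         .format("com.databricks.spark.avro")
         .save("hdfs:///some/output/path"); // hypothetical path
```

To control the number of output files (e.g. to merge many small ones), repartition or coalesce the DataFrame before the write.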

Trouble with Avro serialization of json documents missing fields

我的未来我决定 submitted on 2019-12-07 02:42:39

Question: I'm trying to use Apache Avro to enforce a schema on data exported from Elasticsearch into a lot of Avro documents in HDFS (to be queried with Drill). I'm having some trouble with Avro defaults. Given this schema:

    {
      "namespace": "avrotest",
      "type": "record",
      "name": "people",
      "fields": [
        {"name": "firstname", "type": "string"},
        {"name": "age", "type": "int", "default": -1}
      ]
    }

I'd expect that a JSON document such as {"firstname" : "Jane"} would be serialized using the default value of…
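Worth noting here: Avro field defaults are applied by a reader when the writer's schema lacks the field; they do not let an encoder accept input that omits the field, so serializing {"firstname" : "Jane"} against the schema above still fails. The usual way to make a field genuinely optional at write time is a union with null — a sketch of the adjusted schema:

```json
{
  "namespace": "avrotest",
  "type": "record",
  "name": "people",
  "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "age", "type": ["null", "int"], "default": null}
  ]
}
```

Consumers can then map a null age to -1 (or whatever sentinel they prefer) on their side.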

How to serialize a Date using AVRO in Java

跟風遠走 submitted on 2019-12-06 18:26:31

Question: I'm actually trying to serialize objects containing dates with Avro, and the deserialized date doesn't match the expected value (tested with Avro 1.7.2 and 1.7.1). Here's the class I'm serializing:

    import java.text.SimpleDateFormat;
    import java.util.Date;

    public class Dummy {
        private Date date;
        private SimpleDateFormat df = new SimpleDateFormat("dd/MM/yyyy hh:mm:ss.SSS");

        public Dummy() {
        }

        public void setDate(Date date) {
            this.date = date;
        }

        public Date getDate() {
            return date;
        }

        @Override…
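Avro 1.7.x has no first-class date type, and reflect-based serialization of java.util.Date (plus the SimpleDateFormat field being picked up by reflection along with the data) is a common source of exactly this kind of mismatch. A frequently used workaround is to model the field as a long holding epoch milliseconds and convert at the boundaries; a JDK-only sketch (class and method names are hypothetical):

```java
import java.util.Date;

public class DateAsLong {
    // Store the date in the Avro record as a long (epoch milliseconds) ...
    public static long toAvro(Date d) {
        return d.getTime();
    }

    // ... and rebuild the Date after deserialization
    public static Date fromAvro(long millis) {
        return new Date(millis);
    }

    public static void main(String[] args) {
        Date original = new Date(1355270400000L);
        long encoded = toAvro(original);
        System.out.println(fromAvro(encoded).equals(original)); // true
    }
}
```

Keeping formatting concerns (like the SimpleDateFormat) out of the serialized class entirely avoids reflection dragging them into the schema.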
