avro

Generate Avro Schema from certain Java Object

Submitted by 本小妞迷上赌 on 2019-12-03 01:28:50
Apache Avro provides a compact, fast binary data format and rich data structures for serialization. However, it requires the user to define a schema (in JSON) for each object that needs to be serialized. In some cases this is not possible (e.g. the class of the Java object has members whose types are classes from external libraries). Hence, I wonder whether there is a tool that can take the information from an object's .class file and generate the Avro schema for that object (similar to how Gson uses a class's structure to convert an object to a JSON string). MoustafaAAtta: Take a look at the Java
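One way to do this (a sketch, not necessarily the approach the truncated answer had in mind) is Avro's reflection support in org.apache.avro.reflect.ReflectData; the User class below is a hypothetical stand-in for the object to describe:

    import org.apache.avro.Schema;
    import org.apache.avro.reflect.ReflectData;

    public class ReflectSchemaExample {
        // Hypothetical POJO standing in for the "certain Java Object".
        public static class User {
            public String name;
            public int age;
        }

        public static void main(String[] args) {
            // ReflectData inspects the class via reflection and derives an Avro schema from it.
            Schema schema = ReflectData.get().getSchema(User.class);
            System.out.println(schema.toString(true)); // pretty-printed JSON schema
        }
    }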

Polymorphism and inheritance in Avro schemas

Submitted by 坚强是说给别人听的谎言 on 2019-12-02 20:26:56
Is it possible to write an Avro schema/IDL that will generate a Java class that either extends a base class or implements an interface? The generated Java class seems to extend org.apache.avro.specific.SpecificRecordBase, so implementing an interface might be the way to go, but I don't know whether that is possible. I have seen examples that suggest defining an explicit "type" field in each specific schema, which gives association rather than inheritance semantics. I use my base class heavily in my factory classes and other parts of the code, with generics like <T extends BaseObject>. Currently
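One workaround sketch (not an official Avro feature): since every generated class already extends SpecificRecordBase, generics can be bounded on that instead of a hand-written base class. GenericHandler and its method below are hypothetical names:

    import org.apache.avro.Schema;
    import org.apache.avro.specific.SpecificRecordBase;

    // A hypothetical generic handler bounded on the common Avro-generated superclass
    // rather than a custom BaseObject.
    public class GenericHandler<T extends SpecificRecordBase> {

        // Every generated record exposes its schema, so shared logic can stay type-agnostic.
        public String describe(T record) {
            Schema schema = record.getSchema();
            return schema.getFullName() + " with " + schema.getFields().size() + " fields";
        }
    }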

Get a typed value from an Avro GenericRecord

Submitted by ╄→гoц情女王★ on 2019-12-02 19:50:06
Given a GenericRecord, what is the recommended way to retrieve a typed value, as opposed to an Object? Are we expected to cast the values, and if so what is the mapping from Avro types to Java types? For example, Avro Array == Java Collection and Avro String == Java Utf8. Since every GenericRecord contains its schema, I was hoping for a type-safe way to retrieve values. Avro has eight primitive types and five complex types (excluding unions, which are a combination of other types). The following table maps these 13 Avro types to their input interfaces (the Java types which can be put into a
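A small sketch of the cast-based retrieval being asked about, assuming a record with hypothetical fields name (Avro string), mass (Avro int) and scores (Avro array of int):

    import java.util.List;
    import org.apache.avro.generic.GenericRecord;

    public class TypedGet {
        // GenericRecord.get(...) returns Object, so the caller casts to the Java type
        // Avro maps the field to: string -> Utf8/CharSequence, int -> Integer,
        // array -> java.util.List, and so on.
        static void readFields(GenericRecord record) {
            CharSequence name = (CharSequence) record.get("name"); // usually a Utf8 instance
            int mass = (Integer) record.get("mass");               // Avro int -> java.lang.Integer
            @SuppressWarnings("unchecked")
            List<Integer> scores = (List<Integer>) record.get("scores"); // Avro array -> List
            System.out.println(name + " " + mass + " " + scores);
        }
    }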

How can I load Avros in Spark using the schema on-board the Avro file(s)?

Submitted by 佐手、 on 2019-12-02 19:37:48
I am running CDH 4.4 with Spark 0.9.0 from a Cloudera parcel. I have a bunch of Avro files that were created via Pig's AvroStorage UDF. I want to load these files in Spark, using a generic record or the schema on board the Avro files. So far I've tried this:

    import org.apache.avro.mapred.AvroKey
    import org.apache.avro.mapreduce.AvroKeyInputFormat
    import org.apache.hadoop.io.NullWritable
    import org.apache.commons.lang.StringEscapeUtils.escapeCsv
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.fs.FileSystem
    import org.apache.hadoop.conf.Configuration
    import java.net.URI
    import java.io
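As background for "the schema on-board the Avro file(s)": each Avro container file carries its writer schema in the file header, and the generic API can read it back without any external schema. A minimal plain-Java sketch (outside Spark; the local file path is a placeholder):

    import java.io.File;
    import java.io.IOException;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class OnboardSchema {
        public static void main(String[] args) throws IOException {
            // A GenericDatumReader with no explicit schema picks up the writer schema
            // embedded in the file header.
            try (DataFileReader<GenericRecord> reader =
                     new DataFileReader<>(new File("/tmp/part-m-00000.avro"),
                                          new GenericDatumReader<GenericRecord>())) {
                System.out.println(reader.getSchema().toString(true)); // schema from the file itself
                for (GenericRecord record : reader) {
                    System.out.println(record);
                }
            }
        }
    }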

Apache Kafka with Avro and Schema Repo - where in the message does the schema Id go?

Submitted by 巧了我就是萌 on 2019-12-02 15:55:37
I want to use Avro to serialize the data for my Kafka messages and would like to use it with an Avro schema repository so I don't have to include the schema with every message. Using Avro with Kafka seems like a popular thing to do, and lots of blogs, Stack Overflow questions, user groups, etc. reference sending the schema ID with the message, but I cannot find an actual example of where it should go. I think it should go in the Kafka message header somewhere, but I cannot find an obvious place. If it were in the Avro message, you would have to decode it against a schema to get the message contents
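For reference, Confluent's serializers (one widely used answer to this question) put the ID at the front of the message value rather than in a header: a magic byte 0, then the 4-byte schema ID, then the Avro-encoded payload. A sketch of that framing, assuming the Avro bytes and the registry-assigned schemaId are already in hand:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;

    public class WireFormat {
        // Frame an Avro payload the way Confluent's KafkaAvroSerializer does:
        // [magic byte 0][4-byte big-endian schema id][Avro binary payload]
        static byte[] frame(int schemaId, byte[] avroPayload) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(0);                                               // magic byte
            out.write(ByteBuffer.allocate(4).putInt(schemaId).array()); // schema id
            out.write(avroPayload);                                     // Avro-encoded record
            return out.toByteArray();
        }
    }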

Apache Kafka JDBC Connector - SerializationException: Unknown magic byte

Submitted by 余生颓废 on 2019-12-02 11:34:05
We are trying to write back the values from a topic to a Postgres database using the Confluent JDBC Sink Connector.

    connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
    connection.password=xxx
    tasks.max=1
    topics=topic_name
    auto.evolve=true
    connection.user=confluent_rw
    auto.create=true
    connection.url=jdbc:postgresql://x.x.x.x:5432/Datawarehouse
    value.converter=io.confluent.connect.avro.AvroConverter
    value.converter.schema.registry.url=http://localhost:8081
    key.converter=io.confluent.connect.avro.AvroConverter
    key.converter.schema.registry.url=http://localhost:8081

We can read the value
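The "Unknown magic byte" error in the title usually means the topic's data was not produced with the schema-registry-aware serializer that AvroConverter expects. A hedged Java sketch of producer properties that would produce compatible records (topic, bootstrap address and registry URL are placeholders):

    import java.util.Properties;

    public class AvroProducerConfig {
        static Properties props() {
            Properties p = new Properties();
            p.put("bootstrap.servers", "localhost:9092");  // placeholder
            // KafkaAvroSerializer registers the schema and prepends the magic byte + schema id
            // that io.confluent.connect.avro.AvroConverter expects to find in each record.
            p.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
            p.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
            p.put("schema.registry.url", "http://localhost:8081");
            return p;
        }
    }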

avro error on AWS EMR

Submitted by 蹲街弑〆低调 on 2019-12-02 04:11:43
I'm using spark-redshift ( https://github.com/databricks/spark-redshift ), which uses Avro for transfer. Reading from Redshift is OK, but while writing I'm getting:

    Caused by: java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter

I tried using Amazon EMR 4.1.0 (Spark 1.5.0) and 4.0.0 (Spark 1.4.1). I cannot do import org.apache.avro.generic.GenericData.createDatumWriter either, just import org.apache.avro.generic.GenericData. I'm using the Scala shell. I tried downloading several other avro-mapred and avro jars, and tried setting {
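That NoSuchMethodError usually points to an older Avro jar on the cluster classpath shadowing the version spark-redshift expects (createDatumWriter is an instance method on GenericData, not an importable member, which is why the import fails). A small Java sketch of one way to check which jar GenericData is actually loaded from:

    public class AvroClasspathCheck {
        public static void main(String[] args) {
            // Print the jar that provides GenericData; if it comes from an Avro release
            // that predates createDatumWriter(Schema), that is the conflicting dependency.
            System.out.println(org.apache.avro.generic.GenericData.class
                    .getProtectionDomain().getCodeSource().getLocation());
        }
    }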

Avro: How can I use default fields when I don't know the exact schema that the “writer” used

Submitted by 允我心安 on 2019-12-02 03:56:09
In Java Avro, how do I parse data1, data2 and data3 below into a GenericRecord?

    //Schema
    {
      "type": "record",
      "name": "user",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "colour", "type": "string", "default": "green"},
        {"name": "mass", "type": "int", "default": 100}
      ]
    }

    //data 1
    {"name":"Sean"}

    //data 2
    {"name":"Sean", "colour":"red"}

    //data 3
    {"name":"Sean", "colour":"red", "mass":200}

I've seen some discussion on schema evolution etc., and the ability to pass a writer's schema and a
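For context, Avro only fills in defaults during schema resolution, i.e. when a reader schema is applied on top of the schema the data was actually written with. A sketch of that mechanism, assuming you can supply a reduced writer schema that matches data1 (only the name field):

    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.DecoderFactory;

    public class DefaultsViaResolution {
        public static void main(String[] args) throws IOException {
            // The full schema from the question acts as the reader schema.
            Schema reader = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"user\",\"fields\":["
              + "{\"name\":\"name\",\"type\":\"string\"},"
              + "{\"name\":\"colour\",\"type\":\"string\",\"default\":\"green\"},"
              + "{\"name\":\"mass\",\"type\":\"int\",\"default\":100}]}");

            // A writer schema describing what data1 actually contains: just "name".
            Schema writer = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"user\",\"fields\":["
              + "{\"name\":\"name\",\"type\":\"string\"}]}");

            // Decode the JSON against the writer schema, then resolve to the reader schema;
            // "colour" and "mass" are populated from their defaults.
            GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>(writer, reader);
            GenericRecord record = datumReader.read(null,
                DecoderFactory.get().jsonDecoder(writer, "{\"name\":\"Sean\"}"));
            System.out.println(record); // expected: {"name": "Sean", "colour": "green", "mass": 100}
        }
    }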