avro

Deserializing a primitive Avro key in a KStream app

Submitted by 瘦欲@ on 2019-12-11 18:46:37
Question: I'm currently unable to deserialize a primitive Avro key in a KStream app. The key is encoded with an Avro schema (registered in the Schema Registry); when I use kafka-avro-console-consumer I can see that the key is deserialized correctly, but I cannot make it work in a KStream app. The Avro schema of the key is a primitive: {"type":"string"}. I have already followed the Confluent documentation: final Serde<V> valueSpecificAvroSerde = new SpecificAvroSerde<>(); final Map<String, …
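For reference, the usual culprit with primitive Avro keys is configuring the key serde without isKey = true, so it resolves the wrong subject ("<topic>-value" instead of "<topic>-key") in the Schema Registry. A minimal sketch in Scala, assuming Confluent's kafka-streams-avro-serde artifact is on the classpath and a local registry URL (both assumptions, not from the question):

    import java.util.Collections
    import org.apache.kafka.streams.StreamsBuilder
    import org.apache.kafka.streams.kstream.Consumed
    import io.confluent.kafka.streams.serdes.avro.{GenericAvroSerde, PrimitiveAvroSerde}

    // PrimitiveAvroSerde handles primitive Avro schemas such as {"type":"string"}.
    val keySerde = new PrimitiveAvroSerde[String]()
    // isKey = true makes the serde look up the "<topic>-key" subject.
    keySerde.configure(
      Collections.singletonMap("schema.registry.url", "http://localhost:8081"), true)

    val valueSerde = new GenericAvroSerde()
    valueSerde.configure(
      Collections.singletonMap("schema.registry.url", "http://localhost:8081"), false)

    val builder = new StreamsBuilder()
    val stream = builder.stream("my-topic", Consumed.`with`(keySerde, valueSerde))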

Trouble deserializing Avro data in Scala

Submitted by 若如初见. on 2019-12-11 18:06:16
Question: I am building an Apache Flink application in Scala which reads streaming data from a Kafka bus and then performs summarizing operations on it. The data from Kafka is in Avro format and needs a special deserialization class. I found this Scala class, AvroDeserializationSchema (http://codegists.com/snippet/scala/avrodeserializationschemascala_saveveltri_scala): package org.myorg.quickstart import org.apache.avro.io.BinaryDecoder import org.apache.avro.io.DatumReader import org.apache.avro.io…
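For context, a working version of such a class needs a runtime Class[T] for the Avro reader (Scala's generics are erased), and the non-serializable reader must be rebuilt on each worker rather than shipped with the schema object. A sketch of one way to write it, assuming the Kafka records are plain binary-encoded Avro SpecificRecords (class and package names illustrative):

    import org.apache.avro.io.DecoderFactory
    import org.apache.avro.specific.{SpecificDatumReader, SpecificRecordBase}
    import org.apache.flink.api.common.serialization.DeserializationSchema
    import org.apache.flink.api.common.typeinfo.TypeInformation
    import org.apache.flink.api.java.typeutils.TypeExtractor
    import scala.reflect.ClassTag

    class AvroDeserializationSchema[T <: SpecificRecordBase : ClassTag]
        extends DeserializationSchema[T] {

      private def clazz: Class[T] =
        implicitly[ClassTag[T]].runtimeClass.asInstanceOf[Class[T]]

      // The Avro reader is not serializable, so rebuild it lazily per worker.
      @transient private lazy val reader = new SpecificDatumReader[T](clazz)

      override def deserialize(message: Array[Byte]): T =
        reader.read(null.asInstanceOf[T], DecoderFactory.get().binaryDecoder(message, null))

      override def isEndOfStream(nextElement: T): Boolean = false

      override def getProducedType: TypeInformation[T] = TypeExtractor.getForClass(clazz)
    }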

Apache Kafka with Avro: where in the message does the schema id go?

Submitted by 烈酒焚心 on 2019-12-11 17:54:59
Question: I have a number of questions about Avro schemas. I have read that we need to pass a schema id along with the message in a Kafka event. The body of my Kafka event looks like: { "componentName": "ABC", //some more fields, "payload": { "name" : "xyz", "age": "23" } } The payload field carries the actual data. So where do I provide the schema id? I found one related answer at [link][1] [1]: https://stackoverflow.com/questions/31204201/apache-kafka-with-avro-and-schema-repo-where-in-the…
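For reference, with the Confluent serializers the schema id is not a field in your JSON at all: it travels in a five-byte prefix of the record's raw bytes, a magic byte 0x0 followed by the 4-byte schema id (big-endian), and then the Avro-encoded payload. A small illustrative Scala helper (hypothetical, not from the question) that peels the id off a serialized value:

    import java.nio.ByteBuffer

    // Confluent wire format: [0x0][schema id: 4 bytes, big-endian][Avro binary payload]
    def schemaIdOf(bytes: Array[Byte]): Int = {
      val buf = ByteBuffer.wrap(bytes)
      require(buf.get() == 0, "not in Confluent wire format")
      buf.getInt()
    }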

Avro with MRUnit gives InstantiationException

Submitted by 若如初见. on 2019-12-11 17:26:28
Question: I'm using hadoop-client 2.2.0, mrunit 1.0.0, avro 1.7.6, and avro-mrunit 1.7.6, and the entire thing is being built and tested using Maven. I was getting a NullPointerException until I followed the instructions in "MRUnit with Avro NullPointerException in Serialization". Now I am getting an InstantiationException: Running mypackage.MyTest log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory). log4j:WARN Please initialize the log4j system properly.…
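A note on this class of failure: MRUnit instantiates Avro records through Hadoop's serialization factory, so the driver's Configuration needs both the Avro serialization class and the reader/writer schemas registered. A hedged sketch, assuming avro-mapred's AvroSerialization helpers (mapDriver and MyRecord are placeholders, not from the question):

    import org.apache.avro.hadoop.io.AvroSerialization
    import org.apache.hadoop.conf.Configuration

    // mapDriver is an MRUnit MapDriver configured elsewhere;
    // MyRecord is a generated Avro specific class (placeholder name).
    val conf: Configuration = mapDriver.getConfiguration
    AvroSerialization.addToConfiguration(conf)
    AvroSerialization.setKeyWriterSchema(conf, MyRecord.getClassSchema)
    AvroSerialization.setKeyReaderSchema(conf, MyRecord.getClassSchema)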

NoClassDefFoundError, cannot run MapReduceColorCount (Avro 1.7.7)

Submitted by ぐ巨炮叔叔 on 2019-12-11 13:34:07
Question: When trying to run MapReduceColorCount (new MapReduce API) based on the page http://avro.apache.org/docs/1.7.7/mr.html, I get the following: [cloudera@localhost ~]$ hadoop jar avroColorCount.jar exos.MapReduceColorCount2 inavro01 outavro01 Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/mapreduce/AvroKeyInputFormat at exos.MapReduceColorCount2.run(MapReduceColorCount2.java:71) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util…
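For context, this NoClassDefFoundError means the avro-mapred jar (the hadoop2 classifier for the new MapReduce API) is not on the classpath at run time; it has to be bundled into a fat jar or shipped with the job. A hedged example invocation using -libjars, which applies here because the job already runs through ToolRunner (jar names and paths assumed):

    $ export HADOOP_CLASSPATH=avro-mapred-1.7.7-hadoop2.jar
    $ hadoop jar avroColorCount.jar exos.MapReduceColorCount2 \
        -libjars avro-mapred-1.7.7-hadoop2.jar inavro01 outavro01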

Kafka connector and Schema Registry - Error Retrieving Avro Schema - Subject not found

Submitted by 会有一股神秘感。 on 2019-12-11 11:24:18
Question: I have a topic that will eventually carry lots of different schemas. For now it just has one. I've created a connect job via REST like this: { "name":"com.mycompany.sinks.GcsSinkConnector-auth2", "config": { "connector.class": "com.mycompany.sinks.GcsSinkConnector", "topics": "auth.events", "flush.size": 3, "my.setting":"bar", "key.converter":"org.apache.kafka.connect.storage.StringConverter", "key.deserializer":"org.apache.kafka.common.serialization.StringDerserializer", "value…
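For reference, with the default TopicNameStrategy the Avro converter asks the registry for the subject "<topic>-value" (here auth.events-value), and "Subject not found" usually means the topic's values were not produced with the Confluent Avro serializer under that subject. Note also that key.deserializer is a plain consumer property and is ignored by Connect, which only uses converters. For a topic that will carry several schemas, a record-name strategy is the usual approach; a hedged sketch of the relevant converter config (property names per Confluent's converter docs, registry URL assumed):

    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "value.converter.value.subject.name.strategy":
        "io.confluent.kafka.serializers.subject.RecordNameStrategy"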

Returning a large data structure from a Dataflow worker node, getting stuck serializing the graph

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-11 10:14:52
Question: I have a large graph, ~100k vertices and ~1 million edges, being constructed in a DoFn function. When I try to output that graph, execution gets stuck at c.output(graph);: public static class Prep extends DoFn<TableRow, TableRows> { @Override public void processElement(ProcessContext c) { //Graph creation logic runs very fast, no problem here LOG.info("Starting Graph Output"); // can see this in logs c.output(graph); //outputs data from DoFn function LOG.info("Ending Graph Output…
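For context, c.output() is not a pointer hand-off: the element is encoded with the PCollection's Coder so it can cross worker boundaries, and pushing a ~100k-vertex object graph through the default Java-serialization coder is extremely slow. One mitigation is registering a cheaper coder for the type; a hedged sketch against the Beam 2.x API (Graph is the question's own type, everything else illustrative; the older Dataflow SDK names the method registerCoder):

    import org.apache.beam.sdk.coders.AvroCoder

    // AvroCoder encodes via Avro reflection, usually far faster and more
    // compact than Java serialization for large object graphs.
    pipeline.getCoderRegistry.registerCoderForClass(classOf[Graph], AvroCoder.of(classOf[Graph]))

Another option is to avoid materializing the whole graph as a single element, e.g. by outputting vertices and edges separately and reassembling downstream.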

Why doesn't kafka-avro-console-producer honour the default value for the field?

Submitted by 白昼怎懂夜的黑 on 2019-12-11 09:03:27
Question: Although a default is defined for the field, kafka-avro-console-producer ignores it completely: $ kafka-avro-console-producer --broker-list localhost:9092 --topic test-avro \ --property schema.registry.url=http://localhost:8081 --property \ value.schema='{"type":"record","name":"myrecord1","fields": \ [{"name":"f1","type":"string"},{"name": "f2", "type": "int", "default": 0}]}' {"f1": "value1"} org.apache.kafka.common.errors.SerializationException: Error deserializing json {"f1": "value1"} to…
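For reference, this is expected Avro behaviour rather than a bug: default values are applied by readers when a field is absent from the writer's schema; writers, including the console producer (whose JSON encoding requires every field to be present), never fill them in. So {"f1": "value1", "f2": 0} would be accepted. Programmatically, GenericRecordBuilder is the piece that does consult defaults; a small Scala illustration using the question's own schema:

    import org.apache.avro.Schema
    import org.apache.avro.generic.GenericRecordBuilder

    val schema = new Schema.Parser().parse(
      """{"type":"record","name":"myrecord1","fields":
        |[{"name":"f1","type":"string"},{"name":"f2","type":"int","default":0}]}""".stripMargin)

    // The builder fills unset fields from the schema's defaults.
    val record = new GenericRecordBuilder(schema).set("f1", "value1").build()
    println(record.get("f2")) // 0, taken from the default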

How does updating data in Hive transactional tables result in file creation/updates in HDFS?

Submitted by 时光怂恿深爱的人放手 on 2019-12-11 08:53:53
Question: By enabling transactions in Hive, we can update records. Assume I'm using the AVRO format for my Hive table. https://hortonworks.com/hadoop-tutorial/using-hive-acid-transactions-insert-update-delete-data/ How does Hive take care of updating an AVRO file and replicating it again across different servers (since the replication factor is 3)? I could not find a good article that explains this, or the consequences of using ACID in Hive. Since HDFS is recommended for non-updating or append-only files, how…
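For context, Hive ACID never updates files in place: each transaction writes new immutable delta files alongside the table's base files, readers merge base plus deltas at query time, and background compaction periodically rewrites them into a fresh base. HDFS replication simply applies to each new file as it is written. Note also that Hive's ACID implementation requires the ORC format, so an Avro-backed table cannot be transactional in the first place. An illustrative directory layout after updates (paths hypothetical):

    /warehouse/mytable/base_0000010/bucket_00000
    /warehouse/mytable/delta_0000011_0000011/bucket_00000   <-- written by an UPDATE
    /warehouse/mytable/delta_0000012_0000012/bucket_00000   <-- written by a later DELETE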

Avro serialization issue with a generic type

Submitted by 余生颓废 on 2019-12-11 07:58:01
Question: I need to write a function in Scala that returns an array of bytes serialized with AvroOutputStream, but in Scala I can't get the class of the generic object I'm passing as input. Here is my util class: class AvroUtils { def createByteArray[T](obj: T): Array[Byte] = { val byteArrayStream = new ByteArrayOutputStream() val output = AvroOutputStream.binary[T](byteArrayStream) output.write(obj) output.close() byteArrayStream.toByteArray() } } As you can see if you test this code…
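For context, AvroOutputStream.binary[T] here is avro4s, and it needs compile-time evidence for T that erasure removes from a plain generic method; the usual fix is to add context bounds so the caller's implicits travel with T. A hedged sketch against the avro4s 1.x API (the required implicits are named differently in later avro4s versions):

    import java.io.ByteArrayOutputStream
    import com.sksamuel.avro4s.{AvroOutputStream, SchemaFor, ToRecord}

    class AvroUtils {
      // [T : SchemaFor : ToRecord] asks the caller to supply the Avro
      // schema/encoder evidence that the erased type parameter cannot.
      def createByteArray[T: SchemaFor: ToRecord](obj: T): Array[Byte] = {
        val byteArrayStream = new ByteArrayOutputStream()
        val output = AvroOutputStream.binary[T](byteArrayStream)
        output.write(obj)
        output.close()
        byteArrayStream.toByteArray
      }
    }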