avro

Accessing nested fields in AVRO GenericRecord (Java/Scala)

社会主义新天地 submitted on 2021-02-07 09:17:58
Question: I have a GenericRecord with nested fields. When I call genericRecord.get(1) it returns an Object that contains the nested Avro data. I want to be able to access that object like genericRecord.get(1).get(0), but I can't because Avro returns an Object. Is there an easy way around this? When I try something like returnedObject.get("item") it says item is not a member of returnedObject.

Answer 1: I figured out one way to do it. Cast the returned Object to a GenericRecord. Example (Scala): val data …
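
The answer above is cut off at the example; here is a minimal, self-contained sketch of that cast in Scala (the schema and field names are illustrative, not the asker's):

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}

object NestedFieldAccess extends App {
  // Illustrative schema: an outer record whose second field is itself a record.
  val schemaJson =
    """{"type":"record","name":"Outer","fields":[
      |  {"name":"id","type":"int"},
      |  {"name":"inner","type":{"type":"record","name":"Inner","fields":[
      |    {"name":"item","type":"string"}]}}
      |]}""".stripMargin
  val schema = new Schema.Parser().parse(schemaJson)

  // Build a sample record so the example runs on its own.
  val inner = new GenericData.Record(schema.getField("inner").schema())
  inner.put("item", "hello")
  val outer = new GenericData.Record(schema)
  outer.put("id", 42)
  outer.put("inner", inner)

  // get(...) is typed as Object, so cast the nested value back to
  // GenericRecord before reading its fields by name or position.
  val nested = outer.get(1).asInstanceOf[GenericRecord]
  println(nested.get("item")) // hello
  println(nested.get(0))      // hello (same field, by position)
}
```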

Confluent Schema Registry : Schema ID deletion

天大地大妈咪最大 submitted on 2021-01-29 08:39:12
Question: We are in development and trying to delete the schema for a topic, since a change is incompatible with the older schema. We deleted the schema/subject and then created the new schema under the same subject name; it was created successfully. However, when we run the application, it still points to the old schema ID.

Old schema ID (for subject "topic1"): 51
New schema ID (for subject "topic1"): 52

The application fails with an error deserializing the message at org.apache.kafka …
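
The question is truncated above, but a quick way to see what the registry is actually serving after the delete and re-register is to query it with the Confluent client. A hedged sketch, assuming a recent kafka-schema-registry-client on the classpath; the URL and the subject name "topic1" are placeholders taken from the question (with the default TopicNameStrategy the subject would normally be "topic1-value"):

```java
import java.util.List;

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaMetadata;

public class SubjectIdCheck {
    public static void main(String[] args) throws Exception {
        // Second argument = how many schemas the client caches locally.
        CachedSchemaRegistryClient client =
                new CachedSchemaRegistryClient("http://localhost:8081", 100);

        // Delete every version registered under the subject
        // (a soft delete in recent Schema Registry versions).
        List<Integer> deletedVersions = client.deleteSubject("topic1");
        System.out.println("deleted versions: " + deletedVersions);

        // ...re-register the new schema (e.g. by producing with it), then
        // confirm which ID the registry now serves for the subject:
        SchemaMetadata latest = client.getLatestSchemaMetadata("topic1");
        System.out.println("latest id=" + latest.getId()
                + ", version=" + latest.getVersion());
    }
}
```

Note that the Confluent serializers and the registry client cache schema/ID lookups in memory, so a long-running application may keep using whatever it resolved before the delete until it is restarted.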

Apache Avro C Installation

倾然丶 夕夏残阳落幕 submitted on 2021-01-29 08:01:43
Question: I am working on a project that uses Apache Avro. I downloaded Apache Avro for C and followed the provided instructions to install it on my system (Ubuntu Linux 14.04). After the installation I have some header files under the /include directory and some libraries under the /lib directory, all of them installed by Apache Avro. At this point I have created my C source files, which are as follows:

1) socket_client.h: #include <stdio.h> #include <sys …
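
The source listing above is cut off, but a quick way to confirm the library actually installed where the toolchain can find it is to build a minimal program against it. A sketch, assuming avro-c installed its pkg-config file; the include/lib paths in the build comment may need adjusting to wherever "make install" placed them:

```c
/*
 * avro_check.c -- verify that the Avro C headers and library are usable.
 *
 * Possible build commands (adjust paths as needed):
 *   gcc avro_check.c -o avro_check $(pkg-config --cflags --libs avro-c)
 *   gcc avro_check.c -o avro_check -I/usr/local/include -L/usr/local/lib -lavro
 */
#include <stdio.h>
#include <string.h>
#include <avro.h>

int main(void)
{
    const char *json =
        "{\"type\":\"record\",\"name\":\"test\","
        "\"fields\":[{\"name\":\"x\",\"type\":\"int\"}]}";
    avro_schema_t schema;

    /* Parsing a schema exercises both the header and the linked library. */
    if (avro_schema_from_json_length(json, strlen(json), &schema)) {
        fprintf(stderr, "schema parse failed: %s\n", avro_strerror());
        return 1;
    }
    printf("Avro C headers and library are usable\n");
    avro_schema_decref(schema);
    return 0;
}
```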

In schema registry, the consumer's schema could differ from the producer's: what does that actually mean?

独自空忆成欢 submitted on 2021-01-29 05:40:39
Question: When producing Avro data to Kafka, the Avro serializer writes into the byte array the ID of the very schema that was used to write the data. The Kafka consumer then fetches the schema from the Schema Registry based on the schema ID found in the received byte array. So the same schema ID, and therefore the same schema, is used on both sides, producer and consumer. Why, then, do many articles, including this one, say "The consumer's schema could differ from the producer's"? Please help me understand this.

Answer 1: Kafka Consumer fetches the schema …
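
What that statement refers to is Avro schema resolution: the ID embedded in the message identifies the writer's (producer's) schema, which the deserializer fetches from the registry, but decoding can still be done against the consumer's own reader schema as long as the two are compatible. A hedged sketch in plain Avro Java (the schemas here are illustrative) showing a reader schema that differs from the writer's:

```java
import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.*;
import org.apache.avro.io.*;

public class SchemaResolutionDemo {
    public static void main(String[] args) throws Exception {
        // Writer (producer) schema: two fields.
        Schema writer = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}");

        // Reader (consumer) schema: drops "age", adds "email" with a default.
        Schema reader = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"email\",\"type\":\"string\",\"default\":\"n/a\"}]}");

        // Encode with the writer schema (what the producer does).
        GenericRecord rec = new GenericData.Record(writer);
        rec.put("name", "alice");
        rec.put("age", 30);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writer).write(rec, enc);
        enc.flush();

        // Decode with BOTH schemas: the writer's (looked up via the ID)
        // and the consumer's own reader schema. Avro reconciles the two:
        // "age" is skipped, "email" gets its default.
        BinaryDecoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord decoded =
            new GenericDatumReader<GenericRecord>(writer, reader).read(null, dec);
        System.out.println(decoded); // -> {"name": "alice", "email": "n/a"}
    }
}
```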

Dataflow Python SDK Avro Source/Sync

有些话、适合烂在心里 submitted on 2021-01-29 03:00:28
Question: I am looking to ingest and write Avro files in GCS with the Python SDK. Is this currently possible with Avro using the Python SDK? If so, how would I do it? I see TODO comments about this in the source, so I am not too optimistic.

Answer 1: You are correct: the Python SDK does not yet support this, but it will soon.

Answer 2: As of version 2.6.0 of the Apache Beam/Dataflow Python SDK, it is indeed possible to read (and write) Avro files in GCS. Even better, the Python SDK for Beam now …
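
A minimal sketch of what that looks like with the Beam Python SDK, assuming Beam 2.6.0 or newer with the GCP extras installed (pip install "apache-beam[gcp]"); the bucket paths are placeholders, and in recent Beam releases the schema is a plain dict (older releases expected a parsed avro.schema object):

```python
import apache_beam as beam
from apache_beam.io.avroio import ReadFromAvro, WriteToAvro

# Illustrative record schema for the output files.
schema = {
    "type": "record",
    "name": "Example",
    "fields": [{"name": "name", "type": "string"}],
}

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read" >> ReadFromAvro("gs://my-bucket/input/*.avro")
        | "Write" >> WriteToAvro(
            "gs://my-bucket/output/part",
            schema=schema,
            file_name_suffix=".avro",
        )
    )
```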

Issue in using snappy with avro in python

◇◆丶佛笑我妖孽 submitted on 2021-01-29 00:47:02
Question: I am reading a .gz file and converting it to Avro format. With codec='deflate' it works fine, i.e. I am able to convert to Avro. When I use codec='snappy' it throws the error below:

raise DataFileException("Unknown codec: %r" % codec) avro.datafile.DataFileException: Unknown codec: 'snappy'

With deflate (works fine): writer = DataFileWriter(open(avro_file, "wb"), DatumWriter(), schema, codec='deflate')
With snappy (throws the error): writer = …
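
The "Unknown codec" error usually means the optional snappy dependency is missing: the avro package only registers the snappy codec when the python-snappy module (which in turn needs the system libsnappy) can be imported. A hedged sketch of the writer once that is installed; the file name and schema are illustrative, and avro.schema.parse is spelled Parse in the older avro-python3 package:

```python
# Assumed prerequisites:
#   apt-get install libsnappy-dev     (system snappy library)
#   pip install python-snappy         (Python binding the avro codec uses)
import avro.schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter

schema = avro.schema.parse("""
{"type": "record", "name": "Example",
 "fields": [{"name": "name", "type": "string"}]}
""")

# Same writer as in the question, but with the snappy codec available.
writer = DataFileWriter(open("example.avro", "wb"), DatumWriter(),
                        schema, codec="snappy")
writer.append({"name": "alice"})
writer.close()
```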

Write nullable item to avro record in Avro C

江枫思渺然 submitted on 2021-01-28 22:52:43
Question: Schema:

const char schema[] = "{ \"type\":\"record\", \"name\":\"foo\"," "\"fields\": [" "{ \"name\": \"nullableint\", \"type\":[\"int\",\"null\"]}" "]}";

Setting the schema: avro_datum_t foo_record = avro_record(schema);
Setting up the nullable datum: avro_datum_t nullableint = avro_int32(1);
Setting the item: int err = avro_record_set(foo_record, "nullableint", nullableint);
Writing the item: int err2 = avro_file_writer_append(avro_writer, foo_record);

And there is an error. Somehow, I must set the …
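
The question is cut off above. With avro-c's legacy datum API, the value for a union-typed field has to be wrapped in a union datum that selects the branch, rather than set as a bare int. A hedged sketch along those lines (branch index 0 corresponds to "int" in this schema; everything beyond what the question shows is illustrative):

```c
#include <stdio.h>
#include <string.h>
#include <avro.h>

int main(void)
{
    const char json[] =
        "{ \"type\":\"record\", \"name\":\"foo\","
        "  \"fields\": ["
        "    { \"name\": \"nullableint\", \"type\":[\"int\",\"null\"] }"
        "  ]}";

    /* Parse the JSON schema first; avro_record() needs an avro_schema_t,
     * not the raw JSON string. */
    avro_schema_t schema;
    if (avro_schema_from_json_length(json, strlen(json), &schema)) {
        fprintf(stderr, "bad schema: %s\n", avro_strerror());
        return 1;
    }

    avro_datum_t record = avro_record(schema);

    /* Schema of the union field itself (["int","null"]). */
    avro_schema_t union_schema =
        avro_schema_record_field_get(schema, "nullableint");

    /* Wrap the int32 in a union datum selecting branch 0 ("int"),
     * then set that union datum on the record. */
    avro_datum_t int_datum   = avro_int32(1);
    avro_datum_t union_datum = avro_union(union_schema, 0, int_datum);

    int err = avro_record_set(record, "nullableint", union_datum);
    printf("avro_record_set returned %d\n", err);

    /* The record now holds a properly typed union value and can be
     * appended with avro_file_writer_append() as in the question. */
    avro_datum_decref(int_datum);
    avro_datum_decref(union_datum);
    avro_datum_decref(record);
    avro_schema_decref(schema);
    return 0;
}
```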

KafkaAvroDeserializer failing with Kryo Exception

谁说我不能喝 submitted on 2021-01-28 11:02:00
Question: I have written a consumer to read Avro generic records using a schema registry.

FlinkKafkaConsumer010 kafkaConsumer010 = new FlinkKafkaConsumer010(KAFKA_TOPICS, new KafkaGenericAvroDeserializationSchema(schemaRegistryUrl), properties);

And the deserialization class looks like this:

public class KafkaGenericAvroDeserializationSchema implements KeyedDeserializationSchema<GenericRecord> { private final String registryUrl; private transient KafkaAvroDeserializer inner; public …
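
The class above is cut off; a hedged sketch of how such a KeyedDeserializationSchema is typically completed, assuming the Confluent KafkaAvroDeserializer and the Flink 1.x Kafka connector APIs shown in the question (everything past the two fields is illustrative, not the asker's code):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema;

import io.confluent.kafka.serializers.KafkaAvroDeserializer;

public class KafkaGenericAvroDeserializationSchema
        implements KeyedDeserializationSchema<GenericRecord> {

    private final String registryUrl;
    private transient KafkaAvroDeserializer inner;

    public KafkaGenericAvroDeserializationSchema(String registryUrl) {
        this.registryUrl = registryUrl;
    }

    private void ensureInitialized() {
        // KafkaAvroDeserializer is not Serializable, so it is kept transient
        // and built lazily on the task managers.
        if (inner == null) {
            Map<String, Object> config = new HashMap<>();
            config.put("schema.registry.url", registryUrl);
            config.put("specific.avro.reader", false);
            inner = new KafkaAvroDeserializer();
            inner.configure(config, false); // false = value deserializer
        }
    }

    @Override
    public GenericRecord deserialize(byte[] messageKey, byte[] message,
                                     String topic, int partition, long offset) {
        ensureInitialized();
        return (GenericRecord) inner.deserialize(topic, message);
    }

    @Override
    public boolean isEndOfStream(GenericRecord nextElement) {
        return false;
    }

    @Override
    public TypeInformation<GenericRecord> getProducedType() {
        // Extracting the type like this makes Flink treat GenericRecord as a
        // generic type and fall back to Kryo, which is the usual source of
        // the Kryo exception; Flink's GenericRecordAvroTypeInfo (from
        // flink-avro, constructed with the reader schema) avoids that fallback.
        return TypeExtractor.getForClass(GenericRecord.class);
    }
}
```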