avro

How to programmatically get schema from confluent schema registry in Python

Submitted by 拥有回忆 on 2021-02-19 03:43:06
Question: As of now I am reading an .avsc file to get the schema: value_schema = avro.load('client.avsc'). Can I instead fetch the schema from the Confluent Schema Registry using the topic name? I found one library but could not figure out how to use it: https://github.com/marcosschroh/python-schema-registry-client

Answer 1: Using confluent-kafka-python: from confluent_kafka.avro.cached_schema_registry_client import CachedSchemaRegistryClient sr = CachedSchemaRegistryClient({ 'url': 'http://localhost:8081
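A minimal sketch of that answer's approach, assuming a local registry at http://localhost:8081 and a subject that follows the usual "<topic>-value" naming convention (both are assumptions, not taken from the question):

from confluent_kafka.avro.cached_schema_registry_client import CachedSchemaRegistryClient

# Registry URL and subject name are placeholders for this sketch.
sr = CachedSchemaRegistryClient({'url': 'http://localhost:8081'})

# get_latest_schema returns (schema_id, schema, version); the schema object can be
# used wherever avro.load('client.avsc') was used before.
schema_id, schema, version = sr.get_latest_schema('my-topic-value')
print(schema)

Note that confluent_kafka.avro is the legacy serializer module; newer releases of confluent-kafka-python also ship confluent_kafka.schema_registry.SchemaRegistryClient, whose get_latest_version(subject) call does the same job.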

Avro Schema for GenericRecord: Be able to leave blank fields

Submitted by 回眸只為那壹抹淺笑 on 2021-02-11 17:13:34
Question: I'm using Java to convert JSON to Avro and store the results in GCS using Google Cloud Dataflow. The Avro schema is created at runtime using SchemaBuilder. One of the fields I define in the schema is an optional LONG field, defined like this: SchemaBuilder.FieldAssembler<Schema> fields = SchemaBuilder.record(mainName).fields(); Schema concreteType = SchemaBuilder.nullable().longType(); fields.name("key1").type(concreteType).noDefault(); Now when I create a GenericRecord using the schema above, and
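The question's code is Java, but the schema shape it is aiming for can be sketched in Python with fastavro to show what "optional" means at the Avro level: a union whose first branch is "null", plus a null default. The record and field names mirror the question; everything else is invented for illustration:

import io
import fastavro

# "key1" as an optional long: "null" listed first, with a null default, so the
# field may be left blank.
schema = fastavro.parse_schema({
    "type": "record",
    "name": "Main",  # stand-in for the question's mainName
    "fields": [
        {"name": "key1", "type": ["null", "long"], "default": None},
    ],
})

buf = io.BytesIO()
# A record that leaves key1 blank.
fastavro.writer(buf, schema, [{"key1": None}])

Per the Avro specification, a union field's default must match the first branch of the union, which is why "null" is listed first in the sketch.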

Compatibility of Avro contract with enum

Submitted by 假装没事ソ on 2021-02-11 14:51:24
Question: I have an existing Avro schema: { "name": "myenum", "type": { "type": "enum", "name": "Suit", "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"] }, "default": null } I want to make null the default, but updating the contract to the following results in a backward-compatibility error. What can be done to solve this? { "name": "myenum", "type": [ null, { "type": "enum", "name": "Suit", "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"] }], "default": null }

Answer 1: There is a problem with
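A sketch of the corrected field, expressed as a Python dict and checked with fastavro (the wrapping record name is invented for illustration). Two details matter: in the schema JSON the null branch of a union is the string "null", not a bare null, and the Avro spec requires a union's default to match its first branch, so "null" has to come first for "default": null to be valid:

import fastavro

schema = {
    "type": "record",
    "name": "Card",  # hypothetical wrapper record
    "fields": [
        {
            "name": "myenum",
            "type": [
                "null",  # the string "null", listed first so the null default is legal
                {
                    "type": "enum",
                    "name": "Suit",
                    "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
                },
            ],
            "default": None,
        },
    ],
}

parsed = fastavro.parse_schema(schema)  # raises if the schema is malformed

Whether this alone passes the registry's backward-compatibility check also depends on the previously registered, non-union version of the field.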

Spark reading Partitioned avro significantly slower than pointing to exact location

Submitted by 十年热恋 on 2021-02-11 13:35:22
Question: I am trying to read Avro data partitioned by Year, Month, and Day, and it is significantly slower than pointing directly to the exact path. In the physical plan I can see that the partition filters are being passed down, so it is not scanning the entire set of directories, yet it is still significantly slower. E.g. reading the partitioned data like this: profitLossPath="abfss://raw@"+datalakename+".dfs.core.windows.net/datawarehouse/CommercialDM.ProfitLoss/"
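A PySpark sketch of the two access patterns being compared; the storage account and partition values are placeholders, and the Year/Month/Day partition column names come from the question:

from pyspark.sql import SparkSession

# Requires the spark-avro package (org.apache.spark:spark-avro) on the classpath.
spark = SparkSession.builder.appName("partitioned-avro-read").getOrCreate()

datalakename = "mydatalake"  # placeholder storage account name
profitLossPath = ("abfss://raw@" + datalakename + ".dfs.core.windows.net/"
                  "datawarehouse/CommercialDM.ProfitLoss/")

# Pattern 1: load the partition root and filter. The partition filters are pushed
# down, but Spark still lists the directory tree to discover partitions before pruning.
df_filtered = (spark.read.format("avro")
               .load(profitLossPath)
               .filter("Year = 2020 AND Month = 12 AND Day = 31"))

# Pattern 2: point at the exact partition directory, skipping discovery of the rest
# of the tree; basePath keeps the Year/Month/Day columns in the result.
df_direct = (spark.read.format("avro")
             .option("basePath", profitLossPath)
             .load(profitLossPath + "Year=2020/Month=12/Day=31/"))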

BigQuery Data Transfer Service with BigQuery partitioned table [closed]

Submitted by 怎甘沉沦 on 2021-02-08 06:12:56
Question: (Closed on Stack Overflow as not reproducible or caused by typos.) I have access to a project within BigQuery. I'm looking to create a table partitioned by ingestion time, with daily partitions, and then set up a BigQuery Data Transfer Service job that brings Avro files in from multiple directories within a Google Cloud Storage bucket.
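A minimal sketch of the destination-table side using the BigQuery Python client; the project, dataset, and table names are placeholders. An ingestion-time partitioned table is simply a table with DAY time partitioning and no partitioning field, and the GCS transfer (data source "google_cloud_storage") can then name it as the destination table:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

table = bigquery.Table("my-project.my_dataset.my_avro_table")  # placeholder names
# No field argument: partitioning falls back to ingestion time (_PARTITIONTIME).
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY
)
client.create_table(table)

The transfer config itself (GCS path templates, AVRO file format, schedule) can be created in the console, with bq, or with the google-cloud-bigquery-datatransfer client.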
