avro

Catch error in a for loop in Python

℡╲_俬逩灬. Submitted on 2019-12-02 01:05:09
Question: I have a for loop over an Avro data file reader object: for i in reader: print i. I then got a UnicodeDecodeError in the for statement, so I wanted to ignore that particular record. So I did this: try: for i in reader: print i except: pass, but it does not continue further. How can I overcome this problem? Edit: error trace added. Traceback (most recent call last): File "modify.py", line 22, in <module> for record in reader: File "/usr/lib/python2.6/site-packages/avro-1.7.7-py2.6.egg/avro/datafile.py",
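The structural problem here is language-agnostic: a try wrapped around the whole loop ends iteration at the first failure, while a try around each step lets the loop continue. The sketch below shows that per-record pattern against an Avro data file in Java (the file name is hypothetical, and whether the reader can actually resume after a decode error depends on where the corruption sits); the same shape applies to the Python reader in the question.

```java
import java.io.File;
import java.io.IOException;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class SkipBadRecords {
  public static void main(String[] args) throws IOException {
    // "data.avro" is a hypothetical file name.
    try (DataFileReader<GenericRecord> reader =
        new DataFileReader<>(new File("data.avro"), new GenericDatumReader<GenericRecord>())) {
      while (reader.hasNext()) {
        try {
          System.out.println(reader.next());
        } catch (RuntimeException e) {
          // Catching around the whole loop stops iteration at the first bad
          // record; catching per element lets the loop try the next one.
          System.err.println("Skipping bad record: " + e);
        }
      }
    }
  }
}
```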

Parquet Data timestamp columns INT96 not yet implemented in Druid Overlord Hadoop task

我的未来我决定 Submitted on 2019-12-02 00:58:09
Context: I am able to submit a MapReduce job from the Druid Overlord to an EMR cluster. My data source is in S3 in Parquet format. I have a timestamp column (INT96) in the Parquet data, which is not supported in the Avro schema. The error occurs while parsing the timestamp. Issue stack trace is: Error: java.lang.IllegalArgumentException: INT96 not yet implemented. at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:279) at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:264) at org.apache.parquet.schema.PrimitiveType$PrimitiveTypeName$7.convert
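Parquet's Avro bridge historically rejects INT96 timestamps outright, which is exactly what the trace shows. If the reading side can be controlled directly, newer parquet-avro releases are supposed to expose a switch that surfaces INT96 as a 12-byte fixed value instead of failing; the property name below is my assumption about that switch, the path is hypothetical, and whether Druid's Hadoop indexing task exposes this at all is not confirmed by the post.

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

public class Int96ReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumed flag (newer parquet-avro only): read INT96 as fixed(12) bytes
    // instead of throwing "INT96 not yet implemented".
    conf.setBoolean("parquet.avro.readInt96AsFixed", true);

    try (ParquetReader<GenericRecord> reader =
        AvroParquetReader.<GenericRecord>builder(new Path("s3://bucket/data.parquet")) // hypothetical path
            .withConf(conf)
            .build()) {
      GenericRecord record;
      while ((record = reader.read()) != null) {
        System.out.println(record);
      }
    }
  }
}
```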

Using AvroCoder for Custom Types with Generics

拥有回忆 Submitted on 2019-12-01 21:10:38
Question: I am trying to use AvroCoder to serialise a custom type which is passed around in PCollections in my pipeline. The custom type has a generic field (which currently is a String). When I run the pipeline, I get the AvroTypeException below, probably due to the generic field. Is building and passing the Avro schema for the object the only way to get around this? Exception in thread "main" org.apache.avro.AvroTypeException: Unknown type: T at org.apache.avro.specific.SpecificData.createSchema
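Passing an explicit schema is indeed the usual way around this, because reflection cannot resolve the type variable T. A minimal sketch, assuming the generic field is always a String; the Wrapper class, field name, and namespace are hypothetical stand-ins for the type in the post.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.beam.sdk.coders.AvroCoder;

public class AvroCoderSchemaSketch {

  // Hypothetical stand-in for the custom type with a generic field.
  public static class Wrapper<T> {
    public T value;
  }

  public static void main(String[] args) {
    // Build the schema by hand, pinning the generic field to a string,
    // since reflection cannot work out what T is.
    Schema schema = SchemaBuilder.record("Wrapper")
        .namespace("com.example")            // hypothetical namespace
        .fields()
        .requiredString("value")
        .endRecord();

    // Hand the explicit schema to AvroCoder instead of relying on reflection.
    @SuppressWarnings({"unchecked", "rawtypes"})
    AvroCoder<Wrapper<String>> coder =
        (AvroCoder) AvroCoder.of(Wrapper.class, schema);

    System.out.println(coder.getSchema());
  }
}
```

The coder can then be attached to the relevant PCollection with setCoder(coder) so the pipeline never tries to infer one from the generic type.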

Unable to invoke avro-maven plugin

浪尽此生 Submitted on 2019-12-01 18:04:44
My question is similar to "Unable to compile and create .avro file from .avsc using Maven". I have tried all possible things and checked the Maven project 100 times, but I am still not able to run the avro-maven-plugin to generate the code for my .avsc file. I have read the following posts and followed them, but with no success: http://grepalex.com/2013/05/24/avro-maven/ https://github.com/phunt/avro-maven-plugin I downloaded the above Maven project, and there the result is also the same. [INFO] Scanning for projects... [INFO] [INFO] ------------------------------------------------------------------------
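For reference, a typical avro-maven-plugin configuration looks like the sketch below (the version and directories are examples, not taken from the post). The usual gotchas are the schema goal not being bound to the generate-sources phase, or sourceDirectory not matching where the .avsc files actually live.

```xml
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.7.7</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```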

Data serialization framework

自闭症网瘾萝莉.ら Submitted on 2019-12-01 17:41:51
I'm new to this Apache Avro (serialization framework). I know what serialization is, but why are there separate frameworks like Avro, Thrift and Protocol Buffers, and why can't we use the Java serialization APIs instead of these separate frameworks? Are there any flaws in the Java serialization APIs? What is the meaning of the phrase below, "does not require running a code-generation program when a schema changes", in Avro or in any other serialization framework? Please help me to understand all this!! Why can't we use the Java serialization APIs instead of these separate frameworks, are there any flaws in java
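On the quoted phrase: Avro can work purely from a schema that is parsed at runtime, so changing the schema does not force you to regenerate and recompile classes. A minimal sketch of that generic-record style (the schema and field names are invented for illustration):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class GenericAvroSketch {
  public static void main(String[] args) {
    // The schema is just a string parsed at runtime -- no generated classes needed.
    String schemaJson = "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"age\",\"type\":\"int\"}]}";
    Schema schema = new Schema.Parser().parse(schemaJson);

    // Records are built and read generically against that schema.
    GenericRecord user = new GenericData.Record(schema);
    user.put("name", "alice");
    user.put("age", 30);

    System.out.println(user);
  }
}
```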

Avro schema doesn't honor backward compatibility

。_饼干妹妹 Submitted on 2019-12-01 15:49:10
I have this Avro schema: { "namespace": "xx.xxxx.xxxxx.xxxxx", "type": "record", "name": "MyPayLoad", "fields": [ {"name": "filed1", "type": "string"}, {"name": "filed2", "type": "long"}, {"name": "filed3", "type": "boolean"}, { "name" : "metrics", "type": { "type" : "array", "items": { "name": "MyRecord", "type": "record", "fields" : [ {"name": "min", "type": "long"}, {"name": "max", "type": "long"}, {"name": "sum", "type": "long"}, {"name": "count", "type": "long"} ] } } } ] } Here is the code which we use to parse the data: public static final MyPayLoad parseBinaryPayload(byte[] payload) {
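For context on the usual fix (a sketch, not the poster's actual code): Avro only applies its schema-resolution rules when the decoder is given both the writer's schema (the one the bytes were produced with) and the reader's schema (the current one, where any added field has a default). A reader constructed from a single schema cannot be backward compatible on its own. Assuming MyPayLoad is the class generated from the schema above:

```java
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;

public class BackwardCompatibleParse {

  // writerSchema: the old schema the payload bytes were written with.
  // readerSchema: the current schema, with defaults for any added fields.
  public static MyPayLoad parseBinaryPayload(byte[] payload,
                                             Schema writerSchema,
                                             Schema readerSchema) throws IOException {
    SpecificDatumReader<MyPayLoad> datumReader =
        new SpecificDatumReader<>(writerSchema, readerSchema);
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
    return datumReader.read(null, decoder);
  }
}
```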

How to deserialise Kafka Avro messages using Apache Beam

我们两清 Submitted on 2019-12-01 11:43:32
Question: The main goal is to aggregate two Kafka topics: one with compacted, slow-moving data and the other with fast-moving data that is received every second. I have been able to consume messages in simple scenarios, such as a KV<Long, String>, using something like: PCollection<KV<Long,String>> input = p.apply(KafkaIO.<Long, String>read() .withKeyDeserializer(LongDeserializer.class) .withValueDeserializer(StringDeserializer.class) PCollection<String> output = input.apply(Values.<String>create()); But this
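One way to extend that pattern to Avro payloads (a sketch, not from the post: the broker, topic, schema and field names are all hypothetical, and it assumes plain Avro binary values without a schema registry) is to consume the value as raw bytes and decode it against the writer schema inside a DoFn:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.LongDeserializer;

public class KafkaAvroReadSketch {

  // Hypothetical writer schema; in practice load the .avsc the producer uses.
  static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"long\"},{\"name\":\"body\",\"type\":\"string\"}]}";

  static PCollection<GenericRecord> readAvroValues(Pipeline p) {
    PCollection<KV<Long, byte[]>> raw = p.apply(
        KafkaIO.<Long, byte[]>read()
            .withBootstrapServers("broker:9092")          // hypothetical broker
            .withTopic("fast-topic")                      // hypothetical topic
            .withKeyDeserializer(LongDeserializer.class)
            .withValueDeserializer(ByteArrayDeserializer.class)
            .withoutMetadata());

    return raw
        .apply(ParDo.of(new DoFn<KV<Long, byte[]>, GenericRecord>() {
          @ProcessElement
          public void processElement(ProcessContext c) throws Exception {
            // Parsing per element keeps the sketch short; cache these in @Setup for real use.
            Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
            GenericRecord record = new GenericDatumReader<GenericRecord>(schema)
                .read(null, DecoderFactory.get().binaryDecoder(c.element().getValue(), null));
            c.output(record);
          }
        }))
        // GenericRecord has no default coder, so attach one built from the schema.
        .setCoder(AvroCoder.of(new Schema.Parser().parse(SCHEMA_JSON)));
  }
}
```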

Data validation in AVRO

僤鯓⒐⒋嵵緔 Submitted on 2019-12-01 07:33:56
I am new to Avro, so please excuse me if this is a simple question. I have a use case where I am using an Avro schema for record calls. Let's say I have the Avro schema { "name": "abc", "namespace": "xyz", "type": "record", "fields": [ {"name": "CustId", "type":"string"}, {"name": "SessionId", "type":"string"} ] } Now if the input is like { "CustId" : "abc1234", "SessionId" : "000-0000-00000" } I want to use some regex validations for these fields, and I want to take this input only if it comes in the particular format shown above. Is there any way to specify a regex expression in the Avro schema? Any
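Avro schemas themselves have no regex or constraint facility, so the usual pattern is to validate after decoding. A minimal sketch, with made-up patterns for the two fields (adjust them to the real CustId / SessionId formats):

```java
import java.util.regex.Pattern;
import org.apache.avro.generic.GenericRecord;

public class RecordValidator {

  // Hypothetical formats: three lowercase letters + four digits, and 3-4-5 digit groups.
  private static final Pattern CUST_ID = Pattern.compile("[a-z]{3}\\d{4}");
  private static final Pattern SESSION_ID = Pattern.compile("\\d{3}-\\d{4}-\\d{5}");

  /** Returns true only when both fields match the expected formats. */
  public static boolean isValid(GenericRecord record) {
    CharSequence custId = (CharSequence) record.get("CustId");
    CharSequence sessionId = (CharSequence) record.get("SessionId");
    return custId != null && sessionId != null
        && CUST_ID.matcher(custId).matches()
        && SESSION_ID.matcher(sessionId).matches();
  }
}
```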