avro

How do I Avro binary-encode my JSON string to a byte array?

核能气质少年 submitted on 2019-12-04 19:04:10
I have an actual JSON string which I need to Avro binary-encode to a byte array. After going through the Apache Avro specification, I came up with the code below. I am not sure whether this is the right way to do it. Can anyone take a look and tell me whether the way I am trying to Avro binary-encode my JSON string is correct? I am using Apache Avro version 1.7.7. public class AvroTest { private static final String json = "{" + "\"name\":\"Frank\"," + "\"age\":47" + "}"; private static
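A minimal sketch of the bytes the binary encoding should produce, in Python with only the standard library. It assumes a record schema with a `name` field of type `string` followed by an `age` field of type `int` (the question's schema is cut off). Per the Avro specification, a record is its fields concatenated in schema order, a string is its UTF-8 byte length written as a zigzag varint followed by the bytes, and an int is a zigzag varint. In Java the usual route is to read the JSON with a `JsonDecoder` and a `GenericDatumReader`, then write the datum with a `GenericDatumWriter` and a `BinaryEncoder`; `encode_person` here is a hypothetical helper that only shows what that pipeline should yield.

```python
import io
import json

def write_long(out, n):
    """Avro int/long (|n| < 2**63): zigzag mapping, then base-128 varint, low bits first."""
    n = (n << 1) ^ (n >> 63)
    while n & ~0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))

def write_string(out, s):
    """Avro string: UTF-8 byte length as a long, then the bytes themselves."""
    data = s.encode("utf-8")
    write_long(out, len(data))
    out.write(data)

def encode_person(json_text):
    """Binary-encode JSON under the ASSUMED schema {name: string, age: int}."""
    datum = json.loads(json_text)
    out = io.BytesIO()
    write_string(out, datum["name"])  # fields in schema order; no names, no separators
    write_long(out, datum["age"])     # an int is encoded exactly like a long
    return out.getvalue()

payload = encode_person('{"name":"Frank","age":47}')
print(payload.hex())  # -> 0a4672616e6b5e
```

That is seven bytes: `0a` (length 5, zigzag-encoded), the five bytes of `Frank`, and `5e` (47 zigzag-encoded as 94). No field names or type tags are written, which is why a reader always needs the schema.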

Avro: deserialize json - schema with optional fields

删除回忆录丶 submitted on 2019-12-04 18:29:26
There are a lot of questions and answers on Stack Overflow on this subject, but none of them helps. I have a schema with an optional value: { "type" : "record", "name" : "UserSessionEvent", "namespace" : "events", "fields" : [ { "name" : "username", "type" : "string" }, { "name" : "errorData", "type" : [ "null", "string" ], "default" : null }] } And I'm trying to deserialize JSON without this field: { "username" : "2271AE67-34DE-4B43-8839-07216C5D10E1", "errorData" : { "string":"070226AC-9B91-47CE-85FE
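A minimal sketch (Python, standard library only) of one workaround: normalize each plain JSON object into Avro's JSON encoding before handing it to the decoder. It rests on two points. The specification encodes a non-null union value as a one-entry object keyed by the branch type (`{"string": "..."}`) and `null` as plain `null`; and, as far as I know, Avro's JSON decoder in the 1.7/1.8 line does not fill in schema defaults for absent fields, so every field has to be present. `to_avro_json` and `pick_branch` are hypothetical helpers, limited to flat records whose fields are primitives or unions of primitives; the value `alice` is made up.

```python
import json

# Union branch names that can carry each JSON value type (looked up by exact type,
# so True maps to "boolean" rather than "int").
_BRANCHES = {str: ("string",), bool: ("boolean",),
             int: ("int", "long"), float: ("double", "float")}

def pick_branch(union, value):
    """Return the union branch name used as the wrapper key for a non-null value."""
    for candidate in _BRANCHES.get(type(value), ()):
        if candidate in union:
            return candidate
    raise ValueError(f"no branch of {union} matches {value!r}")

def to_avro_json(datum, schema):
    """Rewrite one plain JSON object into Avro's JSON encoding for a flat record."""
    out = {}
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        if name in datum:
            value = datum[name]
        elif "default" in field:
            value = field["default"]          # supply the schema default ourselves
        else:
            raise ValueError(f"missing field {name!r} has no default")
        if not isinstance(ftype, list) or value is None:
            out[name] = value                 # non-union field, or the null branch
        elif isinstance(value, dict) and len(value) == 1 and next(iter(value)) in ftype:
            out[name] = value                 # already wrapped, e.g. {"string": "..."}
        else:
            out[name] = {pick_branch(ftype, value): value}
    return out

SCHEMA = {"type": "record", "name": "UserSessionEvent", "namespace": "events",
          "fields": [{"name": "username", "type": "string"},
                     {"name": "errorData", "type": ["null", "string"], "default": None}]}

print(json.dumps(to_avro_json({"username": "alice"}, SCHEMA)))
# -> {"username": "alice", "errorData": null}
print(json.dumps(to_avro_json({"username": "alice", "errorData": "timeout"}, SCHEMA)))
# -> {"username": "alice", "errorData": {"string": "timeout"}}
```

The output of `json.dumps` is what would then be fed to the Avro JSON decoder.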

Converting a byte array to JSON with an Avro schema as input gives an error

只愿长相守 submitted on 2019-12-04 15:58:19
I have a simple JSON string: String jsonPayload = "{\"empid\": \"6\",\"empname\": \"Saurabh\",\"address\": \"home\"}"; jsonPayload.getBytes(); I created an Avro schema: {"namespace": "sample.namespace", "type": "record", "name": "Employee", "fields": [ {"name": "empid", "type": "string"}, {"name": "empname", "type": "string"}, {"name": "address", "type": "string"} ] } When I try to compare them, I get an error: Exception : org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -62 at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:336) at org.apache.avro.io.BinaryDecoder
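The `-62` in the stack trace is a useful clue. A plausible reading, sketched below in Python with only the standard library: `getBytes()` yields the JSON *text*, not Avro binary, so the binary decoder treats the first byte `{` (0x7B = 123) as the zigzag-encoded length of `empid`, and zigzag-decoding 123 gives -62. To get Avro binary from JSON text, the text has to be read with the schema through a JSON decoder (in Java, `DecoderFactory.get().jsonDecoder(schema, json)` with a `GenericDatumReader`) and then written with a binary encoder. `read_long` and `read_string` follow the specification's encoding of longs and strings; `decode_employee` is a hypothetical helper hard-coded to the question's three string fields.

```python
import io

def read_long(inp):
    """Read an Avro int/long: base-128 varint, then undo the zigzag mapping."""
    shift = result = 0
    while True:
        b = inp.read(1)[0]
        result |= (b & 0x7F) << shift
        if not b & 0x80:          # high bit clear: last byte of the varint
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)

def read_string(inp):
    """Read an Avro string: a long byte length, then that many UTF-8 bytes."""
    length = read_long(inp)
    if length < 0:
        # The same condition Java's BinaryDecoder reports in the question.
        raise ValueError(f"Malformed data. Length is negative: {length}")
    return inp.read(length).decode("utf-8")

def decode_employee(data):
    """Hypothetical decoder for the Employee schema: three strings in schema order."""
    inp = io.BytesIO(data)
    return {name: read_string(inp) for name in ("empid", "empname", "address")}

json_bytes = b'{"empid": "6","empname": "Saurabh","address": "home"}'
print(json_bytes[0], read_long(io.BytesIO(json_bytes)))  # -> 123 -62

avro_bytes = b"\x026\x0eSaurabh\x08home"  # the same record, Avro binary-encoded
print(decode_employee(avro_bytes))
# -> {'empid': '6', 'empname': 'Saurabh', 'address': 'home'}
```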

How to read Avro files in Python 3.5.2

*爱你&永不变心* submitted on 2019-12-04 09:56:16
I am trying to read Avro files using Python. I installed Apache Avro following the instructions at https://avro.apache.org/docs/1.8.1/gettingstartedpython.html and I think the installation succeeded, because I can run "import avro" in the Python shell. However, when I try to read Avro files with the code from those instructions, I keep getting errors when importing Avro-related modules: >>> import avro.schema Traceback (most recent call last): File "<pyshell#6>", line 1, in <module> import avro.schema File "<frozen importlib._bootstrap>", line 969, in _find_and_load File "<frozen importlib.

Convert JSON to Parquet

戏子无情 submitted on 2019-12-04 09:46:28
I have a few TB of log data in JSON format, and I want to convert it into Parquet to get better performance in the analytics stage. I've managed to do this by writing a MapReduce Java job that uses parquet-mr and parquet-avro. The only thing I'm not satisfied with is that my JSON logs don't have a fixed schema: I don't know all the fields' names and types. Besides, even if I knew all the fields' names and types, my schema evolves over time; for example, there will be new fields added
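For the evolving-schema part, one approach (a sketch, not something parquet-avro provides) is to keep a running Avro schema and widen it whenever a batch brings new fields. Avro's schema resolution lets a reader supply a default for a field the writer did not have, so turning every added or dropped field into `["null", T]` with `"default": null` keeps the merged schema able to read older data. `merge_records` and `nullable` are hypothetical helpers written in Python with the standard library; they handle flat records only and reject genuinely conflicting types rather than attempting type promotion. The `Log` schemas are made-up examples.

```python
import copy

def nullable(ftype):
    """Return the type as a union with "null" first: a union field's default
    must match the union's first branch, and the default here is null."""
    if isinstance(ftype, list):
        return ["null"] + [t for t in ftype if t != "null"]
    return ["null", ftype]

def merge_records(old, new):
    """Widen a flat record schema so data written with either version fits."""
    old_fields = {f["name"]: f for f in old["fields"]}
    new_fields = {f["name"]: f for f in new["fields"]}
    order = list(old_fields) + [n for n in new_fields if n not in old_fields]
    merged = []
    for name in order:
        a, b = old_fields.get(name), new_fields.get(name)
        if a and b and a["type"] == b["type"]:
            merged.append(copy.deepcopy(a))      # unchanged field
            continue
        if a and b and nullable(a["type"]) != nullable(b["type"]):
            raise ValueError(f"incompatible types for field {name!r}")
        # Present on one side only (or nullable on one side only): make it optional.
        merged.append({"name": name, "type": nullable((a or b)["type"]), "default": None})
    return {"type": "record", "name": old["name"], "fields": merged}

v1 = {"type": "record", "name": "Log",
      "fields": [{"name": "ts", "type": "long"}, {"name": "msg", "type": "string"}]}
v2 = {"type": "record", "name": "Log",
      "fields": [{"name": "ts", "type": "long"}, {"name": "host", "type": "string"}]}
merged = merge_records(v1, v2)
```

Here `ts` stays a plain `long`, while `msg` and `host` become optional strings; merging again with later batches is idempotent for fields that have already been widened.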

Is there a way to programmatically convert JSON to AVRO Schema?

早过忘川 submitted on 2019-12-04 07:14:45
I need to create an Avro file, but for that I need two things: 1) JSON and 2) an Avro schema. Of these two, I have the JSON: {"web-app": { "servlet": [ { "servlet-name": "cofaxCDS", "servlet-class": "org.cofax.cds.CDSServlet", "init-param": { "configGlossary:installationAt": "Philadelphia, PA", "configGlossary:adminEmail": "ksm@pobox.com", "configGlossary:poweredBy": "Cofax", "configGlossary:poweredByIcon": "/images/cofax.gif", "configGlossary:staticPath": "/content/static", "templateProcessorClass": "org.cofax.WysiwygTemplate", "templateLoaderClass": "org.cofax.FilesTemplateLoader", "templatePath
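A minimal schema-inference sketch in Python (standard library only). It maps JSON values to Avro types (`null`, `boolean`, `long`, `double`, `string`, `array`, `record`) and rewrites keys such as `web-app` or `configGlossary:installationAt`, which are not legal Avro names (names must match `[A-Za-z_][A-Za-z0-9_]*`). `infer_schema` and `avro_name` are hypothetical helpers. Assumptions: an array's first element is representative of the whole array, nested record names are built from the field path so they stay unique, and sanitized names may collide (for example `a-b` and `a_b`).

```python
import json
import re

_INVALID = re.compile(r"[^A-Za-z0-9_]")

def avro_name(key):
    """Turn an arbitrary JSON key into a legal Avro name."""
    name = _INVALID.sub("_", key)
    if not name or name[0].isdigit():
        name = "_" + name
    return name

def infer_schema(value, name="Root"):
    """Infer an Avro schema (as a dict or type name) from one JSON value."""
    if value is None:
        return "null"
    if isinstance(value, bool):        # must precede int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assumption: the first element stands for all elements.
        items = infer_schema(value[0], name + "_item") if value else "null"
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        fields = []
        for key, child in value.items():
            field_name = avro_name(key)
            fields.append({"name": field_name,
                           "type": infer_schema(child, name + "_" + field_name)})
        return {"type": "record", "name": name, "fields": fields}
    raise TypeError(f"unsupported JSON value: {value!r}")

sample = {"web-app": {"servlet": [{"servlet-name": "cofaxCDS",
                                   "init-param": {"configGlossary:poweredBy": "Cofax"}}]}}
print(json.dumps(infer_schema(sample), indent=2))
```

For objects whose keys are data rather than fixed structure (the `init-param` block looks like one), an Avro `map` with string values may fit better than a record, since map keys do not have to be legal Avro names and the original keys are preserved.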

Can you append data to an existing Avro data file?

微笑、不失礼 submitted on 2019-12-04 06:56:01
It seems like there isn't any way to append data to an existing Avro serialized file. I'd like to have multiple processes writing to a single Avro file, but it looks like each time I open it, I start over from scratch. I don't want to read in all the data and then write it back out again. Using the Ruby example code, I have tried "ab" and "ab+" as the file mode, but with no luck. file = File.open('data.avr', 'wb') schema = Avro::Schema.parse(SCHEMA) writer = Avro::IO::DatumWriter.new(schema) dw = Avro::DataFile::Writer.new(file, writer, schema) dw << {"username" => "john", "age" => 25, "verified"
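Appending is possible in principle because of how the object container format is laid out: a header (the magic bytes `Obj\x01`, a metadata map holding the schema and codec, and a random 16-byte sync marker), followed by data blocks that each end with that same marker. Constructing a new writer on a file opened with `"ab"` would presumably write a second header with a fresh marker in the middle of the file. An appender instead has to read the existing header, reuse its sync marker, and write only new blocks; Java's `DataFileWriter.appendTo(File)` does this. The sketch below is a stdlib-only Python illustration with the `null` codec, a hypothetical two-field `User` schema (`username`: string, `age`: int), and hypothetical helpers `create` and `append`. It does no file locking, so several processes appending at once would still need coordination.

```python
import io
import json
import os

MAGIC = b"Obj\x01"

def write_long(out, n):
    n = (n << 1) ^ (n >> 63)                   # zigzag
    while n & ~0x7F:                           # base-128 varint, low bits first
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))

def read_long(inp):
    shift = result = 0
    while True:
        b = inp.read(1)[0]
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)

def write_bytes(out, data):
    write_long(out, len(data))
    out.write(data)

def read_bytes(inp):
    return inp.read(read_long(inp))

def write_header(out, schema, sync):
    out.write(MAGIC)
    meta = {"avro.schema": json.dumps(schema).encode("utf-8"), "avro.codec": b"null"}
    write_long(out, len(meta))                 # one map block holding every entry
    for key, value in meta.items():
        write_bytes(out, key.encode("utf-8"))  # map keys are Avro strings
        write_bytes(out, value)                # metadata values are Avro bytes
    write_long(out, 0)                         # end of the metadata map
    out.write(sync)

def read_sync(inp):
    """Skip the header's magic and metadata map; return the 16-byte sync marker."""
    if inp.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    while True:
        count = read_long(inp)
        if count == 0:
            break
        if count < 0:                          # negative count: a byte size follows
            count = -count
            read_long(inp)
        for _ in range(count):
            read_bytes(inp)                    # key
            read_bytes(inp)                    # value
    return inp.read(16)

def write_block(out, records, sync):
    """One data block: record count, byte size, encoded records, sync marker."""
    data = b"".join(records)
    write_long(out, len(records))
    write_long(out, len(data))
    out.write(data)
    out.write(sync)

SCHEMA = {"type": "record", "name": "User",   # hypothetical schema
          "fields": [{"name": "username", "type": "string"},
                     {"name": "age", "type": "int"}]}

def encode_user(username, age):
    buf = io.BytesIO()
    write_bytes(buf, username.encode("utf-8"))
    write_long(buf, age)
    return buf.getvalue()

def create(path, records):
    sync = os.urandom(16)
    with open(path, "wb") as f:
        write_header(f, SCHEMA, sync)
        write_block(f, records, sync)

def append(path, records):
    with open(path, "rb") as f:
        sync = read_sync(f)                    # reuse the existing file's marker
    with open(path, "ab") as f:                # no second header
        write_block(f, records, sync)
```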

Nesting Avro schemas

人走茶凉 submitted on 2019-12-04 06:00:09
According to this question on nesting Avro schemas, the right way to nest a record schema is as follows: { "name": "person", "type": "record", "fields": [ {"name": "firstname", "type": "string"}, {"name": "lastname", "type": "string"}, { "name": "address", "type": { "type" : "record", "name" : "AddressUSRecord", "fields" : [ {"name": "streetaddress", "type": "string"}, {"name": "city", "type": "string"} ] } } ] } I don't like giving the field the name address and having to give a different name (AddressUSRecord) to the field's schema. Can I give the field and the schema the same name, address?
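As I read the Avro specification, the answer should be yes: field names only need to be unique within their record, while the no-redefinition rule applies to the full names of named types (records, enums, fixed), so a field `address` whose record type is also called `address` should not clash. The sketch below (Python, standard library only) encodes that reading: `named_types` and `full_name` are hypothetical helpers that collect the full names a schema defines, reject a second definition of the same full name, and never look at field names.

```python
def full_name(name, namespace):
    """A dotted name is already full; otherwise qualify it with the namespace."""
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"

def named_types(schema, namespace="", seen=None):
    """Return the set of named-type full names; raise if one is defined twice."""
    if seen is None:
        seen = set()
    if isinstance(schema, list):                   # union: check every branch
        for branch in schema:
            named_types(branch, namespace, seen)
    elif isinstance(schema, dict):
        kind = schema["type"]
        if kind in ("record", "enum", "fixed"):
            name = full_name(schema["name"], schema.get("namespace", namespace))
            if name in seen:
                raise ValueError(f"type {name!r} is defined twice")
            seen.add(name)
            inner_ns = name.rpartition(".")[0]     # nested types inherit this namespace
            for field in schema.get("fields", []):
                # Only a field's *type* can define names; field["name"] is ignored.
                named_types(field["type"], inner_ns, seen)
        elif kind == "array":
            named_types(schema["items"], namespace, seen)
        elif kind == "map":
            named_types(schema["values"], namespace, seen)
    return seen

person = {"name": "person", "type": "record", "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "address", "type": {"type": "record", "name": "address",
                                 "fields": [{"name": "city", "type": "string"}]}}]}
print(sorted(named_types(person)))  # -> ['address', 'person']
```

What the rule does forbid is defining the same type name twice; once `address` is defined, later fields can refer to it by the string `"address"` instead of repeating the definition.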

Cross-language, cross-platform remote procedure calls [Avro]

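不想你离开。 submitted on 2019-12-04 03:44:52
##Getting started Avro is one of the projects in Apache's Hadoop family. It is advertised as having high performance, requiring little code, and producing compact output data. That is their marketing, though. I have recently studied both Avro and Protobuf, and I can't resist sharing some basic test code. ##Installation ###Java Avro was created for Hadoop, so it is written mainly in Java. Installing it for Java is simple: just add the Avro and Avro RPC (avro-ipc) jars to your project. I like to use Maven, so the Maven coordinates are as follows: <dependency> <groupId>org.apache.avro</groupId> <artifactId>avro</artifactId> <version>1.7.2</version> </dependency> <dependency> <groupId>org.apache.avro</groupId> <artifactId>avro-ipc</artifactId> <version>1.7.2</version> </dependency> ###Python If you are familiar with installing Python modules, this should be easy. The avro Python module can be downloaded from PyPI (https://pypi.python.org/pypi). Download the tar.gz or zip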

Question about populating nested records in Avro using a GenericRecord

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-04 03:40:47
Suppose I’ve got the following schema: { "name" : "Profile", "type" : "record", "fields" : [ { "name" : "firstName", "type" : "string" }, { "name" : "address" , "type" : { "type" : "record", "name" : "AddressUSRecord", "fields" : [ { "name" : "address1" , "type" : "string" }, { "name" : "address2" , "type" : "string" }, { "name" : "city" , "type" : "string" }, { "name" : "state" , "type" : "string" }, { "name" : "zip" , "type" : "int" }, { "name" : "zip4", "type": "int" } ] } } ] } I’m using a GenericRecord to represent each Profile that gets created. To add a firstName, it’s easy to do the
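On the Java side, the nested value is typically its own record built from the field's schema, for example `new GenericData.Record(schema.getField("address").schema())`, filled with `put`, and then stored with `profile.put("address", address)`. The sketch below (Python, standard library only, not the Java API) shows why nothing more is needed: in a schema-driven binary writer, a nested record is just one more recursion step, written inline after `firstName` with no extra framing. `write_datum` is a hypothetical helper that supports only `string`, `int`, `long`, and `record`; the profile values are made up.

```python
import io

def write_long(out, n):
    """Avro int/long: zigzag mapping, then a base-128 varint, low bits first."""
    n = (n << 1) ^ (n >> 63)
    while n & ~0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))

def write_datum(out, schema, datum):
    """Write datum in Avro binary; supports only string, int, long and record."""
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "string":
        data = datum.encode("utf-8")
        write_long(out, len(data))
        out.write(data)
    elif kind in ("int", "long"):
        write_long(out, datum)
    elif kind == "record":
        # Fields go back to back in schema order. A nested record is just another
        # recursive call: no length prefix, no field names, no marker.
        for field in schema["fields"]:
            write_datum(out, field["type"], datum[field["name"]])
    else:
        raise NotImplementedError(f"type {kind!r} is not covered by this sketch")

ADDRESS = {"type": "record", "name": "AddressUSRecord",
           "fields": [{"name": n, "type": "string"}
                      for n in ("address1", "address2", "city", "state")]
                     + [{"name": "zip", "type": "int"}, {"name": "zip4", "type": "int"}]}
PROFILE = {"type": "record", "name": "Profile",
           "fields": [{"name": "firstName", "type": "string"},
                      {"name": "address", "type": ADDRESS}]}

# The nested value is its own mapping, mirroring a separate GenericRecord in Java.
address = {"address1": "1 Main St", "address2": "", "city": "Springfield",
           "state": "IL", "zip": 62701, "zip4": 1234}
profile = {"firstName": "Ann", "address": address}
buf = io.BytesIO()
write_datum(buf, PROFILE, profile)
```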