avro

Kafka Avro Consumer with Decoder issues

送分小仙女 submitted on 2019-12-17 16:38:06
Question: When I attempted to run the Kafka Consumer with Avro over data using my schema, it returned "AvroRuntimeException: Malformed data. Length is negative: -40". I see others have had similar issues converting a byte array to JSON, with Avro write and read, and with the Kafka Avro Binary *coder. I have also referenced this Consumer Group Example. All of these have been helpful, but none has resolved the error so far. It works up until this part of the code (line 73): Decoder decoder =
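The "Length is negative" failure often indicates that the decoder is reading bytes that were not written by a matching Avro binary encoder for the same schema (for example, the producer wrote a container file or a schema-registry wire format instead of a raw datum). A minimal sketch of decoding a raw Avro-encoded Kafka value, assuming the producer used a plain BinaryEncoder with the same writer schema (class and variable names here are illustrative):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public class RawAvroDecoder {
    // schemaJson is the writer's schema as a JSON string; messageBytes is the raw Kafka value.
    public static GenericRecord decode(String schemaJson, byte[] messageBytes) throws java.io.IOException {
        Schema schema = new Schema.Parser().parse(schemaJson);
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(messageBytes, null);
        return reader.read(null, decoder);
    }
}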

How to fix Expected start-union. Got VALUE_NUMBER_INT when converting JSON to Avro on the command line?

霸气de小男生 submitted on 2019-12-17 09:36:12
Question: I'm trying to validate a JSON file against an Avro schema and write the corresponding Avro file. First, I defined the following Avro schema, named user.avsc: {"namespace": "example.avro", "type": "record", "name": "user", "fields": [ {"name": "name", "type": "string"}, {"name": "favorite_number", "type": ["int", "null"]}, {"name": "favorite_color", "type": ["string", "null"]} ] } Then I created a user.json file: {"name": "Alyssa", "favorite_number": 256, "favorite_color": null} And then tried
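This error usually comes from Avro's JSON encoding of unions: a non-null value for a union field has to be wrapped in an object keyed by its branch type. A sketch of how user.json would look under that rule, with the avro-tools fromjson invocation as it is commonly shown (adjust jar version and paths as needed):

{"name": "Alyssa", "favorite_number": {"int": 256}, "favorite_color": null}

java -jar avro-tools-1.7.7.jar fromjson --schema-file user.avsc user.json > user.avro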

MapReduce Job to Collect All Unique Fields in HDFS Directory of JSON

帅比萌擦擦* submitted on 2019-12-14 02:38:38
Question: My question is, in essence, an application of this referenced question: Convert JSON to Parquet. I find myself in the rather unique position of having to semi-manually curate an Avro schema for the superset of fields contained in JSON files (composed of arbitrary combinations of known resources) in an HDFS directory. This is part of an ETL pipeline I am trying to develop to convert these files to Parquet for much more efficient and easier processing in Spark. I have never written a MapReduce program
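For the field-collection step itself, one common shape is a map-only pass that emits every field path it sees, with a reducer that keeps only the distinct keys. A rough sketch, assuming one JSON document per input line and Jackson on the classpath (class and field names here are illustrative, not from the question):

import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class FieldPathMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    private static final ObjectMapper JSON = new ObjectMapper();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        emit("", JSON.readTree(value.toString()), context);
    }

    // Recursively emit dotted field paths; the reducer only has to deduplicate the keys.
    private void emit(String prefix, JsonNode node, Context context)
            throws IOException, InterruptedException {
        Iterator<Map.Entry<String, JsonNode>> fields = node.fields();
        while (fields.hasNext()) {
            Map.Entry<String, JsonNode> field = fields.next();
            String path = prefix.isEmpty() ? field.getKey() : prefix + "." + field.getKey();
            context.write(new Text(path), NullWritable.get());
            if (field.getValue().isObject()) {
                emit(path, field.getValue(), context);
            }
        }
    }
}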

Writing to Avro Data file

一世执手 submitted on 2019-12-14 02:29:50
Question: The following code simply writes data in Avro format, then reads back and displays the same data from the Avro file it wrote. I was just trying out the example in the Hadoop Definitive Guide book. I was able to execute it the first time, but after that I got the following error, so I am not sure what mistake I am making. This is the exception: Exception in thread "main" java.io.EOFException: No content to map to Object due to end of input at org.codehaus.jackson.map.ObjectMapper.
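That particular EOFException is thrown by Avro's schema parser (which uses Jackson), so it typically means the .avsc file being parsed is empty or truncated rather than that the written data file is broken. A minimal write-then-read sketch in the style of the Definitive Guide example, assuming a valid, non-empty schema file (file and field names are placeholders):

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroFileRoundTrip {
    public static void main(String[] args) throws Exception {
        // Parsing an empty or truncated .avsc is what produces the EOFException above.
        Schema schema = new Schema.Parser().parse(new File("StringPair.avsc"));

        GenericRecord datum = new GenericData.Record(schema);
        datum.put("left", "L");
        datum.put("right", "R");

        File file = new File("pairs.avro");
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, file);
            writer.append(datum);
        }

        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord record : reader) {
                System.out.println(record);
            }
        }
    }
}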

Scala pickling: Simple custom pickler for my own class?

荒凉一梦 submitted on 2019-12-14 02:18:38
Question: I am trying to pickle some relatively simple-structured but large and slow-to-create classes in a Scala NLP (natural language processing) app of mine. Because there's lots of data, it needs to pickle and, especially, unpickle quickly and without bloat. Java serialization evidently sucks in this regard. I know about Kryo, but I've never used it. I've also run into Apache Avro, which seems similar, although I'm not quite sure why it's not normally mentioned as a suitable solution. Neither is Scala

avro php - reading from buffer

回眸只為那壹抹淺笑 submitted on 2019-12-13 19:14:41
Question: I am writing a PHP script that uses Avro to deserialize data. I receive the data as a buffer containing an Avro binary stream. In the Avro PHP example, I only see an example of reading the data from a file, not from a binary buffer. How can I deserialize the data? What I am looking for is a binary decoder for Avro. Answer 1: $binaryBuffer = <get_avro_serialized_record> $writersSchema = '{ "type" : "record", "name" : "Example", "namespace" : "com.example.record", "fields" : [ { "name" : "userId", "type" : "int" .......

JsonMappingException when serializing avro generated object to json

情到浓时终转凉″ submitted on 2019-12-13 13:07:34
Question: I used avro-tools to generate Java classes from .avsc files, using: java.exe -jar avro-tools-1.7.7.jar compile -string schema myfile.avsc Then I tried to serialize such objects to JSON with ObjectMapper, but I always got a JsonMappingException saying "not an enum" or "not a union". In my test I create the generated object using its builder or constructor. I got such exceptions for objects of different classes... Sample Code: ObjectMapper serializer = new ObjectMapper(); // com.fasterxml.jackson
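Jackson tends to trip over the extra members (the embedded schema, union branches) that Avro bakes into generated classes. One commonly suggested alternative is to let Avro itself produce the JSON through its JsonEncoder. A sketch under the assumption that the generated class is called User (the class name is illustrative, not from the question):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;
import org.apache.avro.specific.SpecificDatumWriter;

public class AvroToJson {
    // User stands in for whatever class avro-tools generated; the name is an assumption.
    public static String toJson(User user) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DatumWriter<User> writer = new SpecificDatumWriter<>(User.class);
        // Encode with the record's own schema so unions and enums are handled by Avro, not Jackson.
        JsonEncoder encoder = EncoderFactory.get().jsonEncoder(User.getClassSchema(), out);
        writer.write(user, encoder);
        encoder.flush();
        return out.toString("UTF-8");
    }
}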

json4s serialization of string to avro specific record class

本小妞迷上赌 submitted on 2019-12-13 05:12:22
Question: I have a JSON string which I am trying to serialize into an Avro specific record (a Scala case class that extends org.apache.avro.specific.SpecificRecordBase). Json4s rightfully throws an exception for malformed JSON in the case of a normal Scala case class, but it does not throw one for the case class that extends the specific record. Instead it tries to populate the object with nulls (I'm not sure whether it's json4s doing this or the specific record). Say I have this JSON: { "name":"Tom", "id":1, "address":{ "houseNum":

Issue Hive AvroSerDe tblProperties max length

亡梦爱人 submitted on 2019-12-13 04:43:47
Question: I am trying to create a table with AvroSerDe. I have already tried the following command to create the table:

CREATE EXTERNAL TABLE gaSession
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs://<<url>>:<<port>>/<<path>>/<<file>>.avsc');

The creation seems to work, but the following table is

Azure, Java: Read and Unzip a file which is saved in Azure Storage (Blobs) and encoded by Avro

你。 submitted on 2019-12-13 04:02:25
Question: I have a file in Azure Storage which is zipped and then encoded with Avro as a blob. I read and decode it as you can see in the following code:

public static int decodeAvroFile(String avroFile) throws Exception {
    GenericDatumReader<Object> reader = new GenericDatumReader<Object>();
    org.apache.avro.file.FileReader<Object> fileReader = DataFileReader.openReader(new File(avroFile), reader);
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    try {
        Schema schema = fileReader.getSchema();
        DatumWriter
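The excerpt cuts off, but the usual continuation is to iterate the container file, pull the compressed bytes field out of each record, and gunzip it into the output stream. A rough sketch reusing the fileReader and os variables from the snippet above; the record field name "Body" is an assumption, not something stated in the question:

while (fileReader.hasNext()) {
    GenericRecord record = (GenericRecord) fileReader.next();
    ByteBuffer body = (ByteBuffer) record.get("Body");   // hypothetical field name
    byte[] compressed = new byte[body.remaining()];
    body.get(compressed);
    // Inflate the gzip payload and append the plain bytes to the output stream.
    try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
        byte[] buf = new byte[4096];
        for (int n; (n = gzip.read(buf)) != -1; ) {
            os.write(buf, 0, n);
        }
    }
}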