Avro schema for Json array

。_饼干妹妹 提交于 2019-12-20 05:37:28

问题


Suppose I have following json:

[
   {"id":1,"text":"some text","user_id":1},
   {"id":1,"text":"some text","user_id":2},
   ...
]

What would be an appropriate avro schema for this array of objects?


回答1:


[short answer]
The appropriate avro schema for this array of objects would look like:

const type = avro.Type.forSchema({
  type: 'array',
  items: { type: 'record', fields:
   [ { name: 'id', type: 'int' },
     { name: 'text', type: 'string' },
     { name: 'user_id', type: 'int' } ]
  }
});

[long answer]
We can use Avro to help us build the above schema by given data object.
Let's use npm package "avsc", which is "Pure JavaScript implementation of the Avro specification".
Since Avro can infer a value's schema we can use following trick to get schema by given data (unfortunately it seems can't show nested schemas, but we can ask twice - for top level structure (array) and then for array element):

// don't forget to install avsc
// npm install avsc
//
const avro = require('avsc');

// avro can infer a value's schema
const type = avro.Type.forValue([
   {"id":1,"text":"some text","user_id":1}
]);

const type2 = avro.Type.forValue(
   {"id":1,"text":"some text","user_id":1}
);


console.log(type.getSchema());
console.log(type2.getSchema());

Output:

{ type: 'array',
  items: { type: 'record', fields: [ [Object], [Object], [Object] ] } }
{ type: 'record',
  fields:
   [ { name: 'id', type: 'int' },
     { name: 'text', type: 'string' },
     { name: 'user_id', type: 'int' } ] }

Now let's compose proper schema and try to use it to serialize object and then de-serialize it back!

const avro = require('avsc');
const type = avro.Type.forSchema({
  type: 'array',
  items: { type: 'record', fields:
   [ { name: 'id', type: 'int' },
     { name: 'text', type: 'string' },
     { name: 'user_id', type: 'int' } ]
  }
});
const buf = type.toBuffer([
   {"id":1,"text":"some text","user_id":1},
   {"id":1,"text":"some text","user_id":2}]); // Encoded buffer.

const val = type.fromBuffer(buf);
console.log("deserialized object: ", JSON.stringify(val, null, 4));  // pretty print deserialized result

var fs = require('fs');
var full_filename = "/tmp/avro_buf.dat";
fs.writeFile(full_filename, buf, function(err) {
    if(err) {
        return console.log(err);
    }

    console.log("The file was saved to '" + full_filename + "'");
});

Output:

deserialized object:  [
    {
        "id": 1,
        "text": "some text",
        "user_id": 1
    },
    {
        "id": 1,
        "text": "some text",
        "user_id": 2
    }
]
The file was saved to '/tmp/avro_buf.dat'

We can even enjoy the compact binary representation of the above exercise:

hexdump -C /tmp/avro_buf.dat
00000000  04 02 12 73 6f 6d 65 20  74 65 78 74 02 02 12 73  |...some text...s|
00000010  6f 6d 65 20 74 65 78 74  04 00                    |ome text..|
0000001a

Nice, isn't she?-)




回答2:


Concerning your question, correct schema is

{
  "name": "Name",
  "type": "array",
  "namespace": "com.hi.avro.model",
  "items": {
    "name": "NameDetails",
    "type": "record",
    "fields": [
      {
        "name": "id",
        "type": "int"
      },
      {
        "name": "text",
        "type": "string"
      },
      {
        "name": "user_id",
        "type": "int"
      }
    ]
  }
}


来源:https://stackoverflow.com/questions/36040501/avro-schema-for-json-array

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!