What are the key differences between Apache Thrift, Google Protocol Buffers, MessagePack, ASN.1 and Apache Avro?

自闭症患者 2020-12-12 09:29

All of these provide binary serialization, RPC frameworks and IDL. I'm interested in key differences between them and characteristics (performance, ease of use, programming

6 answers
  •  自闭症患者
    2020-12-12 10:00

    We just did an internal study on serializers, here are some results (for my future reference too!)

    Thrift = serialization + RPC stack

    The biggest difference is that Thrift is not just a serialization protocol; it's a full-blown RPC stack, something like a modern-day SOAP stack. After serialization, the objects can (but don't have to) be sent between machines over TCP/IP. In SOAP, you started with a WSDL document that fully described the available services (remote methods) and the expected arguments/objects, and those objects were sent as XML. In Thrift, the .thrift file fully describes the available methods and the expected parameter objects, and the objects are serialized via one of the available protocols (with Compact Protocol, an efficient binary protocol, being the most popular in production).

    ASN.1 = Granddaddy

    ASN.1 was designed by telecom folks in the 80s and is awkward to use because library support is limited compared to the more recent serializers that emerged from CompSci folks. ASN.1 defines several encoding rules; the two forms you'll most often meet are DER (binary) and PEM (DER that has been Base64-encoded into ASCII). Both are fast, but DER is the faster and more size-efficient of the two. In fact ASN.1 DER can easily keep up with (and sometimes beat) serializers that were designed 30 years after it, a testament to its well-engineered design. It's very compact: in our test it matched Thrift and came in smaller than Protocol Buffers, beaten only by Avro. The issue is finding great libraries to support it; right now Bouncy Castle seems to be the best one for C#/Java. ASN.1 is king in security and crypto systems and isn't going to go away, so don't be worried about 'future proofing'. Just get a good library...
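To make the DER/PEM relationship concrete, here is a minimal hand-rolled sketch in Python. It is not taken from the study above: the record (a name and an age), the helper names and the PEM label `EXAMPLE RECORD` are all made up for illustration, and a real project should use a proper ASN.1 library instead. It only handles non-negative integers.

```python
import base64

def der_length(n: int) -> bytes:
    """DER length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def der_tlv(tag: int, content: bytes) -> bytes:
    """One DER element: tag octet, length octets, content octets."""
    return bytes([tag]) + der_length(len(content)) + content

def der_integer(value: int) -> bytes:
    """INTEGER (tag 0x02), minimal big-endian two's complement."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    size = value.bit_length() // 8 + 1  # leaves room for the sign bit
    return der_tlv(0x02, value.to_bytes(size, "big", signed=True))

def der_utf8(text: str) -> bytes:
    """UTF8String (tag 0x0C)."""
    return der_tlv(0x0C, text.encode("utf-8"))

def der_sequence(*items: bytes) -> bytes:
    """SEQUENCE (tag 0x30) wrapping already-encoded elements."""
    return der_tlv(0x30, b"".join(items))

def pem_wrap(der: bytes, label: str) -> str:
    """PEM armor: Base64 of the DER bytes, 64 chars per line, between markers."""
    b64 = base64.b64encode(der).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "\n".join([f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]) + "\n"

# Hypothetical record: SEQUENCE { name UTF8String, age INTEGER }
der = der_sequence(der_utf8("Ada"), der_integer(36))
pem = pem_wrap(der, "EXAMPLE RECORD")  # label is an assumption, not a registered type

print(der.hex())
print(pem)
```

The PEM text is always larger than the DER bytes it wraps, since Base64 adds roughly a third on top of the markers, which is why DER is the more compact of the two.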

    MessagePack = middle of the pack

    It's not bad, but it's neither the fastest, nor the smallest, nor the best supported. We found no production reason to choose it.

    Common

    Beyond that, they are fairly similar. Most are variants of the basic TLV (Type-Length-Value) principle.
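As a rough illustration of the TLV idea, here is a small Python sketch. The layout (1-byte type, 2-byte big-endian length, then the value) and the tag numbers are assumptions picked for this example; none of the serializers discussed here uses exactly this wire format.

```python
import struct

def tlv_encode(tag: int, value: bytes) -> bytes:
    """Pack one record as 1-byte type, 2-byte big-endian length, then the value."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("tag must fit in one byte")
    if len(value) > 0xFFFF:
        raise ValueError("value too long for a 2-byte length")
    return struct.pack(">BH", tag, len(value)) + value

def tlv_decode(buf: bytes):
    """Yield (tag, value) pairs from a buffer of concatenated TLV records."""
    offset = 0
    while offset < len(buf):
        tag, length = struct.unpack_from(">BH", buf, offset)
        offset += 3  # skip the 3-byte header
        value = buf[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated record")
        yield tag, value
        offset += length

# Hypothetical tags, chosen only for this example.
TAG_NAME, TAG_AGE = 1, 2

message = tlv_encode(TAG_NAME, b"Ada") + tlv_encode(TAG_AGE, bytes([36]))
records = list(tlv_decode(message))
print(message.hex(), records)
```

Because every record carries its own type and length, a reader can skip records whose tag it does not recognize, which is a large part of why this family of formats copes well with schema changes.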

    Protocol Buffers (Google originated), Avro (Apache based, used in Hadoop), Thrift (Facebook originated, now an Apache project) and ASN.1 (Telecom originated) all involve some level of code generation: you first express your data in a serializer-specific format, then the serializer "compiler" generates source code for your language in the code-gen phase. Your app source then uses these code-gen classes for IO. Note that certain implementations (e.g. Microsoft's Avro library or Marc Gravell's protobuf-net) let you directly decorate your app-level POCO/POJO objects, and the library then uses those decorated classes instead of any code-gen classes. We've seen this offer a performance boost since it eliminates an object copy stage (from application-level POCO/POJO fields to code-gen fields).
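The extra copy stage can be sketched in Python as follows. Everything here is hypothetical: `PersonGen` stands in for a class a schema compiler might emit, `Person` is an application class annotated through dataclass metadata, and the wire layout is invented. It only shows the shape of the two paths and says nothing about their relative speed in Python.

```python
import struct
from dataclasses import dataclass, field, fields

class PersonGen:
    """Stand-in for a code-generated class (hypothetical)."""
    def __init__(self):
        self.name = ""
        self.age = 0

    def write(self) -> bytes:
        raw = self.name.encode("utf-8")
        return struct.pack(">H", len(raw)) + raw + struct.pack(">I", self.age)

@dataclass
class Person:
    """Application-level class, annotated so a library can serialize it directly."""
    name: str = field(metadata={"wire": "str"})
    age: int = field(metadata={"wire": "u32"})

def write_via_codegen(p: Person) -> bytes:
    """Path 1: copy app fields into the generated object, then serialize it."""
    gen = PersonGen()   # extra allocation
    gen.name = p.name   # field-by-field copy that the annotated path avoids
    gen.age = p.age
    return gen.write()

def write_direct(obj) -> bytes:
    """Path 2: serialize the annotated app object with no intermediate copy."""
    out = bytearray()
    for f in fields(obj):
        value = getattr(obj, f.name)
        kind = f.metadata["wire"]
        if kind == "str":
            raw = value.encode("utf-8")
            out += struct.pack(">H", len(raw)) + raw
        elif kind == "u32":
            out += struct.pack(">I", value)
        else:
            raise TypeError(f"unsupported wire type: {kind}")
    return bytes(out)

person = Person(name="Ada", age=36)
print(write_via_codegen(person) == write_direct(person))
```

Both paths produce the same bytes; the difference is only whether an intermediate generated object has to be populated first.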

    Some results and a live project to play with

    This project (https://github.com/sidshetye/SerializersCompare) compares important serializers in the C# world. The Java folks already have something similar.

    1000 iterations per serializer, average times listed
    Sorting result by size
    Name                Bytes  Time (ms)
    ------------------------------------
    Avro (cheating)       133     0.0142
    Avro                  133     0.0568
    Avro MSFT             141     0.0051
    Thrift (cheating)     148     0.0069
    Thrift                148     0.1470
    ProtoBuf              155     0.0077
    MessagePack           230     0.0296
    ServiceStackJSV       258     0.0159
    Json.NET BSON         286     0.0381
    ServiceStackJson      290     0.0164
    Json.NET              290     0.0333
    XmlSerializer         571     0.1025
    Binary Formatter      748     0.0344
    
    Options: (T)est, (R)esults, s(O)rt order, (S)erializer output, (D)eserializer output (in JSON form), (E)xit
    
    Serialized via ASN.1 DER encoding to 148 bytes in 0.0674ms (hacked experiment!)
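For readers who want to reproduce this kind of measurement outside C#, here is a minimal Python harness in the same spirit: a fixed number of iterations per serializer, average time per call, results sorted by size. It is not the code from the linked project; the sample record and the choice of `json` and `pickle` from the standard library are assumptions made so that the sketch runs without third-party packages.

```python
import json
import pickle
import time

def benchmark(name, dumps, record, iterations=1000):
    """Return (name, payload size in bytes, average serialize time in ms)."""
    payload = dumps(record)  # warm-up call; also gives us the size
    start = time.perf_counter()
    for _ in range(iterations):
        dumps(record)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return name, len(payload), elapsed_ms / iterations

# Hypothetical sample record.
record = {"id": 42, "name": "Ada", "tags": ["x", "y"], "score": 3.5}

serializers = {
    "json": lambda r: json.dumps(r).encode("utf-8"),
    "json (compact)": lambda r: json.dumps(r, separators=(",", ":")).encode("utf-8"),
    "pickle": pickle.dumps,
}

# Sort by payload size, smallest first, like the table above.
results = sorted(
    (benchmark(name, fn, record) for name, fn in serializers.items()),
    key=lambda row: row[1],
)

print(f"{'Name':<18}{'Bytes':>6}  Time (ms)")
for name, size, avg_ms in results:
    print(f"{name:<18}{size:>6}  {avg_ms:.4f}")
```

Timings from a loop like this vary with the machine, the runtime and the shape of the record, so treat them as relative indicators rather than absolute numbers.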
    
