Will protobuf generate a bitwise-perfect copy if run on the same input on different languages/architectures?

Question

If I use the same .proto file across several machines (ARM, x86, amd64, etc.) with implementations written in different languages (C++, Python, Java, etc.), will the same message result in the exact same byte sequence when serialized across those different configurations?

I would like to use these bytes for hashing to ensure that the same message, when generated on a different platform, would end up with the exact same hash.

Answer 1:

"Often, but not quite always"

The reasons you might get variance include:

Field order: it is only a "should", not a "must", that fields are written in ascending field-number order - citation, emphasis mine:

when a message is serialized its known fields should be written sequentially by field number

Serializers are not required to order fields this way (whereas it is a "must" that deserializers be able to handle out-of-order fields). This applies especially to unexpected/extension fields. If two serializers choose different field orders, the bytes will be different.
Merging: a protobuf message can be constructed by merging two partial messages, which will by necessity produce out-of-order fields. When an object deserialized from such a merged message is re-serialized, the output may become normalized (sequential), so the bytes change.
Varints: the "varint" encoding allows some subtle ambiguity. The number 1 would usually be encoded as 0x01, but it could also be encoded as 0x8100, 0x818000, or 0x81808080808000. The specification doesn't actually demand (AFAIK) that the shortest version be used. I am not aware of any implementation that actually outputs this kind of subnormal form, though :)
Options: some options are designed to be forward- and backward-compatible. In particular, the [packed=true] option on repeated primitive values can be safely toggled at any time, and libraries are expected to cope with either form. If you originally serialized the data one way and now serialize it with the other setting, the result can be different. A side effect is that a specific library could also simply choose to use the alternate representation, especially if it knows it will be smaller. If two libraries make different decisions here: different bytes.
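The varint and packed points can be checked by hand, without any protobuf library. The following is a minimal sketch in plain Python: `decode_varint` and `encode_varint` are hypothetical helpers written for this illustration, and the repeated field number 4 is an arbitrary choice, not something taken from a real .proto file.

```python
def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift   # low 7 bits carry payload
        if not (byte & 0x80):             # high bit clear: last byte
            return value, pos
        shift += 7


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as the shortest possible varint."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)        # more bytes follow
        else:
            out.append(low)
            return bytes(out)


# All four byte strings from the answer decode to the same number.
candidates = [bytes.fromhex(h) for h in ("01", "8100", "818000", "81808080808000")]
decoded = [decode_varint(c)[0] for c in candidates]

# A repeated int32 field (hypothetical field number 4) holding [1, 2, 3].
FIELD = 4
VALUES = (1, 2, 3)

# Unpacked: one (tag, value) pair per element, wire type 0 (varint).
unpacked = b"".join(
    encode_varint((FIELD << 3) | 0) + encode_varint(v) for v in VALUES
)

# Packed: a single length-delimited record, wire type 2.
payload = b"".join(encode_varint(v) for v in VALUES)
packed = encode_varint((FIELD << 3) | 2) + encode_varint(len(payload)) + payload

print(decoded)          # [1, 1, 1, 1]
print(unpacked.hex())   # 200120022003
print(packed.hex())     # 2203010203
```

Both `unpacked` and `packed` describe the same three values, yet they share no common prefix beyond chance, so any hash computed over the raw bytes would differ.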

In most common cases, yes: it'll be reliable and repeatable. But this is not an actual guarantee.

The bytes should be compatible, though - they'll still have the same semantics - just not the same bytes. It shouldn't matter what language, framework, library, runtime, or processor you use. If it does: that's a bug.
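To make "same semantics, different bytes" concrete for the hashing use case, here is a rough sketch in plain Python. The two payloads are assembled by hand (field 1 as the varint 150, field 2 as the string "hi") rather than produced by any real protobuf library, and `parse` is a hypothetical toy decoder that only understands wire types 0 and 2.

```python
import hashlib


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read one varint from buf at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return value, pos
        shift += 7


def parse(buf: bytes) -> dict:
    """Toy decoder: map field number -> value for wire types 0 and 2 only."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                       # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:                     # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[number] = value
    return fields


field_1 = bytes.fromhex("089601")      # field 1, varint 150
field_2 = bytes.fromhex("12026869")    # field 2, bytes b"hi"

ascending = field_1 + field_2          # fields in field-number order
reordered = field_2 + field_1          # same fields, opposite order

same_meaning = parse(ascending) == parse(reordered)
same_hash = (
    hashlib.sha256(ascending).digest() == hashlib.sha256(reordered).digest()
)

print(parse(ascending))   # {1: 150, 2: b'hi'}
print(same_meaning)       # True
print(same_hash)          # False
```

Both payloads decode to the same field values, but their SHA-256 digests differ, which is why hashing the serialized bytes directly is only safe when a single serializer with fixed settings produces them.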

Source: https://stackoverflow.com/questions/54079241/will-protobuf-generate-bitwise-perfect-copy-if-ran-on-the-same-input-on-differen

Tags

cross-platform

protocol-buffers