Question
If I use the same .proto file across several machines (ARM, x86, amd64, etc.) with implementations written in different languages (C++, Python, Java, etc.), will the same message result in the exact same byte sequence when serialized across those different configurations?
I would like to use these bytes for hashing to ensure that the same message, when generated on a different platform, would end up with the exact same hash.
Answer 1:
"Often, but not quite always"
The reasons you might get variance include:
- it is only a "should", not a "must", that fields are written in sequential field-number order (citation, emphasis mine): "when a message is serialized its known fields should be written sequentially by field number"; serializers are not required to order fields this way (it is a "must" that deserializers be able to handle out-of-order fields); this can apply especially when discussing unexpected/extension fields; if two serializations choose different field orders, the bytes will be different
- a protobuf message can be constructed by merging two partial messages, which will by necessity cause out-of-order fields; but when an object deserialized from a merged message is re-serialized, the output may become normalized (sequential)
- the "varint" encoding allows some subtle ambiguity: the number 1 would usually be encoded as 0x01, but it could also be encoded as 0x8100 or 0x818000 or 0x81808080808000 - the specification doesn't actually demand (AFAIK) that the shortest version be used; I am not aware of any implementation that actually outputs this kind of subnormal form, though :)
- some options are designed to be forward- and backward-compatible; in particular, the [packed=true] option on repeated primitive values can be safely toggled at any time, and libraries are expected to cope with it; if you originally serialized it one way and are now serializing it with the other option, the result can be different; a side effect of this is that a specific library could also simply choose to use the alternate representation, especially if it knows it will be smaller; if two libraries make different decisions here, the bytes will differ
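To make the varint point concrete, here is a minimal sketch in plain Python (no protobuf library involved). The helper names `decode_varint` and `encode_varint` are made up for this illustration and are not part of any protobuf API. It shows that all four byte strings mentioned above decode to the value 1, while a shortest-form encoder only ever emits 0x01.

```python
def decode_varint(buf, pos=0):
    """Decode one base-128 varint from buf at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        # The low 7 bits carry data, least-significant group first.
        result |= (byte & 0x7F) << shift
        # A clear high bit marks the last byte of the varint.
        if not (byte & 0x80):
            return result, pos
        shift += 7


def encode_varint(value):
    """Encode a non-negative int using the shortest varint form."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)         # final byte
            return bytes(out)


# The encodings of 1 listed in the answer, including the padded ones.
candidates = [bytes.fromhex(h) for h in ("01", "8100", "818000", "81808080808000")]
decoded = [decode_varint(c)[0] for c in candidates]
print(decoded)
print(encode_varint(1).hex())
```

A decoder written like this accepts the padded forms without complaint, which is why two serializers could in principle disagree on the bytes while still agreeing on the value.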
In most common cases, yes: it'll be reliable and repeatable. But this is not an actual guarantee.
The bytes should be compatible, though - it'll still have the same semantics - just not the same bytes. It shouldn't matter what language, framework, library, runtime, or processor you use. If it does: that's a bug.
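The "same semantics, different bytes" situation can be sketched with hand-built wire data. This is only an illustration under stated assumptions: the two-field message (field 1 as a varint holding 150, field 2 as a length-delimited "hi") is hypothetical, and `read_varint` and `parse_fields` are toy helpers written for this example that handle only wire types 0 and 2. They are not a real protobuf parser.

```python
import hashlib


def read_varint(buf, pos):
    """Decode one varint from buf at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def parse_fields(buf):
    """Return [(field_number, value)] for varint and length-delimited fields."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:            # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:          # length-delimited
            length, pos = read_varint(buf, pos)
            value = bytes(buf[pos:pos + length])
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, value))
    return fields


# Same two fields, written in different orders.
in_order = bytes.fromhex("08960112026869")   # field 1, then field 2
reordered = bytes.fromhex("12026869089601")  # field 2, then field 1

same_meaning = sorted(parse_fields(in_order)) == sorted(parse_fields(reordered))
same_hash = hashlib.sha256(in_order).digest() == hashlib.sha256(reordered).digest()
print(same_meaning, same_hash)
```

Both inputs carry the same fields, yet a hash taken over the raw bytes differs, which is exactly the risk when serialized output is used directly as hash input.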
Source: https://stackoverflow.com/questions/54079241/will-protobuf-generate-bitwise-perfect-copy-if-ran-on-the-same-input-on-differen