Is it possible to get collisions with base64 Encoding / Decoding

陌路散爱 submitted on 2019-12-24 03:38:44

Question


A similar question was asked here: Is base64 encoding always one to one

And apparently the answer (to the similar question) is YES. I already know that, BUT I'd be curious to know the explanation for why these two strings appear to be equivalent after being Base64 decoded:

cwB0AGQAAG==

cwB0AGQAAA==


One more thing... when you take either decoded string and re-encode it, both re-encode to the same value: cwB0AGQAAA==

What happened?


Answer 1:


base64 decoding is not one-to-one: many decoders accept several different strings that decode to the same bytes. What you're seeing is two ways of writing the unused bits in the final, padded group of the string.

base64 encodes bytes (8 bits each) into base 64. A character in base 64 encodes 6 bits, so every four characters can handle 3 bytes (24 bits). When the length of the input is not a multiple of three, base64 uses = as a padding character. XXX= indicates that only the first two bytes of the group are to be used (where XXX represents three arbitrary base64 characters), while XX== indicates that only the first byte should be used.
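To make the grouping concrete, here is a minimal sketch using Python's standard `base64` module. The sample inputs and the names `samples` and `encoded` are only illustrative.

```python
import base64

# Encode inputs of length 3, 2, and 1 to see how the final group is padded.
samples = [b"abc", b"ab", b"a"]
encoded = {data: base64.b64encode(data).decode("ascii") for data in samples}

for data, text in encoded.items():
    # 3 bytes fill a whole group; 2 bytes need one '='; 1 byte needs two.
    print(len(data), "byte(s) ->", text)
```

Three input bytes give four characters with no padding, two bytes give three characters plus one =, and one byte gives two characters plus ==.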

The last group in your example is AA==, which encodes a 0 byte. However, the AA part carries 12 bits, of which only the first eight form the byte; the least significant four (the low four bits of the second character) are ignored on decoding. So the second character can be anything from A to P and you get the same result, which is why AG== decodes to the same byte. When you use the encoder it always picks zeros for those four bits, so you get back AA==.
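The ignored bits can be checked directly. This is a sketch, assuming Python's `base64.b64decode` in its default (lenient) mode; the helper `split_final_pair` and the `ALPHABET` constant are made up for illustration and are not part of any library.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def split_final_pair(pair):
    """Split a two-character final group into (decoded byte, ignored low 4 bits)."""
    bits = (ALPHABET.index(pair[0]) << 6) | ALPHABET.index(pair[1])  # 12 bits
    return bits >> 4, bits & 0b1111

# Both strings from the question decode to the same bytes...
first = base64.b64decode("cwB0AGQAAG==")
second = base64.b64decode("cwB0AGQAAA==")

# ...and the encoder always emits the form with zeroed spare bits.
canonical = base64.b64encode(first).decode("ascii")

print(first == second, canonical)
print(split_final_pair("AG"), split_final_pair("AA"), split_final_pair("AQ"))
```

Here `split_final_pair("AG")` and `split_final_pair("AA")` give the same byte with different ignored bits, while `"AQ"` is the first second character that changes the byte itself.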

Padding is actually even more complicated in base64. Technically you can exclude the = characters, since the length of the string already tells the decoder how many bytes the last group holds (according to Wikipedia, not all decoders support this). Where padding is useful is that it allows base64 strings to be safely concatenated, since every group of four is interpreted the same way. However, this means that padding can also appear in the middle of a string, so a decoder that accepts such input will accept the same sequence of bytes encoded in all sorts of ways. Many decoders also ignore whitespace and newlines, although strict ones reject them.
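How much of this a decoder tolerates varies. As one data point, the sketch below shows what Python's standard `base64.b64decode` does; other libraries may behave differently. The helper `decode_unpadded` is a hypothetical name, not a library function.

```python
import base64
import binascii

def decode_unpadded(text):
    """Restore the '=' padding implied by the length, then decode."""
    return base64.b64decode(text + "=" * (-len(text) % 4))

# Python's decoder rejects input whose padding has been stripped...
try:
    base64.b64decode("YQ")
    unpadded_accepted = True
except binascii.Error:
    unpadded_accepted = False

# ...but by default it silently skips characters outside the alphabet.
lenient = base64.b64decode("YW Jj\n")

# With validate=True the same input is rejected instead.
try:
    base64.b64decode("YW Jj\n", validate=True)
    strict_accepted = True
except binascii.Error:
    strict_accepted = False

print(unpadded_accepted, decode_unpadded("YQ"), lenient, strict_accepted)
```

So with this particular decoder, stripped padding has to be restored by hand, and whitespace is ignored only in the default lenient mode.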

Despite all of this, base64 encoding is still injective, meaning if x != y, then base64(x) != base64(y); as a result, you cannot get collisions and can always get the original data back. However, the encoder is not surjective onto the strings a lenient decoder accepts: there are many accepted strings that decode to the same data, and the encoder only ever produces one of them.
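A brute-force check over short inputs illustrates the injectivity claim. It is not a proof, just a sketch using Python's standard `base64` module; the names `inputs` and `encodings` are illustrative.

```python
import base64
from itertools import product

# Every byte string of length 0, 1, or 2 (65,793 inputs in total).
inputs = [bytes(t) for n in (0, 1, 2) for t in product(range(256), repeat=n)]

# Distinct inputs must produce distinct encodings if the encoder is injective.
encodings = {base64.b64encode(x) for x in inputs}
no_collisions = len(encodings) == len(inputs)

# Decoding the encoder's own output must return the original bytes.
round_trips = all(base64.b64decode(base64.b64encode(x)) == x for x in inputs)

print(no_collisions, round_trips)
```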



Source: https://stackoverflow.com/questions/53225750/is-it-possible-to-get-collisions-with-base64-encoding-decoding
