Is it okay to truncate a SHA256 hash to 128 bits?

Submitted by 故事扮演 on 2019-11-29 09:21:23

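Yeah, that will work. XORing the two halves together is sometimes suggested as theoretically better, but plain truncation is the standard practice (SHA-224, for instance, is essentially SHA-256 with different initial values, truncated to 224 bits), and even truncated SHA-256 is stronger than MD5. You should still consider the result a 128-bit hash rather than a 256-bit hash, though: its collision resistance is roughly 2^64 operations (the birthday bound), not 2^128.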
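For illustration, here is a minimal Python sketch of both ways of getting 128 bits out of SHA-256, using only the standard-library `hashlib` module. The function names are made up for this example.

```python
import hashlib


def sha256_truncated_128(data: bytes) -> bytes:
    """Keep the first 16 bytes (128 bits) of the SHA-256 digest."""
    return hashlib.sha256(data).digest()[:16]


def sha256_xor_folded_128(data: bytes) -> bytes:
    """XOR the two 16-byte halves of the SHA-256 digest together."""
    digest = hashlib.sha256(data).digest()
    return bytes(a ^ b for a, b in zip(digest[:16], digest[16:]))


if __name__ == "__main__":
    msg = b"hello world"
    print(sha256_truncated_128(msg).hex())
    print(sha256_xor_folded_128(msg).hex())
```

Either way you end up with 16 bytes, and no 128-bit output can offer more than about 2^64 collision resistance, so truncation (the simpler of the two) is what is usually used.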

My specific recommendation in this case is to store and reference each file by HASH + uniquifier, where the uniquifier is the count of how many distinct files you have already seen with this hash. That way you don't fall flat on your face if somebody later tries to store collision pairs discovered for SHA-256 in the future.
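A minimal in-memory sketch of the HASH + uniquifier idea, assuming the 128-bit truncated SHA-256 from the question as the hash. `ContentStore`, `put` and `get` are hypothetical names; a real system would compare against the stored file on disk rather than keep every blob in memory. The hash function is injectable so the collision path can be exercised with a deliberately weak hash.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Key = Tuple[bytes, int]  # (digest, uniquifier)


def truncated_sha256(data: bytes) -> bytes:
    """First 128 bits of SHA-256, used as the default digest."""
    return hashlib.sha256(data).digest()[:16]


class ContentStore:
    """Hypothetical content-addressed store keyed by HASH + uniquifier."""

    def __init__(self, hash_fn: Callable[[bytes], bytes] = truncated_sha256) -> None:
        self._hash_fn = hash_fn
        # digest -> distinct contents seen with that digest, in arrival order
        self._buckets: Dict[bytes, List[bytes]] = {}

    def put(self, data: bytes) -> Key:
        digest = self._hash_fn(data)
        bucket = self._buckets.setdefault(digest, [])
        for uniquifier, existing in enumerate(bucket):
            # Byte-for-byte comparison: identical content reuses its key.
            if existing == data:
                return digest, uniquifier
        # New distinct content: uniquifier = number of earlier distinct files
        # sharing this digest (0 unless there has been a collision).
        bucket.append(data)
        return digest, len(bucket) - 1

    def get(self, key: Key) -> bytes:
        digest, uniquifier = key
        return self._buckets[digest][uniquifier]


if __name__ == "__main__":
    store = ContentStore()
    k1 = store.put(b"report contents")
    k2 = store.put(b"report contents")
    print(k1 == k2, k1[1])

    # Force collisions with a constant "hash" to show the uniquifier at work.
    weak = ContentStore(hash_fn=lambda d: b"\x00" * 16)
    print(weak.put(b"file A"), weak.put(b"file B"))
```

Because the uniquifier is almost always 0 it costs very little, but two different files that collide on the hash still get distinct keys instead of silently overwriting each other.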

But is it worth it? If you store a hash for each file, you essentially have a fixed overhead per file. Let's say that each file takes up at least 512 bytes (a typical disk sector) and that you store the hashes compactly enough that each one takes up little more than the hash size itself.

So even if all your files are the minimum 512 bytes, you're looking at either 16 / 512 ≈ 3.1% (128-bit hash) or 32 / 512 ≈ 6.3% (full 256-bit hash). In reality, I'd bet your average file size is larger (unless all your files fit in one sector...), so the overhead would be smaller.
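The per-file overhead is just the hash size divided by the file size. A tiny Python sketch (the function name is hypothetical, and it assumes one hash per file with no other metadata) shows how quickly the overhead shrinks once files grow past one sector:

```python
def hash_overhead(hash_bytes: int, file_bytes: int) -> float:
    """Fraction of storage spent on one hash per file."""
    return hash_bytes / file_bytes


if __name__ == "__main__":
    for file_size in (512, 4096, 65536):
        for hash_size in (16, 32):  # 128-bit vs full 256-bit SHA-256
            pct = hash_overhead(hash_size, file_size)
            print(f"{file_size:>6} B file, {hash_size} B hash: {pct:.3%}")
```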

Now, the amount of space you need for hashes scales linearly with the number of files you have. Is that extra space worth that much? Even with the trillion files you mentioned, that's 1 000 000 000 000 * 16 bytes = ~14.6 TiB for 128-bit hashes, or 1 000 000 000 000 * 32 bytes = ~29.1 TiB for full SHA-256 hashes. That is a lot of space, but keep in mind that your data would be 1 000 000 000 000 * 512 bytes = ~466 TiB. The absolute numbers don't really matter, since it's still 3% or 6% overhead. At this level, where you have half a petabyte of storage, does saving ~15 TiB matter? At any level, does a 3% saving mean anything? And remember, if the files are larger, you save less. (Which they probably are: good luck getting a 512-byte sector size at that disk capacity.)
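The same arithmetic for the trillion-file case, as a short Python sketch. Like the answer, it assumes every file is exactly 512 bytes and that hashes are stored with no per-entry overhead; sizes are reported in binary tebibytes (2^40 bytes).

```python
TIB = 2 ** 40  # bytes per tebibyte


def total_tib(num_files: int, bytes_per_file: int) -> float:
    """Total size in TiB of num_files items of bytes_per_file bytes each."""
    return num_files * bytes_per_file / TIB


n = 10 ** 12  # the "trillion files" from the question
hashes_128 = total_tib(n, 16)
hashes_256 = total_tib(n, 32)
data = total_tib(n, 512)

if __name__ == "__main__":
    print(f"128-bit hashes:      {hashes_128:.1f} TiB")
    print(f"256-bit hashes:      {hashes_256:.1f} TiB")
    print(f"data (512 B/file):   {data:.1f} TiB")
    print(f"saved by truncating: {hashes_256 - hashes_128:.1f} TiB "
          f"({(32 - 16) / 512:.1%} of the data)")
```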

So, is this saving of 3% or less of your disk worth the potential security risk? (Which I'll leave unanswered, as it's waaay not my cup of tea.)

Alternatively, could you, say, group files together in some logical fashion, so that you have fewer files? (I mean, if you have trillions of 512-byte files, do you really want to hash every byte on disk?)

Yes, that will work.

For the record, there are known practical collision attacks against MD5, while at the time of writing the SHA-1 attacks were completely theoretical (no SHA-1 collision had been found... yet). A practical SHA-1 collision (SHAttered) was publicly demonstrated in 2017.
