Can we say that a truncated MD5 hash is still uniformly distributed?
To avoid misinterpretations: I'm aware the chance of collisions is much greater th
Yes. Not exhibiting any bias is a design requirement for a cryptographic hash. MD5 is broken from a cryptographic point of view; however, the distribution of its output was never in question.
If you still need to be convinced, it's not a huge undertaking to hash a bunch of files, truncate the output, and use ent (http://www.fourmilab.ch/random/) to analyze the result.
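If ent is not at hand, a similar check can be sketched directly in PHP. The snippet below is only an illustration under assumptions of mine: it hashes the natural numbers rather than files, keeps the first two hex digits (8 bits) of each hash, and computes a chi-square statistic against a perfectly uniform expectation. The function name `truncatedMd5ChiSquare` and the sample size are hypothetical choices, not part of ent.

```php
<?php
// Chi-square goodness-of-fit statistic for a truncated MD5 prefix.
// Assumption: inputs are the decimal strings "1", "2", ..., "$samples".
function truncatedMd5ChiSquare(int $samples, int $hexDigits = 2): float
{
    $bins = 16 ** $hexDigits;              // number of possible truncated values
    $counts = array_fill(0, $bins, 0);

    for ($i = 1; $i <= $samples; $i++) {
        $prefix = substr(md5((string)$i), 0, $hexDigits);
        $counts[hexdec($prefix)]++;        // bin index = numeric value of the prefix
    }

    $expected = $samples / $bins;          // count per bin under perfect uniformity
    $chi2 = 0.0;
    foreach ($counts as $observed) {
        $chi2 += ($observed - $expected) ** 2 / $expected;
    }
    return $chi2;
}

$chi2 = truncatedMd5ChiSquare(200000);
printf("chi-square = %.1f (255 degrees of freedom)\n", $chi2);
```

With 256 bins the statistic has 255 degrees of freedom, so for uniform output it should land around 255 with a standard deviation of about 22.6. A value within a few standard deviations of 255 is consistent with a uniform distribution; a value far above that would hint at bias.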
I wrote a small PHP program to answer this question. It's not very scientific, but it shows the distribution of the first and the last 8 bits of the hash values (both counted in the same 256-cell table), using the natural numbers as the input text. After about 40,000,000 hashes the difference between the highest and the lowest counts goes down to about 1%, so I'd say the distribution is OK. I hope the code explains more precisely what was computed :-) Btw, with a similar program I found that the last 8 bits seem to be distributed slightly better than the first.
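<?php
// Set up the count array: one cell per possible byte value ("00" to "ff").
// Note: first-byte and last-byte counts are accumulated in the same table.
$count = [];
for ($y = 0; $y < 16; $y++) {
    for ($x = 0; $x < 16; $x++) {
        $count[dechex($x) . dechex($y)] = 0;
    }
}

$text = 1;        // The text we will hash (natural numbers: 1, 2, 3, ...).
$hashCount = 0;
$steps = 10000;   // Number of hashes computed between two screen updates.

while (1) {
    // Calculate & count a bunch of hashes:
    for ($i = 0; $i < $steps; $i++) {
        $hash = md5((string)$text);
        $count[substr($hash, 0, 2)]++;  // first 8 bits (two hex digits)
        $count[substr($hash, -2)]++;    // last 8 bits
        $text++;
    }
    $hashCount += $steps;

    // Output result so far (runs until interrupted, e.g. with Ctrl+C):
    system("clear");
    $min = PHP_INT_MAX;
    $max = 0;
    for ($y = 0; $y < 16; $y++) {
        for ($x = 0; $x < 16; $x++) {
            $n = $count[dechex($x) . dechex($y)];
            if ($n < $min) $min = $n;
            if ($n > $max) $max = $n;
            print $n . "\t";
        }
        print "\n";
    }
    // Delta: spread between the fullest and emptiest cell, relative to the fullest.
    print "Hashes: $hashCount, Min: $min, Max: $max, Delta: " . ((($max - $min) * 100) / $max) . "%\n";
}
?>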
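To judge whether a delta of about 1% is what pure chance would produce, it helps to estimate the statistical noise of an ideal uniform source. The following is a rough sketch, assuming each cell behaves like a binomial count and using the 40,000,000 figure quoted above; the variable names are mine. The script above increments the table twice per hash (first and last byte), so the table holds twice as many counts as hashes.

```php
<?php
// Expected noise per cell for a perfectly uniform source (binomial model).
$hashes     = 40000000;        // figure quoted in the answer
$increments = 2 * $hashes;     // first and last byte are both counted
$bins       = 256;
$p          = 1 / $bins;       // probability of hitting one particular cell

$mean     = $increments * $p;                   // expected count per cell
$sigma    = sqrt($increments * $p * (1 - $p));  // standard deviation per cell
$relSigma = 100 * $sigma / $mean;               // sigma as a percentage of the mean

printf("mean per cell: %.0f, sigma: %.1f (%.3f%% of mean)\n", $mean, $sigma, $relSigma);
```

This gives a mean of 312,500 per cell and a standard deviation of roughly 558, or about 0.18% of the mean. Since the delta compares the two most extreme of 256 cells, a gap of several standard deviations is to be expected, so a delta near 1% is in line with uniform output rather than a sign of bias.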