This question has to do with PHP's implementation of crypt(). For this question, the first 7 characters of the salt are not counted, so a salt '$2a$07$a' wo
Great answer, and clear explanation. But it seems to me there is either a bug in the implementation or some further explanation of the intent is needed {the comments to the post explain why there is not a bug}. The current PHP documentation states:
CRYPT_BLOWFISH - Blowfish hashing with a salt as follows: "$2a$", a two digit cost parameter, "$", and 22 base 64 digits from the alphabet "./0-9A-Za-z". Using characters outside of this range in the salt will cause crypt() to return a zero-length string. The two digit cost parameter is the base-2 logarithm of the iteration count for the underlying Blowfish-based hashing algorithm and must be in range 04-31, values outside this range will cause crypt() to fail.
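As a rough illustration, the format quoted above can be written down as a regular expression. This is only a sketch: `looks_like_blowfish_salt` is a made-up helper name, not something PHP provides, and it checks the documented format only. It is stricter than crypt() itself, which (as the other reply shows) also accepts shorter salts.

```php
<?php
// Hypothetical helper (not part of PHP): checks a salt against the documented
// format: "$2a$", a two-digit cost in 04-31, "$", then exactly 22 digits
// from the alphabet "./0-9A-Za-z".
function looks_like_blowfish_salt(string $salt): bool
{
    $pattern = '/^\$2a\$(0[4-9]|[12][0-9]|3[01])\$[.\/0-9A-Za-z]{22}\z/';
    return preg_match($pattern, $salt) === 1;
}

var_dump(looks_like_blowfish_salt('$2a$04$123456789012345678901.')); // bool(true)
var_dump(looks_like_blowfish_salt('$2a$03$123456789012345678901.')); // bool(false): cost below 04
var_dump(looks_like_blowfish_salt('$2a$04$12345678901234567890!.')); // bool(false): '!' is outside the alphabet
```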
This is consistent with what's been stated and demonstrated here. Unfortunately the documentation doesn't describe the return value very usefully:
Returns the hashed string or a string that is shorter than 13 characters and is guaranteed to differ from the salt on failure.
But as shown in the reply by Dereleased, if the input salt string is valid, the output consists of the input salt padded out to a fixed length with '$' characters, with the 32-character computed hash value appended to it. Unfortunately, the salt in the result is padded out to only 21 base64 digits, not 22! This is shown by the last three lines in that reply, where we see one '$' for 20 digits, no '$' for 21, and when there are 22 base64 digits in the salt, the first character of the hash result replaces the 22nd digit of the input salt. The function is still usable, because the complete value it computes is available to the caller as substr(crypt($pw,$salt), 28, 32), and the caller already knows the complete salt value because it passed that string as an argument. But it's very difficult to understand why the return value is designed so that it can only give you 126 bits of the 128-bit salt value. In fact, it's hard to understand why it includes the input salt at all; but omitting 2 bits of it is really unfathomable.
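To make the layout concrete, here is a small sketch that slices up one return value. It does not call crypt() at all; the string is copied from the output further down in this answer, and the variable names are just mine.

```php
<?php
// Sample return value copied from the output listed in this answer.
$out = '$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW';

$prefix     = substr($out, 0, 7);   // scheme and cost: $2a$04$
$saltDigits = substr($out, 7, 21);  // the 21 salt digits that come back intact
$hash32     = substr($out, 28, 32); // the remaining 32 characters

echo strlen($out), "\n";  // 60
echo $prefix, "\n";       // $2a$04$
echo $saltDigits, "\n";   // 123456789012345678901
echo $hash32, "\n";       // .YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW
```

The offsets add up as 7 + 21 + 32 = 60, which is the full length of the returned string.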
Here's a little snippet showing that the 22nd base64 digit contributes just two more bits to the salt actually used in the computation (there are only 4 distinct hashes produced):
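$alphabet = './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
$lim = strlen($alphabet);
$password = 'password'; // placeholder; the password behind the output below is not shown
$saltprefix = '$2a$04$123456789012345678901'; // 21 base64 digits
for ($i = 0; $i < $lim; ++$i) {
    // blank line between each group of 16 digits
    if ($i == 16 || $i == 32 || $i == 48) echo "\n";
    // append the i-th alphabet character as the 22nd salt digit
    $salt = $saltprefix . substr($alphabet, $i, 1);
    $crypt = crypt($password, $salt);
    echo "salt ='$salt'\ncrypt='$crypt'\n";
}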
salt ='$2a$04$123456789012345678901.'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901/'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901A'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901B'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901C'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901D'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901E'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901F'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901G'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901H'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901I'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901J'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901K'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901L'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901M'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901N'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901O'
crypt='$2a$04$123456789012345678901Ots44xXtSV0f6zMrHerQ2IANdsJ.2ioG'
salt ='$2a$04$123456789012345678901P'
crypt='$2a$04$123456789012345678901Ots44xXtSV0f6zMrHerQ2IANdsJ.2ioG'
salt ='$2a$04$123456789012345678901Q'
crypt='$2a$04$123456789012345678901Ots44xXtSV0f6zMrHerQ2IANdsJ.2ioG'
... 13 more pairs of output lines with same hash
salt ='$2a$04$123456789012345678901e'
crypt='$2a$04$123456789012345678901e.1cixwQ2qnBqwFeEcMfNfXApRK0ktqm'
... 15 more pairs of output lines with same hash
salt ='$2a$04$123456789012345678901u'
crypt='$2a$04$123456789012345678901u5yLyHIE2JetWU67zG7qvtusQ2KIZhAa'
... 15 more pairs of output lines with same hash
The grouping of the identical hash values also shows that the mapping of the alphabet actually used is most likely as written here, rather than in the order shown in the other reply.
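The arithmetic behind the four groups can be sketched as follows. This assumes the alphabet order used in my snippet above and the 128-bit salt width mentioned earlier; it only does bookkeeping on the alphabet and does not call crypt().

```php
<?php
// Alphabet in the order used by the snippet in this answer (an assumption
// about the real mapping, supported only by the grouping of the output).
$alphabet = './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// Each base64 digit carries 6 bits, but the salt is only 128 bits wide.
$bitsOffered = 22 * 6;             // 132 bits in 22 digits
$bitsFrom22  = 128 - 21 * 6;       // 2 bits are taken from the 22nd digit
$bitsDropped = $bitsOffered - 128; // 4 low bits of the 22nd digit are ignored

// Digits that share their top two bits form four runs of 16 characters.
$groups = str_split($alphabet, 16);
foreach ($groups as $topBits => $digits) {
    echo $topBits, ': ', $digits, "\n";
}
// 0: ./ABCDEFGHIJKLMN
// 1: OPQRSTUVWXYZabcd
// 2: efghijklmnopqrst
// 3: uvwxyz0123456789
```

The first character of each run ('.', 'O', 'e', 'u') is exactly where the hash changes in the output above.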
Perhaps the interface was designed this way for some kind of compatibility, and perhaps because it has already shipped this way it can't be changed. {the first comment to the post explains why the interface is this way}. But certainly the documentation ought to explain what's going on. Just in case the bug might get fixed some day, perhaps it would be safest to obtain the hash value with:
substr(crypt($pw,$salt), -32)
As a final note, while the explanation of why the hash value repeats when the number of base64 digits specified mod 4 == 1 makes sense in terms of why code might behave that way, it doesn't explain why writing the code that way was a good idea. The code could and arguably should include the bits from a base64 digit that makes up a partial byte when computing the hash, instead of just discarding them. If the code had been written that way, then it seems likely the problem with losing the 22nd digit of the salt in the output would not have appeared, either. {As the comments to the post explain, even though the 22nd digit is overwritten, the digit of the hash that overwrites it will be only one of the four possible values [.Oeu], and these are the only significant values for the 22nd digit. If the 22nd digit is not one of those four values, it will be replaced by the one of those four that produces the same hash.}
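Here is a sketch of that replacement rule, again assuming the alphabet order from my snippet. `canonical_salt` is a hypothetical helper of my own, not a PHP function; it just predicts which of the four digits should show up in position 22 of the result.

```php
<?php
// Hypothetical helper: replace the 22nd salt digit with the first digit of
// its group of 16, assuming the alphabet order used in this answer's snippet.
function canonical_salt(string $salt): string
{
    $alphabet = './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    if (strlen($salt) < 29) {
        throw new InvalidArgumentException('expected a 7-character prefix and 22 salt digits');
    }
    $index = strpos($alphabet, $salt[28]); // 22nd digit sits at offset 7 + 21
    if ($index === false) {
        throw new InvalidArgumentException('22nd digit is outside the alphabet');
    }
    $topBits = $index >> 4;                // keep only the two significant bits
    return substr($salt, 0, 28) . $alphabet[$topBits << 4];
}

// Return value for the salt ending in 'P', copied from the output above.
$crypt = '$2a$04$123456789012345678901Ots44xXtSV0f6zMrHerQ2IANdsJ.2ioG';

echo canonical_salt('$2a$04$123456789012345678901P'), "\n"; // $2a$04$123456789012345678901O
var_dump(canonical_salt('$2a$04$123456789012345678901P') === substr($crypt, 0, 29)); // bool(true)
```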
In light of the comments, it seems clear there is no bug, just incredibly taciturn documentation :-) Since I'm not a cryptographer, I can't say this with any authority, but it seems to me that it's a weakness of the algorithm that a 21-digit salt apparently can produce all possible hash values, while a 22-digit salt limits the first digit of the hash to only one of four values.