I'm currently using MD5 hashes, but I would like to find something that will create a shorter hash that uses just [a-z][A-Z][0-9]. It only needs to be around 5-
Is your goal to create a URL shortener or to create a hash function?
If your goal is to create a URL shortener, then you don't need a hash function. In that case, you just want to pregenerate a sequence of cryptographically secure random numbers, and then assign each URL to be encoded a unique number from the sequence.
You can do this using code like:
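using System;
using System.Collections.Generic;
using System.Security.Cryptography;

const int numberOfNumbersNeeded = 100;
const int numberOfBytesNeeded = 8;
var randomNumbers = new List<byte[]>();
using (var randomGen = RandomNumberGenerator.Create())
{
    for (int i = 0; i < numberOfNumbersNeeded; ++i)
    {
        var bytes = new byte[numberOfBytesNeeded];
        randomGen.GetBytes(bytes);  // fill the buffer with cryptographically secure random bytes
        randomNumbers.Add(bytes);   // keep the number so it can be assigned to a URL later
    }
}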
Using the cryptographic number generator will make it very difficult for people to predict the strings you generate, which I assume is important to you.
You can then convert the 8-byte random number into a string using the characters in your alphabet. This is basically a change-of-base calculation (from base 256 to base 62).
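To make the change of base concrete, here is a minimal sketch in C. It assumes the 8 random bytes are first packed into a single 64-bit integer and then divided by 62 repeatedly. The function names and the ordering of the alphabet are my own choices for illustration, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Alphabet order is an assumption; any fixed ordering of the 62 symbols works. */
static const char BASE62_ALPHABET[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

/* Packs 8 bytes into one 64-bit integer, first byte most significant. */
uint64_t bytes_to_u64(const unsigned char bytes[8]) {
    uint64_t value = 0;
    for (size_t i = 0; i < 8; ++i) {
        value = (value << 8) | bytes[i];
    }
    return value;
}

/* Writes the base-62 form of value into out, most significant digit first,
   and returns its length. A 64-bit value needs at most 11 base-62 digits,
   so out must hold 12 bytes including the terminating '\0'. */
size_t base62_encode(uint64_t value, char out[12]) {
    char reversed[11];
    size_t len = 0;
    do {
        reversed[len++] = BASE62_ALPHABET[value % 62];  /* least significant digit first */
        value /= 62;
    } while (value != 0);
    for (size_t i = 0; i < len; ++i) {
        out[i] = reversed[len - 1 - i];
    }
    out[len] = '\0';
    return len;
}
```

With this alphabet the value 62 encodes as "ba". If you want fewer characters, generate fewer random bytes: 4 bytes never need more than 6 characters.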
You can decrease the number of characters from the MD5 hash by encoding it as alphanumerics. An MD5 hash is usually represented as 32 hex characters, so each character has 16 possible values (4 bits). [a-zA-Z0-9] includes 62 possible values, almost 6 bits per character, so you could re-encode the hash 6 bits at a time: every 3 hex characters become 2 alphanumeric ones, for about 22 characters in total.
EDIT:
Here's a function that takes a 6-bit number (0-63) and returns [0-9a-zA-Z]. This should give you an idea of how to implement it. Note that there may be some issues with the types; I didn't test this code.
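/* Maps a 6-bit value (0-63) onto [a-zA-Z0-9]. */
char num2char(unsigned int x) {
    if (x < 26) return (char)('a' + x);
    if (x < 52) return (char)('A' + (x - 26));
    if (x < 62) return (char)('0' + (x - 52));
    /* Only 62 symbols exist, so 62 and 63 reuse '0' and '1'. */
    if (x == 62) return '0';
    return '1';  /* x == 63; callers must pass x < 64 */
}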
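As a rough sketch of how a mapping like this could be applied to a whole digest, the C code below walks a hex string and emits one alphanumeric character for every 6 bits. The names hex_to_alnum, hex_value and alnum_from_6bits are hypothetical helpers written for this example. Because 62 and 63 share symbols with other values, the result is shorter but cannot be decoded back to the original hash.

```c
#include <stddef.h>

/* Maps a 6-bit value (0-63) to [a-zA-Z0-9]; 62 and 63 reuse '0' and '1'. */
static char alnum_from_6bits(unsigned int x) {
    if (x < 26) return (char)('a' + x);
    if (x < 52) return (char)('A' + (x - 26));
    if (x < 62) return (char)('0' + (x - 52));
    return (char)('0' + (x - 62));
}

/* Returns the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Re-encodes a hex string 6 bits at a time into out.
   Returns the number of characters written (excluding '\0'), or 0 if the
   input is empty, contains a non-hex character, or out is too small. */
size_t hex_to_alnum(const char *hex, char *out, size_t out_size) {
    unsigned int buffer = 0;  /* bits read but not yet emitted */
    unsigned int bits = 0;    /* how many bits are pending in buffer */
    size_t len = 0;
    if (out_size == 0) return 0;
    for (; *hex != '\0'; ++hex) {
        int v = hex_value(*hex);
        if (v < 0) return 0;
        buffer = (buffer << 4) | (unsigned int)v;
        bits += 4;
        if (bits >= 6) {
            if (len + 1 >= out_size) return 0;  /* need room for char + '\0' */
            bits -= 6;
            out[len++] = alnum_from_6bits((buffer >> bits) & 0x3Fu);
            buffer &= (1u << bits) - 1u;        /* keep only the leftover bits */
        }
    }
    if (bits > 0) {
        /* Pad the last partial group with zero bits on the right. */
        if (len + 1 >= out_size) return 0;
        out[len++] = alnum_from_6bits(buffer << (6 - bits));
    }
    out[len] = '\0';
    return len;
}
```

A 32-character MD5 hex string carries 128 bits, so this produces 22 characters (21 full groups plus one padded group). To get something shorter still, you would have to truncate the result and accept more collisions.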
I don't think URL shortening services use hashes. I think they just keep a running alphanumeric string that is incremented with every new URL and stored in a database. If you really need to use a hash function, have a look at this link: some hash functions. Also, a bit off-topic, but depending on what you are working on, this might be interesting: Coding Horror article.
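For what it's worth, a running alphanumeric string can be incremented like an odometer in base 62. The C sketch below is only a guess at how such a counter might work, not how any particular service does it. The name next_id and the digit ordering (0-9, then a-z, then A-Z) are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Digit order is an assumption: '0' is the lowest digit, 'Z' the highest. */
static const char COUNTER_ALPHABET[] =
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Advances id in place to the next value, e.g. "9" -> "a", "Z" -> "10".
   capacity is the size of the buffer holding id, including the '\0'.
   Returns 1 on success. Returns 0 and leaves id unchanged if id is empty,
   contains a character outside the alphabet, or has no room to grow. */
int next_id(char *id, size_t capacity) {
    size_t len = strlen(id);
    if (len == 0 || strspn(id, COUNTER_ALPHABET) != len) return 0;
    /* A string of only 'Z' needs one more digit; check for space first. */
    if (strspn(id, "Z") == len && len + 2 > capacity) return 0;

    for (size_t i = len; i > 0; --i) {
        size_t digit = (size_t)(strchr(COUNTER_ALPHABET, id[i - 1]) - COUNTER_ALPHABET);
        if (digit < 61) {
            id[i - 1] = COUNTER_ALPHABET[digit + 1];
            return 1;
        }
        id[i - 1] = COUNTER_ALPHABET[0];  /* wrap this digit and carry left */
    }
    /* Every digit wrapped: shift right and put a leading '1' in front. */
    memmove(id + 1, id, len + 1);
    id[0] = COUNTER_ALPHABET[1];
    return 1;
}
```

In a real service the last issued value would live in the database and be advanced inside a transaction, so two requests never receive the same string.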
The .NET string type has a GetHashCode() method. It returns a 32-bit integer, so formatting it as hex gives a string of 8 characters.
Like so:
string hashCode = String.Format("{0:X8}", sourceString.GetHashCode());
More on that: http://msdn.microsoft.com/en-us/library/system.string.gethashcode.aspx
UPDATE: Added the remarks from the link above to this answer:
The behavior of GetHashCode is dependent on its implementation, which might change from one version of the common language runtime to another. A reason why this might happen is to improve the performance of GetHashCode.
If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code.
Notes to Callers
The value returned by GetHashCode is platform-dependent. It differs on the 32-bit and 64-bit versions of the .NET Framework.
If you don't care about cryptographic strength, any of the CRC functions will do.
Wikipedia lists a bunch of different hash functions, including length of output. Converting their output to [a-z][A-Z][0-9] is trivial.
You can use CRC32. Its output is 32 bits (8 hex characters), and you use it much as you would MD5. You can make values unique by adding a timestamp to the actual value.
So it will look like http://foo.bar/abcdefg12.
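Here is a small sketch of that idea in C, under a few assumptions of mine: the timestamp is appended to the URL after a '|' separator before hashing, the CRC is the common reflected CRC-32 (polynomial 0xEDB88320, the one zlib uses), and the 32-bit result is written in base 62. The names crc32_bytes, crc32_to_base62 and short_code are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320. */
uint32_t crc32_bytes(const unsigned char *data, size_t length) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < length; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return ~crc;
}

/* Writes value in base 62 (most significant digit first) and returns the
   length. A 32-bit value needs at most 6 digits, so out holds 7 bytes. */
size_t crc32_to_base62(uint32_t value, char out[7]) {
    static const char alphabet[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    char reversed[6];
    size_t len = 0;
    do {
        reversed[len++] = alphabet[value % 62];
        value /= 62;
    } while (value != 0);
    for (size_t i = 0; i < len; ++i) {
        out[i] = reversed[len - 1 - i];
    }
    out[len] = '\0';
    return len;
}

/* Hashes "url|timestamp" and writes the short code into out.
   Returns the code length, or 0 if the combined input does not fit. */
size_t short_code(const char *url, unsigned long timestamp, char out[7]) {
    char input[2048];
    int n = snprintf(input, sizeof input, "%s|%lu", url, timestamp);
    if (n < 0 || (size_t)n >= sizeof input) return 0;
    return crc32_to_base62(crc32_bytes((const unsigned char *)input, (size_t)n), out);
}
```

The code is at most 6 characters long. Two different inputs can still produce the same CRC32, so you would check the database for an existing entry before storing a new one.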