I\'m looking for an explanation of how a hash table works - in plain English for a simpleton like me!
For example, I know it takes the key, calculates the hash (I a
Usage and Lingo:
Real World Example:
Hash & Co., founded in 1803 and lacking any computer technology had a total of 300 filing cabinets to keep the detailed information (the records) for their approximately 30,000 clients. Each file folder were clearly identified with its client number, a unique number from 0 to 29,999.
The filing clerks of that time had to quickly fetch and store client records for the working staff. The staff had decided that it would be more efficient to use a hashing methodology to store and retrieve their records.
To file a client record, filing clerks would use the unique client number written on the folder. Using this client number, they would modulate the hash key by 300 in order to identify the filing cabinet it is contained in. When they opened the filing cabinet they would discover that it contained many folders ordered by client number. After identifying the correct location, they would simply slip it in.
To retrieve a client record, filing clerks would be given a client number on a slip of paper. Using this unique client number (the hash key), they would modulate it by 300 in order to determine which filing cabinet had the clients folder. When they opened the filing cabinet they would discover that it contained many folders ordered by client number. Searching through the records they would quickly find the client folder and retrieve it.
In our real-world example, our buckets are filing cabinets and our records are file folders.
An important thing to remember is that computers (and their algorithms) deal with numbers better than with strings. So accessing a large array using an index is significantly much faster than accessing sequentially.
As Simon has mentioned which I believe to be very important is that the hashing part is to transform a large space (of arbitrary length, usually strings, etc) and mapping it to a small space (of known size, usually numbers) for indexing. This if very important to remember!
So in the example above, the 30,000 possible clients or so are mapped to a smaller space.
The main idea in this is to divide your entire data set into segments as to speed up the actual searching which is usually time consuming. In our example above, each of the 300 filing cabinet would (statistically) contain about 100 records. Searching (regardless the order) through 100 records is much faster than having to deal with 30,000.
You may have noticed that some actually already do this. But instead of devising a hashing methodology to generate a hash key, they will in most cases simply use the first letter of the last name. So if you have 26 filing cabinets each containing a letter from A to Z, you in theory have just segmented your data and enhanced the filing and retrieval process.
Hope this helps,
Jeach!