I'm trying to create an efficient look-up table in C. I have an integer as a key and a variable-length char* as the value.
Declare the value field as void *value.
This way you can have any type of data as the value, but the responsibility for allocating and freeing it will be delegated to the client code.
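A minimal sketch of what such an entry might look like. The struct name my_struct matches the one used in the code further down, but its layout is never shown there, so the fields here are an assumption, and entry_new and entry_free are hypothetical helpers that only illustrate the client owning the string:

```c
#include <stdlib.h>
#include <string.h>

/* Assumed layout: an int key plus an opaque, client-owned value. */
struct my_struct {
    int key;
    void *value;
};

/* Hypothetical helper: allocates an entry holding a private copy of text. */
struct my_struct *entry_new(int key, const char *text) {
    struct my_struct *e = malloc(sizeof *e);
    if (e == NULL) {
        return NULL;
    }
    size_t len = strlen(text) + 1;   /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy == NULL) {
        free(e);
        return NULL;
    }
    memcpy(copy, text, len);
    e->key = key;
    e->value = copy;
    return e;
}

/* Hypothetical helper: the client frees what the client allocated. */
void entry_free(struct my_struct *e) {
    if (e != NULL) {
        free(e->value);
        free(e);
    }
}
```

Because value is a void *, reading the string back needs a cast such as (const char *)e->value.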
You first have to think of your collision strategy:
We'll pick 1.
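Then you have to choose a well-distributed hash function. For the example, we'll pick:

    int hash_fun(int key, int try, int max) {
        // unsigned arithmetic keeps the result in 0..max-1 even for negative keys
        return (int)(((unsigned)key + (unsigned)try) % (unsigned)max);
    }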
If you need something better, maybe have a look at the middle-squared method.
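One way the middle-square idea could be sketched, keeping the same (key, try, max) parameters as the function above. The name hash_mid_square and the choice of which bits count as the "middle" are assumptions made for illustration, not a reference implementation:

```c
#include <stdint.h>

/* Square the key as a 64-bit value, keep the middle 32 bits,
 * then add the probe number and reduce modulo the table size.
 * Assumes try >= 0 and max > 0. */
int hash_mid_square(int key, int try, int max) {
    uint64_t k = (uint32_t)key;                    /* treat the key as unsigned */
    uint64_t squared = k * k;                      /* always fits in 64 bits */
    uint32_t middle = (uint32_t)(squared >> 16);   /* bits 16..47 of the square */
    return (int)((middle + (uint64_t)try) % (uint64_t)max);
}
```

Since try is only added after the squaring step, tries 0 to max-1 still visit every cell exactly once, so it can stand in for the simpler function without changing the probing logic.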
Then you have to decide what a hash table looks like.

    struct hash_table {
        int max;                      // number of cells
        int number_of_elements;       // cells currently in use
        struct my_struct **elements;  // array of max pointers, 0 = empty
    };
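The table also has to be allocated before anything can be inserted. A minimal sketch, assuming two hypothetical helpers, hash_create and hash_destroy, that are not part of the original code; every cell is set to a null pointer explicitly, because the insert and retrieve code treats 0 as "empty":

```c
#include <stdlib.h>

struct my_struct;   /* entry type, defined by the client code */

struct hash_table {
    int max;
    int number_of_elements;
    struct my_struct **elements;
};

/* Hypothetical helper: returns an empty table with room for max entries,
 * or NULL if max is not positive or an allocation fails. */
struct hash_table *hash_create(int max) {
    if (max <= 0) {
        return NULL;
    }
    struct hash_table *t = malloc(sizeof *t);
    if (t == NULL) {
        return NULL;
    }
    t->elements = malloc((size_t)max * sizeof *t->elements);
    if (t->elements == NULL) {
        free(t);
        return NULL;
    }
    for (int i = 0; i < max; i++) {
        t->elements[i] = NULL;   /* mark every cell as empty */
    }
    t->max = max;
    t->number_of_elements = 0;
    return t;
}

/* Frees the table only; the entries still belong to the client. */
void hash_destroy(struct hash_table *t) {
    if (t != NULL) {
        free(t->elements);
        free(t);
    }
}
```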
Then we have to define how to insert and retrieve.

    int hash_insert(struct my_struct *data, struct hash_table *hash_table) {
        int try, hash;
        if (hash_table->number_of_elements >= hash_table->max) {
            return 0; // FULL
        }
        // Probe each cell at most once (also avoids needing <stdbool.h> for true)
        for (try = 0; try < hash_table->max; try++) {
            hash = hash_fun(data->key, try, hash_table->max);
            if (hash_table->elements[hash] == 0) { // empty cell
                hash_table->elements[hash] = data;
                hash_table->number_of_elements++;
                return 1;
            }
        }
        return 0;
    }

    struct my_struct *hash_retrieve(int key, struct hash_table *hash_table) {
        int try, hash;
        // Bounded loop: a full table without the key must not spin forever
        for (try = 0; try < hash_table->max; try++) {
            hash = hash_fun(key, try, hash_table->max);
            if (hash_table->elements[hash] == 0) {
                return 0; // Nothing found
            }
            if (hash_table->elements[hash]->key == key) {
                return hash_table->elements[hash];
            }
        }
        return 0; // Every cell probed, key not present
    }
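And lastly, a method to remove:

    int hash_delete(int key, struct hash_table *hash_table) {
        int try, hash;
        // Bounded loop: a full table without the key must not spin forever
        for (try = 0; try < hash_table->max; try++) {
            hash = hash_fun(key, try, hash_table->max);
            if (hash_table->elements[hash] == 0) {
                return 0; // Nothing found
            }
            if (hash_table->elements[hash]->key == key) {
                hash_table->number_of_elements--;
                // Caveat: emptying the cell can hide entries that were
                // placed past it by probing, since lookups stop at 0
                hash_table->elements[hash] = 0;
                return 1; // Success
            }
        }
        return 0;
    }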
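One thing to watch with this kind of probing: setting a cell back to 0 on delete can cut a probe chain, so a later lookup may stop early and miss an entry that was stored past that cell. A common workaround is to leave a "tombstone" marker instead. The sketch below is a variant under stated assumptions, not the code above: the ts_ function names, the probe helper, the DELETED sentinel and the my_struct layout are all hypothetical.

```c
#include <stddef.h>

struct my_struct { int key; void *value; };   /* assumed layout */

struct hash_table {
    int max;
    int number_of_elements;
    struct my_struct **elements;
};

/* Sentinel whose address marks "deleted"; it is never dereferenced. */
static struct my_struct tombstone;
#define DELETED (&tombstone)

static int probe(int key, int try, int max) {
    /* unsigned arithmetic keeps the index non-negative for negative keys */
    return (int)(((unsigned)key + (unsigned)try) % (unsigned)max);
}

/* Lookup skips tombstones and stops only at a truly empty cell. */
struct my_struct *ts_retrieve(int key, struct hash_table *t) {
    for (int try = 0; try < t->max; try++) {
        struct my_struct *e = t->elements[probe(key, try, t->max)];
        if (e == NULL) return NULL;
        if (e != DELETED && e->key == key) return e;
    }
    return NULL;
}

/* Insert reuses the first empty or deleted cell (no duplicate check). */
int ts_insert(struct my_struct *data, struct hash_table *t) {
    for (int try = 0; try < t->max; try++) {
        int h = probe(data->key, try, t->max);
        if (t->elements[h] == NULL || t->elements[h] == DELETED) {
            t->elements[h] = data;
            t->number_of_elements++;
            return 1;
        }
    }
    return 0;   /* no free cell */
}

/* Delete leaves a tombstone so later lookups keep probing. */
int ts_delete(int key, struct hash_table *t) {
    for (int try = 0; try < t->max; try++) {
        int h = probe(key, try, t->max);
        struct my_struct *e = t->elements[h];
        if (e == NULL) return 0;
        if (e != DELETED && e->key == key) {
            t->elements[h] = DELETED;
            t->number_of_elements--;
            return 1;
        }
    }
    return 0;
}
```

The trade-off is that tombstones lengthen probe sequences over time, so a table with many deletions may eventually want to be rebuilt.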
It really depends on the distribution of your key field. For example, if it's a unique value always between 0 and 255 inclusive, just use key % 256 to select the bucket and you have a perfect hash.
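In that special case the whole table can be a plain array indexed by the key. A minimal sketch, assuming string values; the names lut, lut_index, lut_set and lut_get are hypothetical:

```c
#include <stddef.h>

#define LUT_SIZE 256

/* One slot per possible key; static storage starts out as all NULL. */
static const char *lut[LUT_SIZE];

/* With unique keys in 0..255, key % 256 never collides. */
static size_t lut_index(int key) {
    return (size_t)((unsigned)key % LUT_SIZE);
}

/* Stores the pointer as-is; the caller keeps ownership of the string. */
void lut_set(int key, const char *value) {
    lut[lut_index(key)] = value;
}

/* Returns the stored pointer, or NULL if nothing was set for this key. */
const char *lut_get(int key) {
    return lut[lut_index(key)];
}
```

No probing or collision handling is needed here, which is what makes the hash "perfect" for this key range.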
If it's equally distributed across all possible int values, any function which gives you an equally distributed hash value will do (such as the aforementioned key % 256), albeit with multiple values in each bucket.
Without knowing the distribution, it's a little hard to talk about efficient hashes.