What is 4/16 in hashes?

情深已故 2020-12-10 12:05

if (%hash){
     print "That was a true value!\n";
}
That will be true if (and only if) the hash has at least one key/value pair.
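A minimal sketch of that boolean test. The hash contents (fred/barney) are made up for illustration. It also counts the pairs with keys, which returns the number of pairs on every perl version:

```perl
#!/usr/bin/perl

use strict;
use warnings;

my %empty;
my %hash = (fred => 'flintstone', barney => 'rubble');

# an empty hash is false in boolean context
print "The empty hash is false\n" unless %empty;

# a hash with at least one key/value pair is true
if (%hash) {
    print "That was a true value!\n";
}

# keys in scalar context gives the number of key/value pairs
my $count = keys %hash;
print "The hash has $count pairs\n";
```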

没有蜡笔的小新 (original poster)

2020-12-10 12:21

This is a slightly modified version of an email I sent to the Perl Beginners mailing list answering this same question.

Saying

my $hash_info = %hash;

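will get you either 0 (if the hash is empty) or a string giving the ratio of used to total buckets, such as 4/16 (4 of the hash's 16 buckets hold at least one key). This information is almost, but not completely, useless to you. (That is how perls before 5.26 behave; since 5.26 a hash in scalar context returns its number of keys, and the old ratio is available from the bucket_ratio function in Hash::Util.) To understand what the ratio means you must first understand how hashing works.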
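A small sketch for looking at both numbers on a modern perl. It assumes perl 5.26 or later, where scalar(%hash) is just the key count and the core Hash::Util module's bucket_ratio function returns the old used/total string. The exact ratio depends on the perl build, so no output is shown:

```perl
#!/usr/bin/perl

use strict;
use warnings;
use Hash::Util ();   # core module; bucket_ratio assumes perl 5.26 or later

my %hash = (a => 1, b => 2, c => 3, d => 4, e => 5, f => 6);

# on perl 5.26+ a hash in scalar context is simply its number of keys
my $key_count = scalar(%hash);

# the pre-5.26 style "used/total" bucket string
my $ratio = Hash::Util::bucket_ratio(%hash);

print "keys in scalar context: $key_count\n";
print "bucket ratio:           $ratio\n";
```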

Let's implement a hash table in Perl 5. The first thing we need is a hashing function. A hashing function turns a string into a number, ideally a different number for every different string. Strong hashing functions such as MD5 or SHA-1 exist, but they tend to be too slow for everyday use, so hash tables usually use weaker functions (ones more likely to give two different strings the same number) that are much faster. Perl 5 has traditionally used Bob Jenkins's one-at-a-time algorithm, which strikes a good balance between uniqueness and speed. For our example, I will use a very weak hashing function:

#!/usr/bin/perl

use strict;
use warnings;

sub weak_hash {
       my $key  = shift;
       my $hash = 1;
       #multiply together the ASCII/Unicode values of every character in the string
       for my $character (split //, $key) {
               $hash *= ord $character;
       }
       return $hash;
}

for my $string (qw/cat dog hat/) {
       print "$string hashes to ", weak_hash($string), "\n";
}
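Here is why weak_hash counts as "very weak". It only multiplies character codes, so anagrams always collide: cat, act and tac all give 1113948. The sketch below compares it with the textbook form of Jenkins's one-at-a-time function, which mixes the bits after every byte so that character order matters. This is the published algorithm, not necessarily the exact code inside any particular perl (the internal hash function has changed between versions). It assumes a perl with 64-bit integers and plain byte strings:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# the weak hash from above: product of the character codes
sub weak_hash {
    my $key  = shift;
    my $hash = 1;
    $hash *= ord $_ for split //, $key;
    return $hash;
}

# textbook Jenkins one-at-a-time hash over the string's bytes,
# kept to 32 bits with & 0xFFFFFFFF (assumes 64-bit integers, byte input)
sub one_at_a_time {
    my $key  = shift;
    my $hash = 0;
    for my $byte (unpack 'C*', $key) {
        $hash = ($hash + $byte) & 0xFFFFFFFF;
        $hash = ($hash + ($hash << 10)) & 0xFFFFFFFF;
        $hash ^= $hash >> 6;
    }
    # final avalanche steps
    $hash = ($hash + ($hash << 3)) & 0xFFFFFFFF;
    $hash ^= $hash >> 11;
    $hash = ($hash + ($hash << 15)) & 0xFFFFFFFF;
    return $hash;
}

for my $string (qw/cat act tac/) {
    printf "%s: weak_hash %d, one_at_a_time 0x%08x\n",
        $string, weak_hash($string), one_at_a_time($string);
}
```

As a sanity check, one_at_a_time("a") should return 0xca2e9442, the commonly published test value for this function.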

Because hashing functions usually return numbers from a much larger range than we want, we normally use the modulo operator to shrink that range:

#!/usr/bin/perl

use strict;
use warnings;

sub weak_hash {
       my $key  = shift;
       my $hash = 1;
       #multiply together the ASCII/Unicode values of every character in the string
       for my $character (split //, $key) {
               $hash *= ord $character;
       }
       return $hash;
}

for my $string (qw/cat dog hat/) {
       # the % operator is constraining the number
       # weak_hash returns to 0 - 10
       print "$string hashes to ", weak_hash($string) % 11, "\n";
}
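Perl's own hashes keep the number of buckets at a power of two (the 16 in 4/16 is one), which allows a cheaper alternative to modulo: masking off the low bits. A small sketch using the weak_hash values for cat, dog and hat (1113948, 1143300 and 1170208) with a hypothetical 16-bucket table:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# with a power-of-two table size, "% size" and "& (size - 1)"
# pick the same bucket for any non-negative integer
my $size = 16;
for my $number (1113948, 1143300, 1170208) {
    my $by_modulo = $number % $size;
    my $by_mask   = $number & ($size - 1);
    print "$number: % gives $by_modulo, & gives $by_mask\n";
}
```

The mask only agrees with modulo when the size is a power of two; with the 11 buckets used in the example above, % is needed.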

Now that we have a hashing function, we need somewhere to store the keys and values. This is called the hash table. A hash table is often an array whose elements are called buckets (these are the buckets that the ratio is counting). Each bucket holds all of the key/value pairs whose keys hash to the same index:

#!/usr/bin/perl

use strict;
use warnings;

sub weak_hash {
       my $key  = shift;
       my $hash = 1;
       for my $character (split //, $key) {
               $hash *= ord $character;
       }
       return $hash;
}

sub create {
       my ($size) = @_;

       my @hash_table;

       #set the size of the array
       $#hash_table = $size - 1;

       return \@hash_table;
}


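sub store {
       my ($hash_table, $key, $value) = @_;

       #create an index into $hash_table
       #constrain it to the size of the hash_table
       my $hash_table_size = @$hash_table;
       my $index           = weak_hash($key) % $hash_table_size;

       #if the key is already in the bucket, replace its value
       for my $pair (@{$hash_table->[$index]}) {
               if ($pair->{key} eq $key) {
                       $pair->{value} = $value;
                       return $value;
               }
       }

       #otherwise push the new key/value pair onto the bucket at the index
       push @{$hash_table->[$index]}, {
               key   => $key,
               value => $value
       };

       return $value;
}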

sub retrieve {
       my ($hash_table, $key) = @_;

       #create an index into $hash_table
       #constrain it to the size of the hash_table
       my $hash_table_size = @$hash_table;
       my $index           = weak_hash($key) % $hash_table_size;

       #get the bucket for this key/value pair
       my $bucket = $hash_table->[$index];

       #find the key/value pair in the bucket
       for my $pair (@$bucket) {
               return $pair->{value} if $pair->{key} eq $key;
       }

       #if key isn't in the bucket:
       return undef;
}

sub list_keys {
       my ($hash_table) = @_;

       my @keys;

       for my $bucket (@$hash_table) {
               for my $pair (@$bucket) {
                       push @keys, $pair->{key};
               }
       }

       return @keys;
}

sub print_hash_table {
       my ($hash_table) = @_;

       for my $i (0 .. $#$hash_table) {
               print "in bucket $i:\n";
               for my $pair (@{$hash_table->[$i]}) {
                       print "$pair->{key} => $pair->{value}\n";
               }
       }
}

my $hash_table = create(3);

my $i = 0;
for my $key (qw/a b c d g j/) {
       store($hash_table, $key, $i++);
}
print_hash_table($hash_table);

print "the a key holds: ", retrieve($hash_table, "a"), "\n";

As this example shows, one bucket can end up with more key/value pairs than the others. This is a bad situation to be in: looking up a key in that bucket is slow, because retrieve has to search the bucket's pairs one by one. This is one use of the used-to-total bucket ratio that a hash returns in scalar context on perls before 5.26: if the hash says that only a few buckets are in use, but there are lots of keys in the hash, then you know you have a problem.
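A sketch of how that ratio could be computed for the toy table. bucket_report is a hypothetical helper (not part of Perl), and create and store are trimmed copies of the ones above. It also reports the longest bucket, because the ratio alone can hide an uneven spread: in the a/b/c/d/g/j example all 3 buckets are used, yet one of them holds four pairs. The demo stores the six anagrams of "cat", which weak_hash sends to the same bucket:

```perl
#!/usr/bin/perl

use strict;
use warnings;

sub weak_hash {
    my $key  = shift;
    my $hash = 1;
    $hash *= ord $_ for split //, $key;
    return $hash;
}

sub create {
    my ($size) = @_;
    my @hash_table;
    $#hash_table = $size - 1;
    return \@hash_table;
}

sub store {
    my ($hash_table, $key, $value) = @_;
    my $hash_table_size = @$hash_table;
    my $index           = weak_hash($key) % $hash_table_size;
    push @{$hash_table->[$index]}, { key => $key, value => $value };
    return $value;
}

# hypothetical helper: "used/total" like pre-5.26 scalar(%hash),
# plus the number of pairs in the fullest bucket
sub bucket_report {
    my ($hash_table) = @_;
    my $used    = grep { $_ && @$_ } @$hash_table;
    my $total   = @$hash_table;
    my $longest = 0;
    for my $bucket (@$hash_table) {
        my $length = $bucket ? @$bucket : 0;
        $longest = $length if $length > $longest;
    }
    return ("$used/$total", $longest);
}

# anagrams share one weak_hash value, so they pile into one bucket
my $hash_table = create(8);
my $i = 0;
store($hash_table, $_, $i++) for qw/cat act tac atc cta tca/;

my ($ratio, $longest) = bucket_report($hash_table);
print "ratio $ratio, longest bucket holds $longest pairs\n";
```

This prints a ratio of 1/8 with a longest bucket of 6 pairs, which is exactly the "few buckets used, lots of keys" warning sign.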

To learn more about hashes, ask questions here about what I have said, or read about them.
