What is 4/16 in hashes?

情深已故 2020-12-10 12:05
if (%hash){
     print "That was a true value!\n";
}

That will be true if (and only if) the hash has at least

6 Answers
  •  心在旅途
    2020-12-10 12:25

    The fraction is the fill rate of the hash: used buckets over allocated buckets. It is also sometimes called the load factor.

    To actually get "4/16" you'll need some tricks. Four keys only lead to 8 buckets, so you need at least 9 keys to reach 16 buckets, and then delete 5 of them.

    $ perl -le'%h=(0..16); print scalar %h; delete $h{$_} for 0..8; print scalar %h'
    9/16
    4/16
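
As of Perl 5.26, `scalar(%h)` returns the key count rather than the used/allocated fraction, so the one-liner above prints plain numbers there. A minimal sketch of the same experiment, assuming a Perl that ships `Hash::Util::bucket_info`; the helper name `fill_ratio` is made up for this illustration, and the exact numbers still depend on the hash seed:

```perl
use strict;
use warnings;
use Hash::Util;

# Hypothetical helper: build the "used/allocated" string from bucket_info,
# which returns (keys, buckets, used buckets, per-length counts...).
sub fill_ratio {
    my ($hash) = @_;
    my ($keys, $buckets, $used) = Hash::Util::bucket_info($hash);
    return "$used/$buckets";
}

my %h = map { ($_ => 1) } 1 .. 9;    # 9 keys
print fill_ratio(\%h), "\n";         # ratio with 9 keys stored

delete @h{1 .. 5};                   # 4 keys left; buckets are not released
print fill_ratio(\%h), "\n";         # ratio after the deletes
```

The allocated bucket count should be the same on both lines, while the used count can only drop.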
    

    Note that your numbers will vary, as the seed is randomized, and you cannot predict the exact collisions.

    The fill rate is the critical piece of information for deciding when to rehash. Perl 5 rehashes at a fill rate of 100%; see the DO_HSPLIT macro in hv.c. Thus it trades memory for read-only speed. A normal fill rate would be between 80% and 95%. You always leave holes to save some collisions. Lower fill rates lead to faster accesses (fewer collisions), but a higher number of rehashes.
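
To watch the rehashes happen, one can insert keys one at a time and log each point where the allocated bucket count changes. This is only a sketch: `growth_steps` is a name invented here, and the exact key counts at which the table grows depend on the Perl version and the hash seed.

```perl
use strict;
use warnings;
use Hash::Util;

# Hypothetical helper: insert $n keys and record [keys, buckets]
# every time the allocated bucket count differs from the last one seen.
sub growth_steps {
    my ($n) = @_;
    my (%h, @steps);
    my $last = 0;
    for my $i (1 .. $n) {
        $h{"key$i"} = 1;
        my ($keys, $buckets) = Hash::Util::bucket_info(\%h);
        if ($buckets != $last) {
            push @steps, [$keys, $buckets];
            $last = $buckets;
        }
    }
    return @steps;
}

printf "%4d keys -> %4d buckets\n", @$_ for growth_steps(100);
```

Each printed line marks one resize, which makes it easy to compare how full the table was allowed to get before it grew.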

    The fraction alone does not show the number of collisions. You also need keys %hash, to compare against the numerator of the fraction, the number of used buckets.

    Thus one part of the collision quality is keys / used buckets:

    my ($used, $max) = split '/', scalar(%hash);       # used buckets, allocated buckets
    my $keys_per_used_bucket = keys(%hash) / $used;    # 1.0 means no collisions
    

    But in reality you need to know the lengths of all the linked lists in the buckets. You can get this information with Hash::Util::bucket_info:

    ($keys, $buckets, $used, @length_count) = Hash::Util::bucket_info(\%hash);
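
As a rough sketch of how these values can be turned into collision figures: in the list returned by `bucket_info`, element K of the length counts is the number of buckets holding exactly K keys. The helper name `chain_stats` and the `avg_probes` measure (average comparisons needed to find each stored key once, assuming a linear scan of its chain) are illustrative choices, not part of Hash::Util.

```perl
use strict;
use warnings;
use Hash::Util;

# Hypothetical helper: summarise collision quality for a hash reference.
sub chain_stats {
    my ($hash) = @_;
    my ($keys, $buckets, $used, @length_count) = Hash::Util::bucket_info($hash);
    return unless $keys;    # nothing to measure in an empty hash

    my ($probes, $longest) = (0, 0);
    for my $k (1 .. $#length_count) {
        next unless $length_count[$k];
        $longest = $k;
        # A chain of $k keys costs 1 + 2 + ... + $k comparisons in total.
        $probes += $length_count[$k] * $k * ($k + 1) / 2;
    }
    return {
        keys          => $keys,
        buckets       => $buckets,
        used          => $used,
        keys_per_used => $keys / $used,
        longest_chain => $longest,
        avg_probes    => $probes / $keys,
    };
}

my %hash = map { ("key$_" => 1) } 1 .. 1000;
my $stats = chain_stats(\%hash);
printf "%-14s %s\n", $_, $stats->{$_} for sort keys %$stats;
```

An `avg_probes` value close to 1 means most keys sit alone in their bucket; larger values point at long chains.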
    

    While hash access is normally O(1), with long chains it is only O(n/2), especially for the overlong buckets. At https://github.com/rurban/perl-hash-stats I provide statistical info on collision quality for various hash functions, using the perl5 core test suite data. I haven't tested tradeoffs for different fill rates yet, as I am rewriting the current hash tables completely.

    Update: For perl5 a better fill rate than 100% would be 90%, as tested recently. But this depends on the hash function used. I used a bad and fast one: FNV1A. With better, slower hash functions you can use higher fill rates. The current default OOAT_HARD is bad AND slow, so it should be avoided.
