Is it possible to map string to int faster than using hashmap?

前端 未结 7 1277
温柔的废话
温柔的废话 2020-12-08 19:52

I understand that I should not optimize every single spot of my program so please consider this question to be \"academic\"

I have maximum 100 strings and integer n

7条回答
  •  悲哀的现实
    2020-12-08 20:25

    A hashtable[1] is in principle the fastest way.

    You could however compile a Perfect Hash Function given the fact that you know the full domain ahead of time.

    With a perfect hash, there need not be a collision, so you can store the hash table in a linear array!

    With proper tweaking you can then

    • fit all of the hash elements in a limited space, making direct addressing a potential option
    • have a reverse lookup in O(1)

    The 'old-school' tool for generating Perfect Hash functions would be gperf(1). The wikipedia lists more resources on the subject.

    Because of all the debate I ran a demo:

    Downloading NASDAQ ticker symbols and getting 100 random samples from that set, applying gperf as follows:

    gperf -e ' \015' -L C++ -7 -C -E -k '*,1,$' -m 100 selection > perfhash.cpp
    

    Results in a hash-value MAX_HASH_VALUE of 157 and a direct string lookup table of as many items. Here's just the hash function for demonstration purposes:

    inline unsigned int Perfect_Hash::hash (register const char *str, register unsigned int len) {
      static const unsigned char asso_values[] = {
          156, 156, 156, 156, 156, 156, 156, 156, 156, 156,
          156, 156, 156, 156, 156, 156, 156, 156, 156, 156,
          156, 156, 156, 156, 156, 156, 156, 156, 156, 156,
          156, 156, 156, 156, 156, 156, 156, 156, 156, 156,
          156, 156, 156, 156, 156, 156, 156, 156, 156, 156,
          156, 156, 156, 156, 156, 156, 156, 156, 156, 156,
          156, 156, 156, 156, 156,  64,  40,   1,  62,   1,
           41,  18,  47,   0,   1,  11,  10,  57,  21,   7,
           14,  13,  24,   3,  33,  89,  11,   0,  19,   5,
           12,   0, 156, 156, 156, 156, 156, 156, 156, 156,
          156, 156, 156, 156, 156, 156, 156, 156, 156, 156,
          156, 156, 156, 156, 156, 156, 156, 156, 156, 156,
          156, 156, 156, 156, 156, 156, 156, 156, 156
        };
      register int hval = len;
    
      switch (hval) {
          default: hval += asso_values[(unsigned char)str[4]];   /*FALLTHROUGH*/
          case 4:  hval += asso_values[(unsigned char)str[3]];   /*FALLTHROUGH*/
          case 3:  hval += asso_values[(unsigned char)str[2]+1]; /*FALLTHROUGH*/
          case 2:  hval += asso_values[(unsigned char)str[1]];   /*FALLTHROUGH*/
          case 1:  hval += asso_values[(unsigned char)str[0]];   break;
      }
      return hval;
    }
    

    It really doesn't get much more efficient. Do have a look at the full source at github: https://gist.github.com/sehe/5433535

    Mind you, this is a perfect hash, too, so there will be no collisions


    Q. [...] it's obviosly "DELL". Such lookup must be significantly faster than "hashmap lookup".

    A: If you use a simple std::map the net effect is prefix search (because lexicographical string comparison shortcuts on the first character mismatch). The same thing goes for binary search in a sorted container.


    [1] PS. For 100 strings, a sorted array of string with std::search or std::lower_bound would potentially be as fast/faster due to the improved Locality of Reference. Consult your profile results to see whether this applies.

提交回复
热议问题