Make std's data-structure use my existing non-static hash function “hashCode()” by default

Question

I have a moderate-size codebase (>200 .cpp files) that uses a member function hashCode() to return a hash value:

class B01{  //a class
    //..... complex thing ....
    public: size_t hashCode(){ /* hash algorithm #H01 */}  
};
class B02{  //just another unrelated class
    //..... complex thing ....
    public: size_t hashCode(){/* #H02 */}  //This is the same name as above
};

I have used it in various locations, e.g. in my custom data-structure. It works well.

Now, I want the std:: data structures to recognize my hash algorithm.

Here is what I would have to do (modified from cppreference; I will call this code #D):

//#D
namespace std {
    template<> struct hash<B01> {
        std::size_t operator()(const B01& b) const {
            /* hash algorithm #H01 */
        }
    };
}

If I insert block #D (with an appropriate implementation) for every class (B01, B02, ...), I can write:

std::unordered_set<B01> b01s;
std::unordered_set<B02> b02s;

without passing the second template argument,
and my hash algorithm (#H01) will be called. (by default)
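
For reference, block #D can be sketched end to end for a single class as below. This is only an illustration under assumptions: the id_ member, the constructor, operator== and demo_default_hash are hypothetical and not part of the original code, and hashCode() is declared const here because std::hash receives the key by const reference.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical stand-in for one of the Bxx classes.
class B01 {
    int id_;
public:
    explicit B01(int id) : id_(id) {}
    // Must be const so it can be called on the const reference below.
    std::size_t hashCode() const { return std::hash<int>{}(id_); }
    // Unordered containers also need equality on the key type.
    bool operator==(const B01& other) const { return id_ == other.id_; }
};

// Block #D for B01: forward std::hash to the existing member function.
namespace std {
    template <> struct hash<B01> {
        std::size_t operator()(const B01& b) const { return b.hashCode(); }
    };
}

// Inserts three keys, one of them a duplicate, and returns the set size.
inline std::size_t demo_default_hash() {
    std::unordered_set<B01> b01s;   // no second template argument
    b01s.insert(B01(1));
    b01s.insert(B01(2));
    b01s.insert(B01(1));            // rejected as a duplicate
    assert(b01s.size() == 2);
    return b01s.size();
}
```

Repeating the namespace std block for every one of the 200+ classes is exactly the duplication the question is trying to avoid.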

Question

To make it recognize all of my B01::hashCode, B02::hashCode, ...,
do I have to insert the block #D into all 200+ Bxx.h?

Can I just add a single block #D (in a top header?)
and, from there, re-route std::anyDataStructure to call hashCode() whenever possible?

//pseudo code
namespace std{
    template<> struct hash<X>   {
        std::size_t operator()(const X& x) const { // std::enable_if??
            if(X has hashCode()){    //e.g. X=B01 or B02
                make this template highest priority   //how?
                return x.hashCode();
            }else{                   //e.g. X=std::string
                don't match this template;  
            }
        }
    };
}

It sounds like a SFINAE question to me.

Side note: The most similar question in SO didn't ask about how to achieve this.

Edit (Why don't I just refactor it? ; 3 Feb 2017)

I don't know whether brute-force refactoring is the right path. I guess there might be a better way.
hashCode() is my home. I am emotionally attached to it.
I want to keep my code as short and clean as possible. std:: blocks are dirty.
It may be just my curiosity. If I stubbornly refuse to refactor my code, how far can C++ go?

Answer 1:

Solution one

If you can turn classes B01, B02, ... into class templates with a dummy parameter, you can simply partially specialize std::hash for any class template instantiated with that dummy type (matched through a template template parameter):

#include <iostream>
#include <unordered_set>

struct Dummy {};

template <class = Dummy>
class B01{ 
    public: size_t hashCode() const { return 0; }  
};
template <class = Dummy>
class B02{ 
    public: size_t hashCode() const { return 0; } 
};

namespace std{
    template<template <class> class TT> struct hash<TT<Dummy>>   {
        std::size_t operator()(const TT<Dummy>& x) const { 
            return x.hashCode();
        }
    };
}

int main() {
    std::unordered_set<B01<>> us;
    (void)us;
}

[live demo]

Solution two (contains an error; don't use it)

But I believe what you desire looks more like this:

#include <iostream>
#include <unordered_set>

class B01{ 
    public: size_t hashCode() const { return 0; }  
};

class B02{ 
    public: size_t hashCode() const { return 0; } 
};

template <class T, class>
using enable_hash = T;

namespace std{
    template<class T> struct hash<enable_hash<T, decltype(std::declval<T>().hashCode())>>   {
        std::size_t operator()(const T& x) const { 
            return x.hashCode();
        }
    };
}

int main() {
    std::unordered_set<B01> us;
    (void)us;
}

[live demo]

(Inspired by this answer)

However, although this works on gcc, it isn't really allowed by the C++ standard. See this thread in this context.

Edit:

As pointed out by @Barry, this gcc behaviour is not mandated by the C++ standard, so there is absolutely no guarantee that it will work even in the next gcc version. It can even be perceived as a bug, because it allows a partial specialization of a template that in fact does not specialize that template.

Solution three (preferred)

Another way could be to specialize std::unordered_set instead of std::hash:

#include <iostream>
#include <type_traits>
#include <unordered_set>

class specializeUnorderedSet { };

class B01: public specializeUnorderedSet { 
    public: size_t hashCode() const { return 0; }  
};

class B02: public specializeUnorderedSet { 
    public: size_t hashCode() const { return 0; } 
};

template <class T>
struct my_hash {
    std::size_t operator()(const T& x) const { 
        return x.hashCode();
    }
};

template <class...>
using voider = void;

template <class T, class = void>
struct hashCodeTrait: std::false_type { };

template <class T>
struct hashCodeTrait<T, voider<decltype(std::declval<T>().hashCode())>>: std::true_type { };

namespace std{

    template <class T>
    struct unordered_set<T, typename std::enable_if<hashCodeTrait<T>::value && std::is_base_of<specializeUnorderedSet, T>::value, std::hash<T>>::type, std::equal_to<T>, std::allocator<T>>:
           unordered_set<T, my_hash<T>, std::equal_to<T>, std::allocator<T>> { };

}

int main() {
    std::unordered_set<B01> us;
    (void)us;
}

According to the discussion presented here, it should be perfectly valid. It also works with gcc, clang, icc, and VS.

To use this without touching the classes themselves, I believe we can use the ADL rules to make a SFINAE check that a given class does not involve the std namespace. You can find background here. Credits also to Cheers and hth. - Alf. The approach could be changed as follows:

#include <utility>
#include <unordered_set>
#include <string>
#include <type_traits>
#include <functional>

template< class Type >
void ref( Type&& ) {}

template< class Type >
constexpr auto involve_std()
   -> bool
{
    using std::is_same;
    using std::declval;
    return not is_same< void, decltype( ref( declval<Type &>() ) )>::value;
}

class B01 { 
    public: size_t hashCode() const { return 0; }  
};

class B02 { 
    public: size_t hashCode() const { return 0; } 
};

template <class T>
struct my_hash {
    std::size_t operator()(const T& x) const { 
        return x.hashCode();
    }
};

template <class...>
struct voider {
    using type = void;
};

template <class T, class = void>
struct hashCodeTrait: std::false_type { };

template <class T>
struct hashCodeTrait<T, typename voider<decltype(std::declval<T>().hashCode())>::type>: std::true_type { };

namespace std{

    template <class T>
    struct unordered_set<T, typename std::enable_if<hashCodeTrait<T>::value && !involve_std<T>(), std::hash<T>>::type, std::equal_to<T>, std::allocator<T>>:
           unordered_set<T, my_hash<T>, std::equal_to<T>, std::allocator<T>> { };

}

int main() {
    std::unordered_set<B01> usb01;
    std::unordered_set<std::string> uss;
    static_assert(std::is_base_of<std::unordered_set<B01, my_hash<B01>>, std::unordered_set<B01>>::value, "!");
    static_assert(!std::is_base_of<std::unordered_set<std::string, my_hash<std::string>>, std::unordered_set<std::string>>::value, "!");
    (void)usb01;
    (void)uss;
}

[gcc test], [clang test], [icc test] [gcc 4.9] [VC]

Answer 2:

It doesn't have to be that way; you can also use a functor:

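#include <cstddef>
#include <functional>
#include <unordered_set>

struct MyHash {
    // Preferred overload: the literal 42 is an int, so this is an exact match.
    // It drops out via SFINAE when T has no hashCode() member.
    template <class T>
    auto hashCode(const T & t, int) const -> decltype(t.hashCode()) {
        return t.hashCode();
    }
    // Fallback overload: needs an int -> long conversion, so it ranks lower.
    template <class T>
    auto hashCode(const T & t, long) const -> decltype(std::hash<T>{}(t)) {
        return std::hash<T>{}(t);
    }

    // Dispatches to whichever overload above is viable and ranks best.
    template <class T>
    auto operator()(const T & t) const -> decltype(hashCode(t,42)) {
        return hashCode(t,42);
    }
};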

And have an alias of std::unordered_set with MyHash as hash type:

template <class Key>
using my_unordered_set = std::unordered_set<Key, MyHash>;

Or, more completely, if you also want to be able to provide the Equal functor and allocator:

template<
    class Key,
    class KeyEqual = std::equal_to<Key>,
    class Allocator = std::allocator<Key>
>
using my_unordered_set = std::unordered_set<Key, MyHash, KeyEqual, Allocator>;

Then using it (with any of your Bxx) like you'd use std::unordered_set:

int main() {
    my_unordered_set<B01> b01s;
    my_unordered_set<B02> b02s;

    // or on its own with your type:
    B01 b01{/*...*/};
    std::cout << MyHash{}(b01) << std::endl;

    // or any other:
    std::string str{"Hello World!"};
    std::cout << MyHash{}(str) << std::endl;
}
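
The dispatch above can be checked with a self-contained sketch. Widget and demo_my_hash are hypothetical names introduced only for this illustration; MyHash follows the functor shown earlier in this answer. The point is that a type with hashCode() goes through the int overload, while std::string falls back to std::hash.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical class with a hashCode() member, standing in for a Bxx class.
struct Widget {
    int id;
    std::size_t hashCode() const { return 1000u + static_cast<std::size_t>(id); }
    bool operator==(const Widget& o) const { return id == o.id; }
};

struct MyHash {
    // Exact match for the int literal; removed by SFINAE without hashCode().
    template <class T>
    auto hashCode(const T& t, int) const -> decltype(t.hashCode()) {
        return t.hashCode();
    }
    // Lower-ranked fallback (int -> long conversion) using std::hash.
    template <class T>
    auto hashCode(const T& t, long) const -> decltype(std::hash<T>{}(t)) {
        return std::hash<T>{}(t);
    }
    template <class T>
    auto operator()(const T& t) const -> decltype(hashCode(t, 42)) {
        return hashCode(t, 42);
    }
};

// Uses the same functor for a hashCode() type and for std::string.
inline std::size_t demo_my_hash() {
    std::unordered_set<Widget, MyHash> widgets;
    widgets.insert(Widget{1});
    widgets.insert(Widget{1});          // duplicate
    std::unordered_set<std::string, MyHash> names;
    names.insert("a");
    names.insert("b");
    assert(widgets.size() == 1);
    return widgets.size() + names.size();
}
```

The extra int/long parameter exists only to rank the two overloads; it carries no data.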

Concepts

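If you can use concepts (standardized in C++20), they allow you to specialize the std::hash class the way you want. The snippet uses the C++20 spelling rather than the earlier Concepts TS one (concept bool):

#include <concepts>
#include <cstddef>
#include <functional>

template <class T>
concept HashCodeConcept = requires(T const & t)
{
    {t.hashCode()} -> std::convertible_to<std::size_t>;
};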

namespace std {
    template <class T> requires HashCodeConcept <T> 
    struct hash<T> {
        std::size_t operator()(const T& t) const {
            return  t.hashCode();
        }
    };
}

Answer 3:

When creating conditions to default the hash parameter of std container templates to member methods of groups of classes, one should avoid introducing new issues:

Redundancy
Portability problems
Arcane constructs

The classic object-oriented approach may require a patterned edit of the 200+ classes to ensure they provide the basics needed for use in std hash containers. Some options for group transformation are given below to provide the two needed methods:

A public hashCode() is defined in the concrete class where it is unique to that class or by inheritance if it follows a pattern common across classes.
A public operator==() is defined.

The Two Templates

These two templates will remove the redundancy and simplify the declaration as indicated.

template <typename T>
    struct HashStruct {
        std::size_t operator()(const T & t) const {
            return t.hashCode();
        } };
template <class T>
    using SetOfB = std::unordered_set<T, HashStruct<T>>;
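
A short usage sketch of these two templates follows. Label and demo_set_of_b are hypothetical names made up for the illustration; the class simply provides the two methods listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <utility>

template <typename T>
struct HashStruct {
    std::size_t operator()(const T& t) const { return t.hashCode(); }
};
template <class T>
using SetOfB = std::unordered_set<T, HashStruct<T>>;

// Hypothetical class providing the two required methods.
class Label {
    std::string text_;
public:
    explicit Label(std::string text) : text_(std::move(text)) {}
    std::size_t hashCode() const { return std::hash<std::string>{}(text_); }
    bool operator==(const Label& other) const { return text_ == other.text_; }
};

// Declares the set through the alias, so no hasher is spelled out at the use site.
inline std::size_t demo_set_of_b() {
    SetOfB<Label> labels;
    labels.insert(Label("red"));
    labels.insert(Label("green"));
    labels.insert(Label("red"));        // duplicate
    assert(labels.count(Label("green")) == 1);
    return labels.size();
}
```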

Saving Integration Time

An example super-class:

class AbstractB {
    ...
    virtual std::size_t hashCode() const {
        return std::hash<std::string>{}(ms1)
                ^ std::hash<std::string>{}(ms2);
    } };
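
One property of the XOR combination is worth seeing in isolation: it is symmetric, so swapping the two members yields the same hash, and two equal members cancel to zero. The sketch below assumes ms1 and ms2 are std::string members (they are not declared in the snippet above). orderedHashCode and demo_xor_symmetry are hypothetical additions, and the mixing formula is the one popularized by Boost's hash_combine, shown only as one possible alternative.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical base class; the members and constructor are assumptions.
class AbstractB {
protected:
    std::string ms1, ms2;
public:
    AbstractB(std::string a, std::string b) : ms1(std::move(a)), ms2(std::move(b)) {}
    virtual ~AbstractB() {}

    // XOR combination: order-insensitive.
    virtual std::size_t hashCode() const {
        return std::hash<std::string>{}(ms1) ^ std::hash<std::string>{}(ms2);
    }

    // Order-sensitive mixing in the style of Boost's hash_combine.
    std::size_t orderedHashCode() const {
        std::size_t seed = std::hash<std::string>{}(ms1);
        seed ^= std::hash<std::string>{}(ms2) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
        return seed;
    }
};

// Returns true when swapping the members leaves the XOR hash unchanged.
inline bool demo_xor_symmetry() {
    AbstractB ab("a", "b");
    AbstractB ba("b", "a");
    assert(ab.hashCode() == ba.hashCode());
    return ab.hashCode() == ba.hashCode();
}
```

Whether the symmetry matters depends on the data; it only increases collisions, it does not break correctness.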

The following sed expression may save transformation time, assuming the code uses { inline. Similar expressions would work with Boost or using a scripting language like Python.

"s/^([ \t]*class +B[a-zA-Z0-9]+ *)(:?)(.*)$"
        + "/\1 \2 : public AbstractB, \3 [{]/"
        + "; s/ {2,}/ /g"
        + "; s/: ?:/:/g"

An AST based tool would be more reliable. This explains how to use clang capabilities for code transformation. There are new additions such as this Python controller of C++ code transformation.

Discussion

There are several options for where the hash algorithm can reside.

A method of a std container declaration's abstract class
A method of a concrete class (such as #H01 in the example)
A struct template (generally counterproductive and opaque)
The default std::hash

Here's a compilation unit that provides a clean demonstration of the classic approach: how one might accomplish the desired defaulting and the other three goals listed above, while offering flexibility in where the hash algorithm is defined for any given class. Various features could be removed depending on the specific case.

#include <string>
#include <functional>
#include <unordered_set>

template <typename T>
    struct HashStructForPtrs {
        std::size_t operator()(const T tp) const {
            return tp->hashCode(); } };
template <class T>
    using SetOfBPtrs = std::unordered_set<T, HashStructForPtrs<T>>;

template <typename T>
    struct HashStruct {
        std::size_t operator()(const T & t) const {
            return t.hashCode(); } };
template <class T>
    using SetOfB = std::unordered_set<T, HashStruct<T>>;

class AbstractB {
    protected:
        std::string ms;
    public:
        virtual std::size_t hashCode() const {
            return std::hash<std::string>{}(ms); }
        // other option: virtual std::size_t hashCode() const = 0;
        bool operator==(const AbstractB & b) const {
            return ms == b.ms; } };

class B01 : public AbstractB {
    public:
        std::size_t hashCode() const {
            return std::hash<std::string>{}(ms) ^ 1; } };

class B02 : public AbstractB {
    public:
        std::size_t hashCode() const {
            return std::hash<std::string>{}(ms) ^ 2; } };

int main(int iArgs, char * args[]) {

    SetOfBPtrs<AbstractB *> setOfBPointers;
    setOfBPointers.insert(new B01());
    setOfBPointers.insert(new B02());

    SetOfB<B01> setOfB01;
    setOfB01.insert(B01());

    SetOfB<B02> setOfB02;
    setOfB02.insert(B02());

    return 0; }

Answer 4:

A SFINAE-based method of the type you were looking for requires partial specialisation of std::hash. This could be done if your classes Bxx are templates (which is the case if they are derived from a CRTP base). For example (now fleshed out in an edit):

#include <type_traits>
#include <utility>
#include <unordered_set>
#include <iostream>

template<typename T = void>
struct B {
  B(int i) : x(i) {}
  std::size_t hashCode() const
  {
    std::cout<<"B::hashCode(): return "<<x<<std::endl;
    return x;
  }
  bool operator==(B const&b) const
  { return x==b.x; }
private:
  int x;
};

template<typename T,
         typename = decltype(std::declval<T>().hashCode())> 
using enable_if_has_hashCode = T;

namespace std {
  template<template<typename...> class T, typename... As> 
  struct hash<enable_if_has_hashCode<T<As...>>> 
  {
    std::size_t operator()(const T<As...>& x) const
    { return x.hashCode(); }
  };
  // the following would not work, as it's not a partial specialisation
  //    (some compilers allow it, but clang correctly rejects it)
  // template<typename T>
  // struct hash<enable_if_has_hashCode<T>>
  // { /* ... */ }; 
}

int main()
{
  using B00 = B<void>;
  B00 b(42);
  std::unordered_set<B00> set;
  set.insert(b);
}

produces (using clang++ on MacOS)

B::hashCode(): return 42

See also this related answer to a similar question of mine.

However, concepts are the way of the future to solve problems like this.

Answer 5:

I have come up with something that appears to partially work. It is a workaround that will allow you to use std::hash on a type that implements hashCode. Take a look:

#include <cstddef>
#include <functional>
#include <iostream>
#include <unordered_set>

//some class that implements hashCode
struct test
{
    std::size_t hashCode() const
    {
        return 0;//insert your hash routine
    }
};
//helper class
struct hashable
{
    hashable():value(0){}
    template<typename T>
    hashable(const T& t):value(t.hashCode())
    {}
    template<typename T>
    std::size_t operator()(const T& t) const
    {
        return t.hashCode();
    }

    std::size_t value;
};


//hash specialization of hashable
namespace std {
    template<>
    struct hash<hashable>
    {
        typedef hashable argument_type;
        typedef std::size_t result_type;
        result_type operator()(const argument_type& b) const {
            return b.value;
        }
    };
}
//helper alias so you don't have to specify the hash each time.
template<typename T, typename hash = hashable>
using unordered_set = std::unordered_set<T,hash>;

int main(int argc, char** argv)
{
    unordered_set<test> s;
    test t;
    std::cout<<std::hash<hashable>{}(t)<<std::endl;
}

The code takes advantage of hashable's template constructor and template operator to retrieve the hash from any class that implements hashCode. The std::hash specialization expects an instance of hashable, but the templated constructor allows one to be constructed from any class that has hashCode.
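
The conversion step can be isolated in a small sketch. Item and hash_via_conversion are hypothetical names used only here, and the return value 7 is arbitrary; hashable is trimmed down to the converting constructor that the std::hash<hashable> path relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical class that implements hashCode().
struct Item {
    std::size_t hashCode() const { return 7; }
};

// Trimmed-down helper: only the converting constructor is kept.
struct hashable {
    hashable() : value(0) {}
    template <typename T>
    hashable(const T& t) : value(t.hashCode()) {}   // implicit conversion from T
    std::size_t value;
};

namespace std {
    template <> struct hash<hashable> {
        std::size_t operator()(const hashable& h) const { return h.value; }
    };
}

// The Item argument is converted to a temporary hashable before hashing.
inline std::size_t hash_via_conversion(const Item& item) {
    std::size_t h = std::hash<hashable>{}(item);
    assert(h == item.hashCode());
    return h;
}
```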

The only gotcha here is that you will have to write unordered_set rather than std::unordered_set to use it, and you will have to make sure that std::unordered_set is not brought into scope in any way. So you won't be able to have anything like using namespace std or using std::unordered_set in your source. But apart from these few gotchas in usage, this could work for you.

Of course this is just a band-aid on the real issue... which would be not wanting to go through the pain of properly specializing std::hash for each of your types. (I don't blame you)

I would also like to note that with this code a failed substitution is a hard error. If you would prefer SFINAE, it will need modification.
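
One possible modification, as a sketch only: constrain the converting constructor with a detection trait and give operator() a trailing decltype, so that types without hashCode() are rejected during overload resolution instead of failing inside the body. has_hashCode, Item and demo_sfinae_hashable are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Detection trait: true when a const T has a callable hashCode().
template <class T, class = void>
struct has_hashCode : std::false_type {};
template <class T>
struct has_hashCode<T, decltype(void(std::declval<const T&>().hashCode()))>
    : std::true_type {};

struct hashable {
    hashable() : value(0) {}
    // Participates in overload resolution only for types with hashCode().
    template <typename T,
              typename = typename std::enable_if<has_hashCode<T>::value>::type>
    hashable(const T& t) : value(t.hashCode()) {}
    // The trailing decltype makes a missing hashCode() a substitution failure.
    template <typename T>
    auto operator()(const T& t) const -> decltype(t.hashCode()) {
        return t.hashCode();
    }
    std::size_t value;
};

// Hypothetical class that implements hashCode().
struct Item {
    std::size_t hashCode() const { return 7; }
};

static_assert(std::is_constructible<hashable, Item>::value, "Item is accepted");
static_assert(!std::is_constructible<hashable, int>::value, "int is rejected");

// Builds a hashable from an Item and returns the stored hash.
inline std::size_t demo_sfinae_hashable() {
    hashable h = Item{};
    assert(h.value == 7);
    return h.value;
}
```

With the unconstrained constructor from the answer, std::is_constructible<hashable, int> would report true and the error would only surface when the constructor body is instantiated.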

EDIT:

After trying to run:

unordered_set<test> s;
test t;
s.insert(t);

I noticed there were some compiler errors.

I've updated my test class to be equality comparable by adding:

bool operator==(const test& other) const
{
    return hashCode() == other.hashCode();
}

to test, which now reads:

//some class that implements hashCode
struct test
{
    std::size_t hashCode() const
    {
        return 0;//insert your hash routine
    }
    bool operator==(const test& other) const
    {
        return hashCode() == other.hashCode();
    }
};
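
Note that an operator== built on hashCode() treats any two objects whose hashes collide as the same key. Where that is not intended, equality can compare the actual state instead. The sketch below uses hypothetical names (Ticket, TicketHash, demo_colliding_keys) and a deliberately weak hash to make collisions visible.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical class whose hashCode() collides on purpose.
struct Ticket {
    int id;
    std::size_t hashCode() const { return static_cast<std::size_t>(id % 4); }
    // Compare the actual state, not the hash.
    bool operator==(const Ticket& other) const { return id == other.id; }
};

struct TicketHash {
    std::size_t operator()(const Ticket& t) const { return t.hashCode(); }
};

// Inserts two colliding but distinct keys plus one true duplicate.
inline std::size_t demo_colliding_keys() {
    std::unordered_set<Ticket, TicketHash> tickets;
    tickets.insert(Ticket{1});
    tickets.insert(Ticket{5});   // same hash as id 1, but a different key
    tickets.insert(Ticket{1});   // true duplicate, rejected
    assert(tickets.count(Ticket{5}) == 1);
    return tickets.size();
}
```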

Source: https://stackoverflow.com/questions/41867111/make-stds-data-structure-use-my-existing-non-static-hash-function-hashcode

Tags

c++

c++11

templates

hash

sfinae