C# performant alternatives to HashSet and Dictionary that do not use GetHashCode

痞子三分冷 提交于 2019-12-10 16:12:50

问题


I'm looking for built-in alternatives of HashSet and Dictionary objects that have better performance than lists but do not use the internal GetHashCode method. I need this because for the class I have written, there is no way of writing a GetHashCode method that fulfills the usual contract with Equals other than

public override int GetHashCode() { return 0; } // or return any other constant value

which would turn HashSet and Dictionary into ordinary lists (performance-wise).

So what I need is a set implementation and a mapping implementation. Any suggestions?

EDIT:

My class is a tolerance-based 3-dimensional vector class:

public class Vector
{
    private static const double TOL = 1E-10;
    private double x, y, z;

    public Vector(double x, double y, double z)
    {
        this.x = x; this.y = y; this.z = z;
    }

    public override bool Equals(object o)
    {
        Vector other = o as Vector;

        if (other == null)
            return false;

        return ((Math.Abs(x - other.x) <= TOL) &&
                (Math.Abs(y - other.y) <= TOL) &&
                (Math.Abs(z - other.z) <= TOL));
    }
}

Note that my Equals method is not transitive. However, in my use case I can make it "locally" transitive because at some point, I will know all vectors that I need to put into my set / mapping key set, and I also know that they will come in clusters. So when I have collected all vectors, I will choose one representative per cluster and replace all original vectors by the representative. Then Equals will be transitive among the elements of my set / mapping key set.

When I have my set or mapping, I will collect vectors from another source (for the sake of this question let's assume I'll ask a user to type in a vector). These can be any possible vector. Those will never be added to the set/mapping, but I will need to know if they are contained in the set / key set of the mapping (regarding tolerance), and I will need to know their value from the mapping.


回答1:


You need a data structure that supports sorting, binary search and fast insertion. Unfortunately there is no such collection in the .NET Framework. The SortedDictionary doesn't supports binary search, while the SortedList suffers from O(n) insertion for unsorted data. So you must search for a third party tool. A good candidate seems to be the TreeDictionary of C5 library. It is a red-black tree implementation that offers the important method RangeFromTo. Here is an incomplete implementation of a Dictionary that has Vectors as keys, backed internally by a C5.TreeDictionary:

public class VectorDictionary<TValue>
{
    private C5.TreeDictionary<double, (Vector, TValue)> _tree =
        new C5.TreeDictionary<double, (Vector, TValue)>();

    public bool TryGetKeyValue(Vector key, out (Vector, TValue) pair)
    {
        double xyz = key.X + key.Y + key.Z;
        // Hoping that not all vectors are crowded in the same diagonal line
        var range = _tree.RangeFromTo(xyz - Vector.TOL * 3, xyz + Vector.TOL * 3);
        var equalPairs = range.Where(e => e.Value.Item1.Equals(key));
        // Selecting a vector from many "equal" vectors is tricky.
        // Some may be more equal than others. :-) Lets return the first for now.
        var selectedPair = equalPairs.FirstOrDefault().Value;
        pair = selectedPair;
        return selectedPair.Item1 != null;
    }

    public Vector GetExisting(Vector key)
    {
        return TryGetKeyValue(key, out var pair) ? pair.Item1 : default;
    }

    public bool Contains(Vector key) => TryGetKeyValue(key, out var _);

    public bool Add(Vector key, TValue value)
    {
        if (Contains(key)) return false;
        _tree.Add(key.X + key.Y + key.Z, (key, value));
        return true;
    }

    public TValue this[Vector key]
    {
        get => TryGetKeyValue(key, out var pair) ? pair.Item2 : default;
        set => _tree.Add(key.X + key.Y + key.Z, (key, value));
    }

    public int Count => _tree.Count;
}

Usage example:

var dictionary = new VectorDictionary<int>();
Console.WriteLine($"Added: {dictionary.Add(new Vector(0.5 * 1E-10, 0, 0), 1)}");
Console.WriteLine($"Added: {dictionary.Add(new Vector(0.6 * 1E-10, 0, 0), 2)}");
Console.WriteLine($"Added: {dictionary.Add(new Vector(1.6 * 1E-10, 0, 0), 3)}");
Console.WriteLine($"dictionary.Count: {dictionary.Count}");
Console.WriteLine($"dictionary.Contains: {dictionary.Contains(new Vector(2.5 * 1E-10, 0, 0))}");
Console.WriteLine($"dictionary.GetValue: {dictionary[new Vector(2.5 * 1E-10, 0, 0)]}");

Output:

Added: True
Added: False
Added: True
dictionary.Count: 2
dictionary.Contains: True
dictionary.GetValue: 3



回答2:


You can get a reasonably good hashcode implementation in your case. Remember that the most important rule for a hash code is the following:

  • Two equal vectors must return the same value

This does not mean that two different vectors cannot return the same value; they obviously have to in some cases, the number of hashes is limited, the number of distinct vectors for all purposes isn't.

Well, with that in mind, simply evaluate your hashcode based upon the vectors coordinates truncated to the tolerance's significant digits minus one. All equal vectors will give you the same hash and a small minority of non equal vectors that differ in the last decimal wont...you can live with that.

UPDATE: Changed rounded to truncated. Rounding is not the right choice.



来源:https://stackoverflow.com/questions/38590854/c-sharp-performant-alternatives-to-hashset-and-dictionary-that-do-not-use-gethas

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!