TEqualityComparer may fail for records due to alignment

前端 未结 2 1465
野性不改
野性不改 2020-12-06 18:46

We recently had an issue with a TDictionary instance that failed to lookup items correctly that are were already included in the dictionary. The proble

相关标签:
2条回答
  • 2020-12-06 19:12

    I don't think this is a bug. The behaviour is by design. Without inspection or perhaps some compile time support for understanding these types, it is hard to write a general purpose comparer for arbitrary structured types.

    The default record comparer can only safely be used on types with no padding and containing only certain simple value types that can be compared using naive binary comparison. For instance, floating point types are out because their comparison operators are more complex. Think of NaNs, negative zero, etc.

    I think that the only robust way to deal with this is to write your own equality comparer. Others have suggested default initialising all record instances, but this places a significant burden on consumers of such types, and runs the risk of obscure and hard to track down defects in case some code forgets to default initialise.

    I would use TEqualityComparer<T>.Construct to create suitable equality comparers. This requires the least amount of boilerplate. You supply two anonymous methods: an equals function and a hash function, and Construct returns you a newly minted comparer.

    You can wrap this up in a generic class like so:

    uses
      System.Generics.Defaults,
      System.Generics.Collections;
    
    {$IFOPT Q+}
      {$DEFINE OverflowChecksEnabled}
      {$Q-}
    {$ENDIF}
    function CombinedHash(const Values: array of Integer): Integer;
    var
      Value: Integer;
    begin
      Result := 17;
      for Value in Values do begin
        Result := Result*37 + Value;
      end;
    end;
    {$IFDEF OverflowChecksEnabled}
      {$Q+}
    {$ENDIF}
    
    type
      TPairComparer = class abstract
      public
        class function Construct<TKey, TValue>(
          const EqualityComparison: TEqualityComparison<TPair<TKey, TValue>>;
          const Hasher: THasher<TPair<TKey, TValue>>
        ): IEqualityComparer<TPair<TKey, TValue>>; overload; static;
        class function Construct<TKey, TValue>: IEqualityComparer<TPair<TKey, TValue>>; overload; static;
      end;
    
    
    class function TPairComparer.Construct<TKey, TValue>(
      const EqualityComparison: TEqualityComparison<TPair<TKey, TValue>>;
      const Hasher: THasher<TPair<TKey, TValue>>
    ): IEqualityComparer<TPair<TKey, TValue>>;
    begin
      Result := TEqualityComparer<TPair<TKey, TValue>>.Construct(
        EqualityComparison,
        Hasher
      );
    end;
    
    class function TPairComparer.Construct<TKey, TValue>: IEqualityComparer<TPair<TKey, TValue>>;
    begin
      Result := Construct<TKey, TValue>(
        function(const Left, Right: TPair<TKey, TValue>): Boolean
        begin
          Result :=
            TEqualityComparer<TKey>.Default.Equals(Left.Key, Right.Key) and
            TEqualityComparer<TValue>.Default.Equals(Left.Value, Right.Value);
        end,
        function(const Value: TPair<TKey, TValue>): Integer
        begin
          Result := CombinedHash([
            TEqualityComparer<TKey>.Default.GetHashCode(Value.Key),
            TEqualityComparer<TValue>.Default.GetHashCode(Value.Value)
          ]);
        end
      )
    end;
    

    I've provided two overloads. If the default comparers for your two types are sufficient, then you can use the parameterless overload. Otherwise you can supply two anonymous methods bespoke to the types.

    For your type, you would obtain a comparer like this:

    TPairComparer.Construct<Int64, Integer>
    

    Both of these simple types have default equality comparers that you can use. Hence the parameterless Construct overload can be used.

    0 讨论(0)
  • 2020-12-06 19:12

    The default comparer for records only works for records of pure value types without padding. Relying on it, generally, is not a good idea. For any records which require accurate hashing and equality comparison you really need to write your own comparers.

    As has been noted, initializing all of your records with Default() is also an option, but this approach is both tedious and error prone - it is easy to forget to initialize a record and it is difficult to trace such an omission when it happens. The approach is also only effective at eliminating errors related to padding while a custom comparer can also handle reference types, etc.

    This, for example, demonstrates a working solution to the problem :

    program Project1;
    
    uses
      SysUtils, Windows, StrUtils, Generics.Collections, Generics.Defaults,
      System.Hash;
    
    type
      TPairComparer<TKey, TValue> = class(TEqualityComparer<TPair<TKey, TValue>>)
        public
          function Equals(const Left, Right: TPair<TKey, TValue>): Boolean; override;
          function GetHashCode(const Value: TPair<TKey, TValue>): Integer;  override;
      end;
      TInt64IntDict<TValue> = class(TDictionary<TPair<Int64, Integer>, TValue>)
        public constructor Create;
      end;
    
    function TPairComparer<TKey, TValue>.Equals(const Left: TPair<TKey, TValue>;
                                                const Right: TPair<TKey, TValue>) : boolean;
    begin
      result := TEqualityComparer<TKey>.Default.Equals(Left.Key, Right.Key) and
                TEqualityComparer<TValue>.Default.Equals(Left.Value, Right.Value);
    end;
    
    {$IFOPT Q+}
      {$DEFINE OVERFLOW_ON}
      {$Q-}
    {$ELSE}
      {$UNDEF OVERFLOW_ON}
    {$ENDIF}
    function TPairComparer<TKey, TValue>.GetHashCode(const Value: TPair<TKey, TValue>) : integer;
    begin
      result := THashBobJenkins.GetHashValue(Value.Key, SizeOf(Value.Key), 23 * 31);
      result := THashBobJenkins.GetHashValue(Value.Value, SizeOf(Value.Value), result * 31);
    end;
    {$IFDEF OVERFLOW_ON}
      {$Q+}
      {$UNDEF OVERFLOW_ON}
    {$ENDIF}
    
    constructor TInt64IntDict<TValue>.Create;
    begin
      inherited Create(0, TPairComparer<Int64, Integer>.Create);
    end;
    
    
    
    var
      i1, i2: TPair<Int64, Integer>;
      LI64c : TPairComparer<Int64, Integer>;
      LDict : TInt64IntDict<double>;
    begin
      FillMemory(@i1, SizeOf(i1), $00);
      FillMemory(@i2, SizeOf(i1), $01);
      i1.Key := 2;
      i1.Value := -1;
      i2.Key := i1.Key;
      i2.Value := i1.Value;
      WriteLn(Format('i1 key = %d, i1 value = %d', [i1.Key, i1.Value]));
      WriteLn(Format('i2 key = %d, i2 value = %d', [i2.Key, i2.Value]));
    
      WriteLn; WriteLn('Using Default comparer');
      if TEqualityComparer<TPair<Int64, Integer>>.Default.Equals(i1, i2) then
        WriteLn('i1 equals i2') else WriteLn('i1 not equals i2');
      if TEqualityComparer<TPair<Int64, Integer>>.Default.GetHashCode(i1) =
         TEqualityComparer<TPair<Int64, Integer>>.Default.GetHashCode(i2) then
           WriteLn('i1, i2 - hashes match') else WriteLn('i1, i2 - hashes do not match');
    
      WriteLn;  WriteLn('Using custom comparer');
      LI64c := TPairComparer<Int64, Integer>.Create;
      if LI64c.Equals(i1, i2) then
        WriteLn('i1 equals i2') else WriteLn('i1 not equals i2');
      if LI64c.GetHashCode(i1) = LI64c.GetHashCode(i2) then
           WriteLn('i1, i2 - hashes match') else WriteLn('i1, i2 - hashes do not match');
      WriteLn;
      LDict := TInt64IntDict<double>.Create;
      LDict.Add(i1, 1.23);
      if LDict.ContainsKey(i2) then
        WriteLn('Dictionary already contains key') else
          WriteLn('Dictionary does not contain key');
      ReadLn;
    end.
    

    This produces output

    i1 key = 2, i1 value = -1
    i2 key = 2, i2 value = -1

    Using Default comparer
    i1 not equals i2
    i1, i2 - hashes do not match

    Using custom comparer
    i1 equals i2
    i1, i2 - hashes match

    Dictionary already contains key

    That said, as David's answer demonstrates, using a delegated comparer will produce less overhead and should be favoured in practice.

    0 讨论(0)
提交回复
热议问题