Object fingerprinting: serialization + untouchable legacy code + Getter-only auto-properties = cornered?

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-04 07:14:32

Question


I have found myself cornered, so here we go.

Context

I need to produce a fingerprint hash for each object so that objects can be diffed. Comparing the hashes of two sets of objects must tell me whether the sets contain identical objects, since identical objects will share the same hash.

The fingerprint hash must be platform-independent. So I went for MD5 hashing.

I am working with a large object-model code base that is out of my control. None of the types that I will be passed for fingerprinting can be modified by me: I cannot add attributes or constructors, or modify anything else. On top of that, the types may change in the future. So any approach must be programmatic -- I cannot just create a Surrogate class to avoid the problem; at least, not manually.

However, performance is not a concern, so reflection has complete green-light.

In addition, I need to be able to exclude properties from the hashing. If I exclude a certain property, two objects that are identical in every property except that one must still get the same hash.

Issue: serializing to Byte[] with hands tied on the legacy code

MD5 hashing requires the object to be serialized to a byte[].

Built-in binary serialization requires the class to be marked as [Serializable], which I cannot add to the legacy code, and naturally it cannot be added at runtime either.

So I went for protobuf-net.

Protobuf rightly fails when encountering types that implement an interface with Getter-only auto-properties:

public interface ISomeInterface
{
        double Vpy { get; }
        double Vy { get; }
        double Vpz { get; }
        ...
}

Since this interface is implemented by many types, using Surrogates also seems a no-go (impractical and unmaintainable).

I only need to serialize, not to deserialize, so I don't see why protobuf-net has this limitation in my case. I understand that protobuf-net would not be able to round-trip such types, but I don't need to round-trip!

Question

Am I really cornered? Is there any alternative?

My code

As I said, this works perfectly, but only if the objects do not have any property (or nested property) whose type has a Getter-only auto-property.

public static byte[] ToByteArray(this object obj, List<PropertyInfo> exclusionsProps = null)
{
    if (exclusionsProps == null)
        exclusionsProps = new List<PropertyInfo>();

    // Protobuf-net implementation
    ProtoBuf.Meta.RuntimeTypeModel model = ProtoBuf.Meta.TypeModel.Create();

    AddPropsToModel(model, obj.GetType(), exclusionsProps);

    using (var memoryStream = new MemoryStream())
    {
        model.Serialize(memoryStream, obj);

        // ToArray() copies only the bytes actually written. GetBuffer() would
        // also return the unused tail of the stream's internal buffer.
        return memoryStream.ToArray();
    }
}

public static void AddPropsToModel(ProtoBuf.Meta.RuntimeTypeModel model, Type objType, List<PropertyInfo> exclusionsProps = null)
{
    // Collect the public properties of the type being registered.
    List<PropertyInfo> props = objType.GetProperties().ToList();

    // Drop every property that the caller asked to exclude from the fingerprint.
    if (exclusionsProps != null)
        props.RemoveAll(pr => exclusionsProps.Exists(t => t.DeclaringType == pr.DeclaringType && t.Name == pr.Name));

    // Register the types of nested class/interface properties first.
    props
        .Where(prop => prop.PropertyType.IsClass || prop.PropertyType.IsInterface).ToList()
        .ForEach(prop =>
        {
            AddPropsToModel(model, prop.PropertyType, exclusionsProps); //recursive call
        }
        );

    // Sort the names so that member numbering does not depend on reflection order.
    var propsNames = props.Select(p => p.Name).OrderBy(name => name).ToList();

    model.Add(objType, true).Add(propsNames.ToArray());
}

Which I will then use as such:

foreach (var obj in objs)
{
    byte[] objByte = obj.ToByteArray(exclusionTypes);

    using (MD5 md5Hash = MD5.Create())
    {
        string hash = GetMd5Hash(md5Hash, objByte);
        Console.WriteLine(obj.GetType().Name + ": " + hash);
    }
}
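
The GetMd5Hash helper called in the loop is not shown in the question. A minimal sketch of what it might look like, assuming it simply hex-encodes the MD5 digest of the byte array (the class name Md5Helper is hypothetical, and the asker's real helper may differ):

```csharp
using System.Security.Cryptography;
using System.Text;

public static class Md5Helper
{
    // Hypothetical reconstruction of GetMd5Hash: the question only shows the call site.
    public static string GetMd5Hash(MD5 md5Hash, byte[] input)
    {
        // ComputeHash returns the 16-byte MD5 digest of the input.
        byte[] digest = md5Hash.ComputeHash(input);

        // Format each byte as two lowercase hex digits.
        var sb = new StringBuilder(digest.Length * 2);
        foreach (byte b in digest)
            sb.Append(b.ToString("x2"));

        return sb.ToString();
    }
}
```

Because the digest is computed over raw bytes and rendered as plain hex, the resulting string depends only on the bytes produced by the serializer, not on the platform that runs the hash.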

Answer 1:


The simple solution here is to completely sidestep the root cause of your issue.

When you can't modify the existing classes, but you need some modifications for them, the easiest way to do that is to create a new and improved subclass, where the modifications you require are available.

Considering that the legacy codebase will apparently change outside of your control, the only way to deal with these changes is to generate these types at runtime. Luckily, .NET allows you to emit intermediate language (IL) at runtime through System.Reflection.Emit, which can solve exactly this problem.

You'd start with the DefineType method available from the ModuleBuilder class. Specifically, you want to use the overload taking a String, TypeAttributes and a Type (representing the class you extend).
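
As a rough illustration of that starting point, the sketch below defines an empty subclass of an existing type at runtime. LegacyPoint, RuntimeSubclass, the "Proxy" suffix and the assembly and module names are all hypothetical. It also assumes the base type is public, not sealed, and has an accessible parameterless constructor, and that TypeBuilder.CreateType() is available on the target framework (some older targets expose CreateTypeInfo() instead). Emitting the members that would actually make the subclass serializable is not shown.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical stand-in for a legacy type that cannot be edited.
public class LegacyPoint
{
    public double X { get; set; }
}

public static class RuntimeSubclass
{
    // Defines an empty public subclass of baseType in a new in-memory assembly.
    public static Type Create(Type baseType)
    {
        var asmName = new AssemblyName("FingerprintProxies"); // hypothetical name
        AssemblyBuilder asm = AssemblyBuilder.DefineDynamicAssembly(
            asmName, AssemblyBuilderAccess.Run);
        ModuleBuilder module = asm.DefineDynamicModule("Main");

        // The overload mentioned above: type name, attributes, parent type.
        TypeBuilder tb = module.DefineType(
            baseType.Name + "Proxy",
            TypeAttributes.Public | TypeAttributes.Class,
            baseType);

        // No constructor is defined here, so a parameterless one that calls
        // the base class's parameterless constructor is supplied automatically.
        return tb.CreateType();
    }
}
```

A call such as RuntimeSubclass.Create(typeof(LegacyPoint)) returns a Type that can be instantiated with Activator.CreateInstance and used wherever a LegacyPoint is expected.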




Answer 2:


You pointed out that

If two objects have the same hash, you consider them exact copies of each other

Please realise that a hash has a fixed, finite number of possible values (2^128 for MD5), while the number of possible source objects is effectively unbounded. Hash collisions are therefore bound to exist. Let's have a look at some examples:

public class Point 
{
    public int X;
    public int Y;
}

public class Coordinate
{
    public int X;
    public int Y;
}

Let's say we calculate the hash as X ^ Y. Instances of both classes could have the same hash, even though they are of different classes. Even within just one of these classes, an instance with X = 1, Y = 2 and another with X = 2, Y = 1 have the same hash. Sure, you could improve the hash algorithm to reduce the risk of collisions, but you cannot guarantee that collisions will never happen.
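
The collision is easy to reproduce. The sketch below repeats the two classes so that it stands alone; XorHashDemo and its Hash method are hypothetical names for the X ^ Y hash described above.

```csharp
public class Point
{
    public int X;
    public int Y;
}

public class Coordinate
{
    public int X;
    public int Y;
}

public static class XorHashDemo
{
    // The deliberately weak hash from the example: X ^ Y.
    public static int Hash(int x, int y) => x ^ y;

    // Same class, swapped values: 1 ^ 2 == 2 ^ 1.
    public static bool SwappedValuesCollide()
    {
        var a = new Point { X = 1, Y = 2 };
        var b = new Point { X = 2, Y = 1 };
        return Hash(a.X, a.Y) == Hash(b.X, b.Y);
    }

    // Different classes, same values: the hash never sees the type.
    public static bool DifferentTypesCollide()
    {
        var p = new Point { X = 1, Y = 2 };
        var c = new Coordinate { X = 1, Y = 2 };
        return Hash(p.X, p.Y) == Hash(c.X, c.Y);
    }
}
```

Both methods return true. A cryptographic hash such as MD5 makes accidental collisions far less likely than this toy hash does, but it cannot rule them out.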

Instead, I would implement a DeepEquals method. This takes more effort (if you write it yourself), but when implemented correctly, it can guarantee that two objects are copies of each other.
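
A minimal sketch of such a method, using reflection and accepting a list of excluded properties as in the question. DeepComparer and its parameter names are hypothetical. The sketch assumes an acyclic object graph (there is no cycle detection), looks only at public instance properties and fields, treats value types and strings as leaves, and compares sequences element by element in enumeration order, so it is a starting point rather than a complete implementation.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class DeepComparer
{
    public static bool DeepEquals(object a, object b, ICollection<PropertyInfo> excluded = null)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;

        Type type = a.GetType();
        if (type != b.GetType()) return false;

        // Leaves: value types (int, double, DateTime, enums, ...) and strings.
        if (type.IsValueType || type == typeof(string))
            return a.Equals(b);

        // Sequences: compare element by element, in enumeration order.
        if (a is IEnumerable seqA)
        {
            List<object> itemsA = seqA.Cast<object>().ToList();
            List<object> itemsB = ((IEnumerable)b).Cast<object>().ToList();
            if (itemsA.Count != itemsB.Count) return false;
            for (int i = 0; i < itemsA.Count; i++)
                if (!DeepEquals(itemsA[i], itemsB[i], excluded)) return false;
            return true;
        }

        foreach (PropertyInfo prop in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            // Skip write-only properties and indexers.
            if (!prop.CanRead || prop.GetIndexParameters().Length > 0) continue;

            // Skip properties the caller asked to ignore.
            if (excluded != null && excluded.Any(e =>
                    e.DeclaringType == prop.DeclaringType && e.Name == prop.Name))
                continue;

            if (!DeepEquals(prop.GetValue(a), prop.GetValue(b), excluded)) return false;
        }

        foreach (FieldInfo field in type.GetFields(BindingFlags.Public | BindingFlags.Instance))
            if (!DeepEquals(field.GetValue(a), field.GetValue(b), excluded)) return false;

        return true;
    }
}
```

Unlike a hash comparison, this inspects the actual values, so it reports two objects as equal only when every compared member matches. Getter-only properties pose no problem here, because the values are only read and never written back.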



Source: https://stackoverflow.com/questions/57116127/object-fingerprinting-serialization-untouchable-legacy-code-getter-only-aut
