Generating a safe hashcode for an object graph

Question

I am importing some data from a file (xls, csv, xml), which results in a complex in-memory object graph. Now I need to know whether this graph has been modified since it was last exported. What would be a safe way to check this? I suppose I'd export a hashcode with the file? If so, would the standard way of generating an object's hashcode suffice? How should I generate the hash? I would prefer to generate the hash on the object graph rather than on the actual stream/file.

Answer 1:

You can guard against unnoticed changes to your data by encrypting it or by using a hashcode. With the text-based formats you mentioned, encryption would lose the human readability, so I think you would prefer hashcodes.

Whether standard hashing methods can be applied depends heavily on what exactly you consider "safe". If you just want to make sure that there was no hardware error when storing or transferring the data, or you want to detect a simple change by someone who did not know what they were doing, that might be fine, provided you are using a good GetHashCode() function (one whose result is stable across runs and .NET versions, which the default implementations do not guarantee). If you want to protect the data against "attackers", I wouldn't rely on a 32-bit "homemade" hash, especially if the attacker might know the code, e.g. in open source projects.

In such cases I would prefer stronger hash functions like MD5 (not very collision-safe) or, better, SHA-2. These work on byte streams, so you have to hash the data (XML etc.) itself, or maybe the .NET-serialized data (which makes the hash independent of the data format of your file). .NET provides classes for these algorithms; see for example http://msdn.microsoft.com/de-de/library/system.security.cryptography.hmacsha256.aspx
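
A minimal sketch of this idea, written in Python for brevity rather than .NET: the graph is serialized to a deterministic byte string and then hashed with SHA-256, with a keyed HMAC variant for the case where an attacker could simply recompute a plain hash. The sample `graph`, the `canonical_bytes` helper, the choice of JSON as the serialization and the `SECRET_KEY` are all assumptions for illustration, not part of the original answer.

```python
import hashlib
import hmac
import json

def canonical_bytes(graph):
    """Serialize a graph of dicts/lists/scalars to a deterministic byte string."""
    # sort_keys and fixed separators keep the output independent of dict ordering.
    return json.dumps(graph, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sha256_digest(graph):
    """Plain SHA-256: detects accidental changes, but anyone can recompute it."""
    return hashlib.sha256(canonical_bytes(graph)).hexdigest()

def hmac_digest(graph, key):
    """Keyed HMAC-SHA256: cannot be recomputed without the secret key."""
    return hmac.new(key, canonical_bytes(graph), hashlib.sha256).hexdigest()

# Hypothetical data and key, for illustration only.
SECRET_KEY = b"replace-with-a-real-secret"
graph = {"customer": "ACME", "orders": [{"id": 1, "total": 99.5}]}

stored = hmac_digest(graph, SECRET_KEY)   # exported together with the file
graph["orders"][0]["total"] = 100.0       # someone edits the data
current = hmac_digest(graph, SECRET_KEY)

# compare_digest avoids leaking information through timing differences.
unchanged = hmac.compare_digest(stored, current)
print("unchanged" if unchanged else "modified")
```

In .NET the corresponding classes live in System.Security.Cryptography (SHA256 and the HMACSHA256 class linked above); the serialization step is whatever deterministic byte representation of the graph you choose.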

Answer 2:

The standard solution for your problem isn't hashing the graph. Usually you just keep track of if/when a change occurred.

You could use a HasChanged flag, but I don't like that. I usually use a version counter which is incremented on every change. Then, when saving to a file, I store the current value of the version counter, and to check whether something changed I compare the stored version counter with the current one.
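
A rough sketch of the version-counter approach, in Python for brevity. The `TrackedGraph` class and its method names are hypothetical; the sketch assumes every modification goes through `set`, since a change that bypasses it would not be counted.

```python
class TrackedGraph:
    """Hypothetical container that bumps a version counter on every change."""

    def __init__(self):
        self._items = {}
        self.version = 0         # incremented on every modification
        self.saved_version = 0   # value of `version` at the last export

    def set(self, key, value):
        self._items[key] = value
        self.version += 1

    def mark_saved(self):
        # Call this when the graph is written to a file.
        self.saved_version = self.version

    def has_changed(self):
        return self.version != self.saved_version


graph = TrackedGraph()
graph.set("customer", "ACME")
graph.mark_saved()
graph.set("customer", "ACME Ltd.")
print(graph.has_changed())
```

Unlike a boolean flag, the counter also tells you how many changes happened since the export, and two exports taken at different times remain distinguishable.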

Answer 3:

I ended up doing the following (which seems to work pretty well):

1. Create a custom integer hashcode that includes all simple properties of a single object using this algorithm.
2. Repeat 1. for all complex objects that this object references.
3. Serialize all integer hashcodes into one binary stream in a well-known order.
4. Create an MD5 checksum of this stream.
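
The steps above can be sketched roughly as follows, in Python for brevity. The algorithm the answer refers to for the per-object hashcode is not reproduced here, so `simple_hash` uses a common multiply-by-31 combiner purely as an assumption; the `Item` and `Order` classes are likewise hypothetical stand-ins for a real object graph.

```python
import hashlib
import struct
import zlib
from dataclasses import dataclass, field
from typing import List

def simple_hash(values):
    """Combine simple property values into one 32-bit integer (step 1)."""
    h = 17
    for v in values:
        # crc32 of the text form is stable across runs,
        # unlike Python's built-in hash() for strings.
        part = zlib.crc32(repr(v).encode("utf-8"))
        h = (h * 31 + part) & 0xFFFFFFFF
    return h

@dataclass
class Item:
    name: str
    quantity: int

    def hashes(self):
        return [simple_hash([self.name, self.quantity])]

@dataclass
class Order:
    number: int
    customer: str
    items: List[Item] = field(default_factory=list)

    def hashes(self):
        # Own simple properties first, then each referenced object (step 2),
        # always in list order so the sequence is reproducible.
        result = [simple_hash([self.number, self.customer])]
        for item in self.items:
            result.extend(item.hashes())
        return result

def graph_checksum(root):
    hashes = root.hashes()
    # Step 3: pack the hashcodes as big-endian unsigned 32-bit integers.
    stream = struct.pack(f">{len(hashes)}I", *hashes)
    # Step 4: MD5 over the packed stream.
    return hashlib.md5(stream).hexdigest()

order = Order(1, "ACME", [Item("bolt", 10), Item("nut", 20)])
before = graph_checksum(order)
order.items[1].quantity = 21
print(before != graph_checksum(order))
```

This sketch does not handle cyclic references; a graph with cycles would need a set of already-visited objects during the traversal.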

Source: https://stackoverflow.com/questions/5308057/generating-an-safe-hashcode-for-an-objectgraph

Tags

.net

serialization

hashcode

crc