Question
I am importing some data from a file (xls, csv, xml), which results in a complex in-memory object graph. Now I need to know whether this graph has been modified since it was exported. What would be a safe way to check this? I suppose I'd export a hashcode with the file? If so, would the standard way of generating an object's hashcode suffice? How should I generate the hash? I would prefer to generate the hash on the object graph rather than on the actual stream/file.
Answer 1:
You can protect your data against unnoticed changes by encrypting it or by storing a hashcode with it. With the text-based formats you mentioned, encryption would cost you human readability, so I think you would prefer hashcodes.
Whether standard hashing methods are good enough depends heavily on what exactly you consider "safe". If you just want to make sure that no hardware error occurred while storing or transferring the data, or you want to detect a simple change by someone who did not know what they were doing, a standard hash may be fine, provided you use a good GetHashCode() implementation. If you want to protect the data against "attackers", I wouldn't rely on a 32-bit "homemade" hash, especially if the attacker might know the code (e.g. in open-source projects).
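To make the 32-bit concern concrete, here is a minimal Python sketch (Python stands in for .NET; `hash32` and `find_collision` are hypothetical names). It truncates SHA-256 to 32 bits, so the weakness it shows comes purely from the output size, not from poor mixing: by the birthday bound, two different inputs with the same value turn up after something on the order of 2^16 attempts.

```python
import hashlib


def hash32(text: str) -> int:
    # Stand-in for a 32-bit GetHashCode()-style value: the first 4 bytes
    # of SHA-256, so only the output size limits its strength.
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "big")


def find_collision(prefix: str = "record-") -> tuple[str, str]:
    # Birthday search: remember every hash seen until one repeats.
    # With 2**32 possible values a repeat is expected after roughly 2**16 inputs.
    seen: dict[int, str] = {}
    i = 0
    while True:
        candidate = f"{prefix}{i}"
        h = hash32(candidate)
        if h in seen:
            return seen[h], candidate
        seen[h] = candidate
        i += 1


if __name__ == "__main__":
    first, second = find_collision()
    print(first, second, hex(hash32(first)))
```

Matching one specific stored value (a second preimage) takes on the order of 2^32 attempts instead, which is still within reach of ordinary hardware; a hand-written hash without cryptographic mixing can often be matched far more cheaply.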
In such cases I would prefer stronger hash functions such as MD5 (which is no longer considered collision-resistant) or, better, SHA-2. These work on byte streams, so you have to hash the data itself (the XML etc.) or perhaps the .NET-serialized data (which makes the hash independent of your file's data format). .NET provides classes for these algorithms; see for example http://msdn.microsoft.com/de-de/library/system.security.cryptography.hmacsha256.aspx
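A minimal Python sketch of the SHA-2 route, assuming the graph can be reduced to plain dicts and lists and serialized as JSON. JSON with sorted keys stands in for the .NET serialization mentioned above, and Python's `hashlib` and `hmac` modules stand in for the System.Security.Cryptography classes; the function names are hypothetical. It shows both a plain SHA-256 digest and the keyed HMAC-SHA256 variant that the linked HMACSHA256 class implements.

```python
import hashlib
import hmac
import json


def canonical_bytes(graph) -> bytes:
    # Serialize deterministically (sorted keys, fixed separators) so the same
    # graph always yields the same bytes; assumes the graph is JSON-serializable.
    return json.dumps(graph, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sha256_hex(graph) -> str:
    # Unkeyed SHA-256: detects accidental changes, but anyone who edits the
    # file can simply recompute it.
    return hashlib.sha256(canonical_bytes(graph)).hexdigest()


def hmac_sha256_hex(graph, key: bytes) -> str:
    # Keyed HMAC-SHA256: a valid tag cannot be produced without the key.
    return hmac.new(key, canonical_bytes(graph), hashlib.sha256).hexdigest()


def verify(graph, key: bytes, stored_tag: str) -> bool:
    # compare_digest compares in constant time to avoid timing leaks.
    return hmac.compare_digest(hmac_sha256_hex(graph, key), stored_tag)


if __name__ == "__main__":
    key = b"replace-with-a-real-secret"  # hypothetical key; keep it out of the file
    graph = {"customer": "Alice", "orders": [{"id": 1, "qty": 3}]}
    tag = hmac_sha256_hex(graph, key)  # export this alongside the data
    print(verify(graph, key, tag))  # True
    graph["orders"][0]["qty"] = 4
    print(verify(graph, key, tag))  # False
```

The plain digest is enough against accidental corruption; against someone who can edit the exported file, only the keyed variant helps, and only as long as the key is not shipped with the data.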
Answer 2:
The standard solution to your problem isn't hashing the graph. Usually you just keep track of whether and when a change occurred.
You could use a HasChanged flag, but I don't like that. I usually use a version counter that is incremented on every change. When saving to a file, I store the current value of the version counter; to check whether something has changed, I compare the stored value with the current one.
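A minimal Python sketch of the version-counter approach; `VersionedModel` and its methods are hypothetical names, and actually writing the file is left out.

```python
class VersionedModel:
    """Hypothetical model that counts changes instead of keeping a HasChanged flag."""

    def __init__(self) -> None:
        self._items: dict[str, object] = {}
        self._version = 0        # incremented on every mutation
        self._saved_version = 0  # version that was last written to a file

    def set_item(self, key: str, value: object) -> None:
        self._items[key] = value
        self._version += 1

    def remove_item(self, key: str) -> None:
        del self._items[key]
        self._version += 1

    def save(self) -> int:
        # Writing the file itself is omitted; remember which version was saved.
        self._saved_version = self._version
        return self._saved_version

    def has_changed_since(self, saved_version: int) -> bool:
        return self._version != saved_version

    @property
    def has_changed(self) -> bool:
        return self.has_changed_since(self._saved_version)


if __name__ == "__main__":
    model = VersionedModel()
    model.set_item("qty", 3)
    saved = model.save()
    model.set_item("qty", 4)
    print(model.has_changed_since(saved))  # True
```

Keeping a number instead of a boolean lets several independent consumers (an autosave timer and an explicit save, say) each remember their own last-seen version. The counter only sees changes made through the model's own methods, so it cannot tell whether the exported file was edited outside the program.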
Answer 3:
I ended up doing the following (which seems to work pretty well):
1. Create a custom integer hashcode that includes all simple properties of a single object, using this algorithm.
2. Repeat step 1 for all complex objects that this object references.
3. Serialize all integer hashcodes into one binary stream in a well-known order.
4. Create an MD5 checksum of this stream.
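The four steps could look roughly like the following Python sketch. `Node`, `simple_hash`, `collect` and `graph_checksum` are hypothetical names, and the per-object algorithm the answer links to is not reproduced: a CRC-32 based 17/31 combiner stands in for it. Beyond the literal recipe, the sketch also records each object's child count and handles shared references and cycles.

```python
import hashlib
import struct
import zlib

MASK32 = 0xFFFFFFFF


class Node:
    """Hypothetical graph object: two simple properties plus child references."""

    def __init__(self, name: str, value: object, children=None) -> None:
        self.name = name
        self.value = value
        self.children = list(children) if children else []


def simple_hash(node: Node) -> int:
    # Step 1: 32-bit hash over the simple properties only. CRC-32 per field
    # and the 17/31 combiner are assumptions; Python's hash() is salted per
    # process for str, so its values could not be stored in a file.
    h = 17
    for prop in (node.name, node.value):
        h = (h * 31 + zlib.crc32(repr(prop).encode("utf-8"))) & MASK32
    return h


def collect(node: Node, out: list, index_of: dict) -> None:
    # Step 2: depth-first walk, children in list order (the "well-known order").
    if id(node) in index_of:
        # Shared reference or cycle: emit a back-reference tag and the node's
        # visit index instead of recursing forever.
        out.append(struct.pack(">BI", 1, index_of[id(node)]))
        return
    index_of[id(node)] = len(index_of)
    # The child count is an addition to the answer's recipe: it makes moving
    # a child to a different parent change the stream as well.
    out.append(struct.pack(">BII", 0, simple_hash(node), len(node.children)))
    for child in node.children:
        collect(child, out, index_of)


def graph_checksum(root: Node) -> str:
    chunks: list = []
    collect(root, chunks, {})
    # Step 3: one binary stream of big-endian, fixed-width integers.
    stream = b"".join(chunks)
    # Step 4: MD5 over the stream, as in the answer.
    return hashlib.md5(stream).hexdigest()


if __name__ == "__main__":
    qty = Node("qty", 3)
    order = Node("order", 1, [qty, Node("price", 9.5)])
    before = graph_checksum(order)
    qty.value = 4
    print(before != graph_checksum(order))  # True
```

Replacing `hashlib.md5` with `hashlib.sha256` is a one-line change if the collision concerns raised in the first answer apply.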
Source: https://stackoverflow.com/questions/5308057/generating-an-safe-hashcode-for-an-objectgraph