Question
Is there a way to identify or inspect an AES-encrypted file based on its content (the way a ZIP file can be identified by the letters "PK" at the beginning of the file)? Is there any magic number associated with AES-encrypted files?
We have multiple files in the workflow repository that are either unencrypted (Excel, XML, JSON, plain text, etc.) or AES-256 encrypted, and we don't know which ones are encrypted. I need to write Java code to identify the AES-encrypted files and decrypt them automatically. Thanks!
Answer 1:
In the absence of any standard header, you could look at the byte frequency. AES-encrypted data (or indeed anything encrypted with a decent algorithm) will appear to be a random sequence of bytes. This means that the distribution of byte values 0-255 will be approximately flat (i.e. all byte values are equally likely).
However, textual documents will mostly contain printable characters, and some of them far more often than others: spaces, newlines, vowels, etc. will be disproportionately common.
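One common way to turn "approximately flat" into a single number is Shannon entropy. The answer does not name it; it is swapped in here as a well-known alternative. The sketch below (class and method names are hypothetical) prints the entropy, in bits per byte, of each file passed on the command line. Ciphertext should come out very close to the maximum of 8.0, while text files typically score noticeably lower.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Sketch: Shannon entropy of a byte array, in bits per byte (0.0 to 8.0). */
final class ByteEntropy {

    /** Counts how often each byte value 0..255 occurs. */
    static long[] histogram(byte[] data) {
        long[] counts = new long[256];
        for (byte b : data) {
            counts[b & 0xFF]++; // mask so the signed Java byte indexes 0..255
        }
        return counts;
    }

    /** 8.0 means a perfectly flat histogram; lower values mean some bytes dominate. */
    static double entropy(byte[] data) {
        if (data.length == 0) {
            return 0.0;
        }
        double bits = 0.0;
        for (long count : histogram(data)) {
            if (count == 0) {
                continue; // 0 * log(0) is taken as 0
            }
            double p = (double) count / data.length;
            bits -= p * Math.log(p) / Math.log(2);
        }
        return bits;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            byte[] data = Files.readAllBytes(Paths.get(arg));
            System.out.printf("%s: %.3f bits/byte%n", arg, entropy(data));
        }
    }
}
```

Keep in mind that compressed data also scores close to 8.0, and `.xlsx` workbooks are ZIP archives, so Excel files can look "random" to this test even when they are not encrypted.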
So, you could build histograms of byte counts for your various files and look for a simple way to classify them as encrypted or not encrypted. For example, look at the ratio of the total count of the 5 least common byte values to the total count of the 5 most common byte values. I would expect this ratio to be close to 1.0 for an encrypted file, and far from 1.0 (often exactly 0, because many byte values never occur) for a normal textual document. I'm sure there are much more sophisticated statistical metrics that could be used...
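A minimal sketch of that ratio heuristic in Java. The class name and the 0.5 cut-off in `THRESHOLD` are hypothetical; the answer gives no threshold, so it should be calibrated on files whose status you already know.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Sketch of the rare-to-common byte ratio heuristic; names and threshold are hypothetical. */
final class ByteRatio {

    /** Hypothetical cut-off; calibrate it on known plaintext and ciphertext samples. */
    static final double THRESHOLD = 0.5;

    /** Sum of the k rarest byte-value counts divided by the sum of the k most common. */
    static double rareToCommonRatio(byte[] data, int k) {
        if (k < 1 || k > 128) {
            throw new IllegalArgumentException("k must be in 1..128");
        }
        long[] counts = new long[256];
        for (byte b : data) {
            counts[b & 0xFF]++; // mask so the signed Java byte indexes 0..255
        }
        Arrays.sort(counts); // ascending: rarest byte values first
        long rare = 0;
        long common = 0;
        for (int i = 0; i < k; i++) {
            rare += counts[i];
            common += counts[255 - i];
        }
        return common == 0 ? 0.0 : (double) rare / common;
    }

    static boolean looksEncrypted(byte[] data) {
        return rareToCommonRatio(data, 5) > THRESHOLD;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            byte[] data = Files.readAllBytes(Paths.get(arg));
            System.out.printf("%s: ratio=%.3f looksEncrypted=%b%n",
                    arg, rareToCommonRatio(data, 5), looksEncrypted(data));
        }
    }
}
```

This also makes the short-file caveat concrete: a file of 251 bytes or fewer contains at most 251 distinct byte values, so at least five counts are zero and the ratio is 0 even for genuine ciphertext.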
This approach might not work so well for extremely short documents, of course, because a handful of bytes cannot fill out a flat histogram.
See also:
- https://www.researchgate.net/post/How_to_detect_if_data_are_encrypted_or_not
Answer 2:
AES is a block cipher. On its own, it can only transform a 128-bit value into another seemingly random 128-bit value. To encrypt more data, a mode of operation and possibly a padding scheme are added. If you want to go further and produce encrypted files, you really need to define a file format, because none of those mechanisms provides one.
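To make this concrete in Java, the sketch below uses the standard `javax.crypto` API. The transformation string `"AES/CBC/PKCS5Padding"` selects one mode (CBC) and one padding scheme; it is only an example, and the files in question may use something else entirely (GCM, CTR, a different IV layout). The class name is hypothetical, and `init(256)` assumes a JDK whose policy allows 256-bit AES keys (the default in current releases). Note that `doFinal` returns bare ciphertext with no header, and the randomly generated IV is not part of it, so where to store the IV is exactly the kind of file-format decision described above.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Sketch: AES alone fixes only the block size; mode, padding and layout are separate choices. */
final class AesModeDemo {

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256); // AES-256 key; the block size is still 128 bits
        return gen.generateKey();
    }

    /** Encrypts with AES in CBC mode with PKCS#5 padding and returns only the ciphertext. */
    static byte[] encrypt(byte[] plaintext, SecretKey key) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key); // provider generates a random IV
        // cipher.getIV() returns that IV; it is NOT included in doFinal's output,
        // so storing it (and anything else needed to decrypt) is up to the file format.
        return cipher.doFinal(plaintext);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey key = newKey();
        for (String s : new String[] {"hi", "exactly 16 bytes", "a somewhat longer message"}) {
            byte[] ct = encrypt(s.getBytes(StandardCharsets.UTF_8), key);
            System.out.println(s.length() + " bytes in -> " + ct.length + " bytes out");
        }
    }
}
```

Running `main` shows the 2-, 16- and 25-byte messages growing to 16, 32 and 32 bytes: PKCS#5 padding always adds between 1 and 16 bytes, so the output length is a multiple of the 16-byte block but carries no marker saying "AES".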
So, saying that you have an AES-encrypted file doesn't tell us anything beyond the fact that the file is encrypted in some way.
The output of modern encryption looks like random noise, so you can compare the Hamming weight of an encrypted file to that of an uncompressed, structured file. There will likely be differences, as DNA mentioned in the other answer. Compressed files also look like random noise, but they may contain biases that become significant if the file is long enough.
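A minimal sketch of the Hamming weight comparison, assuming the weight is measured as the fraction of 1 bits over the whole file (class name hypothetical). For random-looking data this fraction should be close to 0.5.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Sketch: fraction of bits set to 1 in a byte array (Hamming weight per bit). */
final class HammingWeight {

    static double onesFraction(byte[] data) {
        if (data.length == 0) {
            return 0.0;
        }
        long ones = 0;
        for (byte b : data) {
            ones += Integer.bitCount(b & 0xFF); // mask avoids counting sign-extension bits
        }
        return (double) ones / (8L * data.length);
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            byte[] data = Files.readAllBytes(Paths.get(arg));
            System.out.printf("%s: %.4f of bits are 1%n", arg, onesFraction(data));
        }
    }
}
```

In ASCII text the most significant bit of every byte is 0, which pulls the fraction below 0.5. For truly random data the fraction's standard deviation is about 0.5 / sqrt(8n) for an n-byte file, so how large a deviation counts as "structured" depends on the file length; any threshold is an assumption to calibrate on known samples.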
Some file formats contain an identifier that says how the data was encrypted. Most self-made formats have nothing like that, because they are written for a specific application and the protocol or file format rarely changes. The developer settled on some "cipher suite" and never bothered to make it flexible. If you know which program produced the files, you can likely find out whether they are encrypted. If that program is open source, this is easy. If it is closed source, you can still reverse-engineer it.
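A small sketch of checking for known identifiers, assuming some files might come from tools with documented headers. Only two are shown: the `Salted__` prefix that OpenSSL's `enc` command writes before its 8-byte salt (when a salt is used), and the ZIP local-file-header signature `PK\x03\x04` from the question, which `.xlsx` workbooks also start with. Class and label names are hypothetical, and finding no known header says nothing about whether a file is encrypted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Sketch: recognise a few known file headers; everything else is "unknown". */
final class HeaderSniffer {

    /** Prefix written by `openssl enc` when a salt is used, followed by the 8-byte salt. */
    static final byte[] OPENSSL_SALTED = "Salted__".getBytes(StandardCharsets.US_ASCII);
    /** ZIP local file header "PK\3\4"; .xlsx and .docx files are ZIP containers. */
    static final byte[] ZIP = {'P', 'K', 3, 4};

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    static String identify(byte[] head) {
        if (startsWith(head, OPENSSL_SALTED)) {
            return "openssl-enc";
        }
        if (startsWith(head, ZIP)) {
            return "zip";
        }
        return "unknown"; // fall back to statistics or knowledge of the producing program
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            byte[] head;
            try (InputStream in = Files.newInputStream(Paths.get(arg))) {
                head = in.readNBytes(16); // readNBytes(int) requires Java 11+
            }
            System.out.println(arg + ": " + identify(head));
        }
    }
}
```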
Source: https://stackoverflow.com/questions/43333329/identifying-an-aes-encrypted-file