Determine if the bits are encrypted?

Question

Let's assume that I am listening on a network and have captured some bits. Is there a way to determine whether those bits are encrypted? What methods or algorithms exist? I mean, if the bits look meaningless, that suggests they are encrypted, but is there a more technical approach or algorithm to decide this from the bits alone? Say I have 0101010100001011001001100001001: how would you tell whether this is encrypted?

Answer 1:

Generally speaking, you can't. Encrypted data is, in almost all cases, indistinguishable from random or heavily compressed data.

In some situations, there may be circumstantial evidence to suggest that the data you're seeing is encrypted. For instance, it may contain headers characteristic of TLS or SSH, or it may be transmitted on a port that is typically used for encrypted data (e.g., 443 for HTTPS). However, this is all a matter of guesswork: if you don't recognize the data, it could be anything.
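As a rough illustration of that kind of circumstantial check, here is a minimal sketch that looks at the first bytes of a captured payload. The function name `guess_protocol` is made up for this example, and the rules are simplified: an SSH identification string begins with `SSH-`, and a TLS record begins with a content-type byte (20 to 23), a version major byte of 3, and a small minor version byte. Random data can match these patterns by chance, so treat a hit as a hint, not proof.

```python
def guess_protocol(payload: bytes) -> str:
    """Return a rough guess based on the first bytes of a captured payload."""
    # SSH peers open with a plain-text identification string such as "SSH-2.0-...".
    if payload.startswith(b"SSH-"):
        return "ssh banner"
    # A TLS record header is: content type (20-23), version major 3,
    # version minor 0-4, then a two-byte length.
    if (len(payload) >= 5
            and 20 <= payload[0] <= 23
            and payload[1] == 3
            and payload[2] <= 4):
        return "tls record"
    # Anything else: we simply do not recognize it.
    return "unknown"


print(guess_protocol(b"SSH-2.0-OpenSSH_9.6\r\n"))                        # ssh banner
print(guess_protocol(bytes([0x16, 0x03, 0x01, 0x00, 0x2F]) + b"\x01" * 47))  # tls record
print(guess_protocol(b"GET / HTTP/1.1\r\n"))                             # unknown
```

Note that "unknown" says nothing either way: the payload could be plain, compressed, or encrypted with a protocol this sketch does not know about.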

Answer 2:

You can't.

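Consider this simple example (Python):

def xor(s1, s2):
    # XOR two strings character by character; zip() stops at the shorter input.
    return ''.join(chr(ord(a) ^ ord(b)) for a, b in zip(s1, s2))

key = '\x07\x07\x04\x16\x00\x1b\x12N\x17\x1a\x0eHO\x14T\x03\x10\x17R\n\x16V\x04\n\x06\x00\r\x1e'
message = 'this is such a secret message'

# The key is one character shorter than the message, so the final 'e' is dropped.
ciphertext = xor(message, key)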

This is a simple XOR cipher, the same operation a one-time pad uses. There is nothing wrong with it, but if you print the ciphertext you get 'some random output obviously'. The secret message is properly encrypted, yet the output does not look encrypted at all.

I chose the key by XORing the message with the example output ;-) but a key like this is effectively random and could have come out of any random number generator.
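For anyone who wants to reproduce how such a key is derived, here is a small Python 3 sketch using bytes instead of the Python 2 strings above. The names `xor_bytes` and `decoy` are just for this illustration. Because the decoy text is one byte shorter than the message, `zip()` drops the last character of the message.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    # XOR byte by byte; zip() stops at the shorter input.
    return bytes(x ^ y for x, y in zip(a, b))


message = b"this is such a secret message"
decoy = b"some random output obviously"  # what the ciphertext should look like

# XORing the message with the desired output yields the key that maps one to the other.
key = xor_bytes(message, decoy)
ciphertext = xor_bytes(message, key)

print(key)                         # the same 28 bytes as the key in the example
print(ciphertext)                  # b'some random output obviously'
print(xor_bytes(ciphertext, key))  # b'this is such a secret messag'
```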

You can never tell whether data is encrypted, encoded, compressed, masked, or whatever just by looking at the bits, because encrypted bits can look like something unencrypted. This is also why an OTP can't be brute-forced: you can never tell whether a brute-forced plaintext is the correct one. See http://en.wikipedia.org/wiki/One-time_pad#Attempt_at_cryptanalysis
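To make the brute-force point concrete, here is a hedged Python 3 sketch. The intercepted ciphertext is simulated with `os.urandom`, and the candidate plaintexts and the helper `key_for` are invented for the example. For every guess of the right length there is a key that "decrypts" the ciphertext to exactly that guess, so nothing in the bits favors one guess over another.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # XOR byte by byte; zip() stops at the shorter input.
    return bytes(x ^ y for x, y in zip(a, b))


def key_for(ciphertext: bytes, guess: bytes) -> bytes:
    # The key under which `ciphertext` would decrypt to `guess`.
    return xor_bytes(ciphertext, guess)


ciphertext = os.urandom(16)  # stands in for an intercepted OTP ciphertext
candidates = [b"attack at dawn!!", b"retreat at noon!"]

for guess in candidates:
    key = key_for(ciphertext, guess)
    # Every candidate is "confirmed" by its own key.
    assert xor_bytes(ciphertext, key) == guess
    print(guess, "is consistent with key", key.hex())
```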

Try it yourself:

Python 2.7.5 (default, Aug 25 2013, 00:04:04)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def xor(s1, s2): return ''.join(chr(ord(a) ^ ord(b)) for a,b in zip(s1, s2))
...
>>> key = '\x07\x07\x04\x16\x00\x1b\x12N\x17\x1a\x0eHO\x14T\x03\x10\x17R\n\x16V\x04\n\x06\x00\r\x1e'
>>> message = 'this is such a secret message'
>>> ciphertext = xor(message, key)
>>> print ciphertext
some random output obviously

Answer 3:

It is not possible to prove whether data is encrypted, but you can analyze the frequency of byte values to pick out packets that are likely to be encrypted.

Properly encrypted data is nearly indistinguishable from random noise, so if you are looking for encrypted data, you should see a fairly even distribution of byte values. There is a great tool called pcaphistogram.pl for analyzing packets this way. You can get it here: http://www.willhackforsushi.com/code/pcaphistogram.pl.txt
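If you just want to try the idea without the Perl script, the following Python 3 sketch computes a single number from the byte histogram: the Shannon entropy in bits per byte. This is not what pcaphistogram.pl does internally (that tool plots histograms); it is a separate, simplified illustration, and the sample inputs are made up. Values close to 8 mean the byte values are spread evenly, which fits encrypted, compressed, or random data equally well.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (one value) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # histogram: byte value -> number of occurrences
    return -sum(c / total * math.log2(c / total) for c in counts.values())


text = b"the quick brown fox jumps over the lazy dog\n" * 200
noise = os.urandom(len(text))  # stands in for well-encrypted payload bytes

print("text :", round(byte_entropy(text), 2))
print("noise:", round(byte_entropy(noise), 2))
```

On a sample of this size the English text should land well below the random bytes, which should come out just under 8. Short packets give noisy estimates, so a threshold only makes sense on a reasonable amount of data.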

Here is a sample histogram of properly-encrypted data:

Below is a sample histogram of plain-text data. Notice how the values bunch up in the printable region. If you compare this to an ASCII table, you will see lots of lower-case letters (61-7a), a few upper-case letters (41-5a), spaces (20) and line feeds (0a):

Below is a sample histogram for text data that was encrypted with a simple XOR. The XOR moved all of the characters to different byte values, but the basic shape of the distribution is the same as in the unencrypted set.
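The effect is easy to check in a few lines of Python 3. This sketch assumes the XOR used a single fixed key byte (the post does not say which key was used; `0x5A` and the helper names are arbitrary choices for the example). XOR with a fixed byte only relabels the byte values, so the sorted list of counts, which is the "shape" of the histogram, stays identical.

```python
from collections import Counter


def xor_single_byte(data: bytes, key: int) -> bytes:
    # XOR every byte with the same key byte (an assumed, very weak scheme).
    return bytes(b ^ key for b in data)


def histogram_shape(data: bytes) -> list:
    # Sorted counts, ignoring which byte value each count belongs to.
    return sorted(Counter(data).values(), reverse=True)


plain = b"Notice how the values bunch up in the printable region.\n" * 20
masked = xor_single_byte(plain, 0x5A)

print(masked[:16])                                        # no longer readable text
print(histogram_shape(plain) == histogram_shape(masked))  # True
```

A longer repeating key would blur the picture somewhat, but each key position still leaves the same kind of uneven distribution behind.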