What is the best algorithm to take a long sequence of integers (say 100,000 of them) and return a measurement of how random the sequence is?
The function should retu
As per Knuth, make sure you test the low-order bits for randomness, since many algorithms exhibit terrible randomness in the lowest bits.
As others have pointed out, you can't directly calculate how random a sequence is but there are several statistical tests that you could use to increase your confidence that a sequence is or isn't random.
The DIEHARD suite is the de facto standard for this kind of testing but it neither returns a single value nor is it simple.
ENT - A Pseudorandom Number Sequence Test Program, is a simpler alternative that combines 5 different tests. The website explains how each of these tests works.
If you really need just a single value, you could pick one of the 5 ENT tests and use that. The Chi-Squared test would probably be the best to use, but that might not meet the definition of simple.
Bear in mind that a single test is not as good as running several different tests on the same sequence. Depending on which test you choose, it should be good enough to flag up obviously suspicious sequences as being non-random, but might not fail for sequences that superficially appear random but actually exhibit some pattern.
Although this question is old, it does not seem "solved", so here is my 2 cents, showing that it is still an important problem that can be discussed in simple terms.
Consider password security.
The question was about "long" number sequences, "say 100.000", but does not state what is the criterium for "long". For passwords, 8 characters might be considered long. If those 8 chars were "random", it might be considered a good password, but if it can be easily guessed, a useless password.
Common password rules are to mix upper case, numbers and special characters. But the commonly used "Password1" is still a bad password. (okay, 9-char example, sorry) So how many of the methods of the other answers you apply, you should also check if the password occurs in several dictionaries, including sets of leaked passwords.
But even then, just imagine the rise of a new Hollywood star. This may lead to a new famous name that will be given to newborns, and may become popular as a password, that is not yet in the dictionaries.
If I am correctly informed, it is pretty much impossible to automatically verify that a password selected by a human is random and not derived with an easy to guess algorithm. And also that a good password system should work with computer-generated random passwords.
The conclusion is that there is no method to verify if an 8-char password is random, let alone a good and simple method. And if you cannot verify 8 characters, why would it be easier to verify 100.000 numbers?
The password example is just one example of how important this question of randomness is; think also about encryption. Randomness is the holy grail of security.
It can be done this way:
CAcert Research Lab does a Random Number Generator Analysis.
Their results page evaluates each random sequence using 7 tests (Entropy, Birthday Spacing, Matrix Ranks, 6x8 Matrix Ranks, Minimum Distance, Random Spheres, and the Squeeze). Each test result is then color coded as one of "No Problems", "Potentially deterministic" and "Not Random".
So a function can be written that accepts a random sequence and does the 7 tests. If any of the 7 tests are "Not Random" then the function returns a 0. If all of the 7 tests are "No Problems", then it returns a 1. Otherwise, it can return some number in-between based on how many tests come in as "Potentially Deterministic".
The only thing missing from this solution is the code for the 7 tests.
@JohnFx "... mathematically impossible."
poster states: take a long sequence of integers ...
Thus, just as limits are used in The Calculus, we can take the value as being the value - the study of Chaotics shows us finite limits may 'turn on themselves' producing tensor fields that provide the illusion of absolute(s), and which can be run as long as there is time and energy. Due to the curvature of space-time, there is no perfection - hence the op's "... say 1 if perfectly random." is a misnomer.
{ noted: ample observations on that have been provided - spare me }
According to your position, given two byte[] of a few k, each randomized independently - op could not obtain "a measurement of how random the sequence is" The article at Wiki is informative, and makes definite strides dis-entagling the matter, but
In comparison to classical physics, quantum physics predicts that the properties of a quantum mechanical system depend on the measurement context, i.e. whether or not other system measurements are carried out.
A team of physicists from Innsbruck, Austria, led by Christian Roos and Rainer Blatt, have for the first time proven in a comprehensive experiment that it is not possible to explain quantum phenomena in non-contextual terms.
Source: Science Daily
Let us consider non-random lizard movements. The source of the stimulus that initiates complex movements in the shed tails of leopard geckos, under your original, corrected hyper-thesis, can never be known. We, the experienced computer scientists, suffer the innocent challenge posed by newbies knowing too well that there - in the context of an un-tainted and pristine mind - are them gems and germinators of feed-forward thinking.
If the thought-field of the original lizard produces a tensor-field ( deal with it folks, this is front-line research in sub-linear physics ) then we could have "the best algorithm to take a long sequence" of civilizations spanning from the Toba Event to present through a Chaotic Inversion". Consider the question whether such a thought-field produced by the lizard, taken independently, is a spooky or knowable.
"Direct observation of Hardy's paradox by joint weak measurement with an entangled photon pair," authored by Kazuhiro Yokota, Takashi Yamamoto, Masato Koashi and Nobuyuki Imoto from the Graduate School of Engineering Science at Osaka University and the CREST Photonic Quantum Information Project in Kawaguchi City
Source: Science Daily
( considering the spooky / knowable dichotomy )
I know from my own experiments that direct observation weakens the absoluteness of perceptible tensors, distinguishing between thought and perceptible tensors is impossible using only single focus techniques because the perceptible tensor is not the original thought. A fundamental consequence of quantaeus is that only weak states of perceptible tensors can be reliably distinguished from one another without causing a collapse into a unified perceptible tensor. Try it sometime - work on the mainifestation of some desired eventuality, using pure thought. Because an idea has no time or space, it is therefore in-finite. ( not-finite ) and therefore can attain "perfection" - i.e. absoluteness. Just for a hint, start with the weather as that is the easiest thing to influence ( as least as far as is currently known ) then move as soon as can be done to doing a join from the sleep-state to the waking-state with virtually no interruption of sequential chaining.
There is an almost unavoidable blip there when the body wakes up but it is just like when the doorbell rings, speaking of which brings an interesting area of statistical research to funding availability: How many thoughts can one maintain synchronously? I find that duality is the practical working limit, at triune it either breaks on the next thought or doesn't last very long.
Perhaps the work of Yokota et al could reveal the source of spurious net traffic...maybe it's ghosts.
You could try to zip-compress the sequence. The better you succeed the less random the sequence is.
Thus, heuristic randomness = length of zip-code/length of original sequence