How can I determine the statistical randomness of a binary string?
Ergo, how can I code my own test, and return a single value that corresponds to the statistical randomness?
Some time ago, I developed a simple heuristic that worked for my purposes.
You simply calculate the "even-ness" of 0s and 1s, not only in the string itself but also in successive derivatives of the string. For example, the first derivative of 01010101 is 11111111, because every bit changes compared to its predecessor (the string is treated as circular, so the first bit is compared with the last one), and the second derivative is 00000000, because no bit in the first derivative changes. Then you simply have to weigh these "even-nesses" according to your taste.
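Incidentally, if you treat the input as a fixed-width word instead of a string, taking the derivative is just XORing the word with a one-bit rotation of itself. Here is a rough C++20 sketch of that view (the 8-bit width and the use of std::rotr are merely my choices for the illustration):

#include <bit>
#include <cstdint>
#include <iostream>

int main()
{
    std::uint8_t x = 0b01010101;
    // Each result bit is 1 where the input bit differs from its
    // predecessor (with wraparound), i.e. the "derivative" above.
    std::uint8_t d = x ^ std::rotr(x, 1);
    std::cout << static_cast<int>(d) << std::endl; // 255 == 11111111
}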
Here is an example:
#include <algorithm>
#include <iostream>
#include <string>

// Squared deviation of the fraction of '0's from the ideal 0.5.
float variance(const std::string& x)
{
    int zeroes = std::count(x.begin(), x.end(), '0');
    float total = x.length();
    float deviation = zeroes / total - 0.5f;
    return deviation * deviation;
}

// Replace each bit with "did it change from the previous bit?"
// (circular: the first bit is compared with the last one).
void derive(std::string& x)
{
    char last = *x.rbegin();
    for (std::string::iterator it = x.begin(); it != x.end(); ++it)
    {
        char current = *it;
        *it = '0' + (current != last);
        last = current;
    }
}

// Weighted sum of the variances of the string and its first four
// derivatives; the reciprocal is returned, so larger values mean
// "more random looking".
float randomness(std::string x)
{
    float sum = variance(x);
    float weight = 1.0f;
    for (int i = 1; i < 5; ++i)
    {
        derive(x);
        weight *= 2.0f;
        sum += variance(x) * weight;
    }
    return 1.0f / sum;
}

int main()
{
    std::cout << randomness("00000000") << std::endl;
    std::cout << randomness("01010101") << std::endl;
    std::cout << randomness("00000101") << std::endl;
}
Your example inputs yield a "randomness" of 0.129032, 0.133333 and 3.2 respectively.
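To get a feel for the scale on longer inputs, you can feed the function machine-generated bits. A rough sketch (it reuses randomness() from the listing above; the std::mt19937 seed and the 64-bit length are arbitrary):

#include <random>
// plus the includes and functions from the listing above

int main()
{
    std::mt19937 gen(42);                 // fixed seed, purely for reproducibility
    std::bernoulli_distribution bit(0.5); // fair coin

    std::string s;
    for (int i = 0; i < 64; ++i)
        s += bit(gen) ? '1' : '0';

    // Compare a pseudo-random string against two highly regular ones.
    std::cout << randomness(s) << std::endl;
    std::cout << randomness(std::string(64, '0')) << std::endl;
    std::cout << randomness("0101010101010101") << std::endl;
}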
On a side note, you can get cool fractal graphics by deriving strings ;)
int main()
{
    std::string x = "0000000000000001";
    for (int i = 0; i < 16; ++i)
    {
        // Print the current string, then replace it with its derivative.
        std::cout << x << std::endl;
        derive(x);
    }
}
0000000000000001
1000000000000001
0100000000000001
1110000000000001
0001000000000001
1001100000000001
0101010000000001
1111111000000001
0000000100000001
1000000110000001
0100000101000001
1110000111100001
0001000100010001
1001100110011001
0101010101010101
1111111111111111