Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.
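For reference, the basic question can be answered on its own by plain rejection sampling. Below is a minimal Python sketch. `rand5` is a hypothetical stand-in for the given generator, simulated with `random.randint`. The two-draw scheme is the textbook approach, not anything specific to this answer:

```python
import random

def rand5() -> int:
    # Hypothetical stand-in for the given generator: uniform on 1..5.
    return random.randint(1, 5)

def rand7() -> int:
    """Uniform integer in 1..7 built from rand5() by rejection sampling."""
    while True:
        # Two calls give a value uniform on 0..24.
        v = 5 * (rand5() - 1) + (rand5() - 1)
        # 0..20 splits into 7 residues with 3 values each; 21..24 are rejected.
        if v < 21:
            return v % 7 + 1
```

Each attempt is accepted with probability 21/25, so this uses $2 \cdot 25/21 \approx 2.38$ calls to `rand5()` per output. That is 0.42 outputs per input, well short of $\log 5/\log 7$.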
Suppose we add the constraint of giving the most efficient answer. That means one that, given an input stream $I$ of length $m$ of uniformly distributed integers from 1-5, outputs a stream $O$ of uniformly distributed integers from 1-7 that is as long as possible relative to $m$. Call this length $L(m)$.
The simplest way to analyse this is to treat the streams $I$ and $O$ as 5-ary and 7-ary numbers respectively. This is achieved by the main answer's idea of taking the stream $a_1, a_2, a_3, \dots \to a_1 + 5a_2 + 5^2 a_3 + \dots$, and similarly for the stream $O$.
Take a section of the input stream of length $m$ and choose $n$ such that $5^m - 7^n = c$, where $c > 0$ and is as small as possible. There is then a uniform map from the input stream of length $m$ to the integers from $1$ to $5^m$. There is also a uniform map from the integers from $1$ to $7^n$ to the output stream of length $n$. We may have to discard a few cases from the input stream, namely those where the mapped integer exceeds $7^n$.
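Here is a minimal sketch of this block scheme in Python. It assumes a hypothetical `rand5()` stand-in, simulated with `random.randint`. It shifts each digit down by 1, so the $m$ inputs map onto $0$ to $5^m - 1$ rather than $1$ to $5^m$. `choose_n` and `block_rand7` are illustrative names:

```python
import random

def rand5() -> int:
    # Hypothetical stand-in for the given generator: uniform on 1..5.
    return random.randint(1, 5)

def choose_n(m: int) -> int:
    """Largest n with 7**n < 5**m, so that c = 5**m - 7**n > 0 is as small as possible."""
    n = 0
    while 7 ** (n + 1) < 5 ** m:
        n += 1
    return n

def block_rand7(m: int) -> list[int]:
    """Consume m rand5() values and return n uniform 1..7 values, or [] on rejection."""
    n = choose_n(m)
    # Read the m inputs as a base-5 number, uniform on 0..5**m - 1.
    v = 0
    for k in range(m):
        v += (rand5() - 1) * 5 ** k
    if v >= 7 ** n:
        return []  # one of the c leftover cases: discarded
    # Write v as n base-7 digits, shifted from 0..6 to 1..7.
    out = []
    for _ in range(n):
        v, d = divmod(v, 7)
        out.append(d + 1)
    return out
```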
So this gives a value for $L(m)$ of around $m \frac{\log 5}{\log 7}$, which is approximately $0.827m$.
The difficulty with the above analysis has two parts. First, the equation $5^m - 7^n = c$ is not easy to solve exactly. Second, whenever the uniform value from $1$ to $5^m$ exceeds $7^n$, we lose efficiency.
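The following rough Python comparison shows how much the discarded cases cost under the simple scheme just described. It sets $m \frac{\log 5}{\log 7}$ against the expected number of outputs per block, $n \cdot 7^n / 5^m$, which comes from getting $n$ outputs with probability $7^n/5^m$ and none otherwise. Function names are illustrative:

```python
import math
from fractions import Fraction

RATIO = math.log(5) / math.log(7)  # about 0.8271

def choose_n(m: int) -> int:
    """Largest n with 7**n < 5**m."""
    n = 0
    while 7 ** (n + 1) < 5 ** m:
        n += 1
    return n

def simple_expected_outputs(m: int) -> Fraction:
    """Expected outputs per block of m inputs when rejected cases are simply discarded."""
    n = choose_n(m)
    # n outputs with probability 7**n / 5**m, none otherwise.
    return Fraction(n * 7 ** n, 5 ** m)

for m in (2, 3, 10, 20):
    print(m, choose_n(m), round(m * RATIO, 3), float(simple_expected_outputs(m)))
```

For $m = 2$, for instance, a block yields $7/25 = 0.28$ outputs on average, compared with $2 \frac{\log 5}{\log 7} \approx 1.65$.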
The question is how close we can get to the best possible value of $m \frac{\log 5}{\log 7}$. For example, when this number is close to an integer, can we find a way to achieve exactly that integral number of output values?
If $5^m - 7^n = c$, then from the input stream we effectively generate a uniform random number from $0$ to $5^m - 1$, and we don't use any values of $7^n$ or higher. However, these values can be rescued and used again. After subtracting $7^n$, they give a uniform random number from $0$ to $5^m - 7^n - 1$, that is, over $c$ values. So we can then try to convert these into 7-ary numbers too, and so create more output values.
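Here is a sketch of this recycling idea, assuming a hypothetical `rand5()` stand-in. `extract` repeatedly tries the largest power $7^n$ that fits in the current number of outcomes. On rejection it keeps $v - 7^n$, which is uniform over the outcomes that remain. The names and the greedy choice of $n$ are illustrative:

```python
import random

def rand5() -> int:
    # Hypothetical stand-in for the given generator: uniform on 1..5.
    return random.randint(1, 5)

def extract(v: int, size: int) -> list[int]:
    """Given v uniform on 0..size-1, return uniform 1..7 outputs, recycling rejected cases."""
    while size >= 7:
        # Largest power 7**n not exceeding the number of remaining outcomes.
        n = 1
        while 7 ** (n + 1) <= size:
            n += 1
        if v < 7 ** n:
            out = []
            for _ in range(n):
                v, d = divmod(v, 7)
                out.append(d + 1)
            return out
        # Rejected: v - 7**n is uniform on the size - 7**n outcomes that remain.
        v -= 7 ** n
        size -= 7 ** n
    return []  # fewer than 7 outcomes left, so no further output is possible

def recycling_block(m: int) -> list[int]:
    """Consume m rand5() values and extract as many 1..7 values as the scheme allows."""
    v = sum((rand5() - 1) * 5 ** k for k in range(m))
    return extract(v, 5 ** m)
```

Checking all 125 outcomes for $m = 3$: 98 give two outputs, 21 give one and 6 give none. That is an expected $217/125 \approx 1.74$ outputs, versus $98/125 = 0.784$ if rejected cases are simply thrown away.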
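Let $T_7(X)$ be the average length of the output sequence of random(1-7) integers derived from a uniform input over $X$ values. Write $5^m = 7^{n_0} + 7^{n_1} + 7^{n_2} + \dots + 7^{n_r} + s$ with $s < 7$, where each $7^{n_i}$ is the largest power of 7 not exceeding what is left (so $n_0 \ge n_1 \ge \dots \ge n_r \ge 1$).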
Then
$$T_7(5^m) = \frac{n_0 \cdot 7^{n_0}}{5^m} + \frac{5^m - 7^{n_0}}{5^m}\, T_7(5^m - 7^{n_0}),$$
since we get an output sequence of length $n_0$ with probability $7^{n_0}/5^m$, and a residual uniform input over $5^m - 7^{n_0}$ values with probability $(5^m - 7^{n_0})/5^m$.
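The recursion can be evaluated exactly with Python's `fractions.Fraction`. This sketch takes $n_0$ as the largest exponent with $7^{n_0} \le X$. It uses $T_7(X) = 0$ for $X < 7$, since no output digit is possible then:

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def T7(x: int) -> Fraction:
    """Expected number of 1..7 outputs extracted from one value uniform on x outcomes."""
    if x < 7:
        return Fraction(0)  # not even one output digit is possible
    # n0: largest exponent with 7**n0 <= x.
    n0 = 1
    while 7 ** (n0 + 1) <= x:
        n0 += 1
    p = 7 ** n0
    # n0 outputs with probability p/x; otherwise recurse on the x - p leftover outcomes.
    return Fraction(n0 * p, x) + Fraction(x - p, x) * T7(x - p)
```

For example, `T7(25)` gives `Fraction(21, 25)` and `T7(5**3)` gives `Fraction(217, 125)`.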
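If we just keep substituting (with $T_7(s) = 0$ for $s < 7$), we obtain:
$$T_7(5^m) = \frac{n_0 \cdot 7^{n_0}}{5^m} + \frac{n_1 \cdot 7^{n_1}}{5^m} + \dots + \frac{n_r \cdot 7^{n_r}}{5^m} = \frac{n_0 \cdot 7^{n_0} + n_1 \cdot 7^{n_1} + \dots + n_r \cdot 7^{n_r}}{5^m}$$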
Hence
$$L(m) = T_7(5^m) = \frac{n_0 \cdot 7^{n_0} + n_1 \cdot 7^{n_1} + \dots + n_r \cdot 7^{n_r}}{7^{n_0} + 7^{n_1} + 7^{n_2} + \dots + 7^{n_r} + s}$$
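Another way of putting this: if $5^m$ has 7-ary representation $a_0 + a_1 \cdot 7 + a_2 \cdot 7^2 + a_3 \cdot 7^3 + \dots + a_r \cdot 7^r$, then
$$L(m) = \frac{a_1 \cdot 7 + 2a_2 \cdot 7^2 + 3a_3 \cdot 7^3 + \dots + r a_r \cdot 7^r}{a_0 + a_1 \cdot 7 + a_2 \cdot 7^2 + a_3 \cdot 7^3 + \dots + a_r \cdot 7^r}$$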
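Here is a quick Python check that the digit form agrees with the greedy sum of powers of 7 it came from. Both are computed exactly, and the function names are illustrative:

```python
from fractions import Fraction

def base7_digits(x: int) -> list[int]:
    """7-ary digits a0, a1, ..., ar of x (least significant first)."""
    digits = []
    while x:
        x, d = divmod(x, 7)
        digits.append(d)
    return digits

def L_digits(m: int) -> Fraction:
    """L(m) = (sum of k * a_k * 7**k) / 5**m, using the 7-ary digits a_k of 5**m."""
    a = base7_digits(5 ** m)
    return Fraction(sum(k * ak * 7 ** k for k, ak in enumerate(a)), 5 ** m)

def L_greedy(m: int) -> Fraction:
    """Same quantity from the greedy split 5**m = 7**n0 + ... + 7**nr + s, s < 7."""
    rest, total = 5 ** m, 0
    while rest >= 7:
        n = 1
        while 7 ** (n + 1) <= rest:
            n += 1
        total += n * 7 ** n
        rest -= 7 ** n
    return Fraction(total, 5 ** m)

assert all(L_digits(m) == L_greedy(m) for m in range(1, 40))
```

For $m = 3$, $5^3 = 125$ has 7-ary digits $a_0 = 6, a_1 = 3, a_2 = 2$, so $L(3) = (3 \cdot 7 + 2 \cdot 2 \cdot 49)/125 = 217/125$.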
The best possible case is my original one above, where $5^m = 7^n + s$ with $s < 7$. Then
$$T_7(5^m) = \frac{n \cdot 7^n}{7^n + s} = n + o(1) = m \frac{\log 5}{\log 7} + o(1),$$
as before.
The worst case is when we can only find $k$ and $s$ such that $5^m = 7k + s$, i.e. when $7$ itself is the only power of 7 that fits. Then $T_7(5^m) = \frac{1 \cdot 7k}{7k + s} = 1 + o(1)$.
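The digit formula can be evaluated for any number of outcomes $x$, not just powers of 5, which makes the two extreme shapes easy to check. This sketch treats $x$ as a free parameter. It makes no claim about which $m$, if any, give a $5^m$ of either shape:

```python
from fractions import Fraction

def T7_of(x: int) -> Fraction:
    """T7(x) via the digit formula: (sum of k * a_k * 7**k) / x, for any x >= 1."""
    total, k, rest = 0, 0, x
    while rest:
        rest, d = divmod(rest, 7)
        total += k * d * 7 ** k
        k += 1
    return Fraction(total, x)

# Best-case shape: x = 7**n + s with s < 7 gives n * 7**n / (7**n + s), close to n.
n, s = 10, 3
assert T7_of(7 ** n + s) == Fraction(n * 7 ** n, 7 ** n + s)
# Worst-case shape: x = 7k + s with 1 <= k < 7 and s < 7 gives 7k / (7k + s), below 1.
assert T7_of(7 * 6 + 5) == Fraction(42, 47)
```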
Other cases are somewhere in between. It would be interesting to see how well we can do for very large $m$, i.e. how small we can make the error term $e(m)$ in
$$T_7(5^m) = m \frac{\log 5}{\log 7} + e(m).$$
It seems impossible to achieve $e(m) = o(1)$ in general, but hopefully we can prove $e(m) = o(m)$.
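Below is a small Python experiment for tabulating $e(m)$ exactly from the 7-ary digits of $5^m$. Floating point is used only for the final subtraction, and the range of $m$ is arbitrary:

```python
import math
from fractions import Fraction

RATIO = math.log(5) / math.log(7)

def L(m: int) -> Fraction:
    """L(m) = T7(5**m), computed from the 7-ary digits of 5**m."""
    total, k, rest = 0, 0, 5 ** m
    while rest:
        rest, d = divmod(rest, 7)
        total += k * d * 7 ** k
        k += 1
    return Fraction(total, 5 ** m)

def e(m: int) -> float:
    """Error term e(m) = T7(5**m) - m * log 5 / log 7."""
    return float(L(m)) - m * RATIO

for m in range(1, 21):
    print(m, round(float(L(m)), 4), round(e(m), 4))
```

Note that $e(m)$ can never be positive, since an accepted block yields at most $\lfloor \log_7 5^m \rfloor$ outputs.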
The whole thing then rests on the distribution of the 7-ary digits of $5^m$ for various values of $m$.
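To look at that distribution empirically, this sketch tallies the 7-ary digits of $5^m$ over a range of $m$. It only counts digits and does not test any particular hypothesis:

```python
from collections import Counter

def base7_digits(x: int) -> list[int]:
    """7-ary digits of x, least significant first."""
    digits = []
    while x:
        x, d = divmod(x, 7)
        digits.append(d)
    return digits

def digit_counts(max_m: int) -> Counter:
    """Tally the 7-ary digits of 5**m over m = 1..max_m."""
    counts = Counter()
    for m in range(1, max_m + 1):
        counts.update(base7_digits(5 ** m))
    return counts

print(sorted(digit_counts(200).items()))
```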
I'm sure there is a lot of theory out there that covers this; I may have a look and report back at some point.