Sorting 1 million 8-decimal-digit numbers with 1 MB of RAM

后端 未结 30 2027
栀梦
栀梦 2020-12-22 14:33

I have a computer with 1 MB of RAM and no other local storage. I must use it to accept 1 million 8-digit decimal numbers over a TCP connection, sort them, and then send the

30条回答
  •  野趣味
    野趣味 (楼主)
    2020-12-22 14:51

    To represent the sorted array one can just store the first element and the difference between adjacent elements. In this way we are concerned with encoding 10^6 elements that can sum up to at most 10^8. Let's call this D. To encode the elements of D one can use a Huffman code. The dictionary for the Huffman code can be created on the go and the array updated every time a new item is inserted in the sorted array (insertion sort). Note that when the dictionary changes because of a new item the whole array should be updated to match the new encoding.

    The average number of bits for encoding each element of D is maximized if we have equal number of each unique element. Say elements d1, d2, ..., dN in D each appear F times. In that case (in worst case we have both 0 and 10^8 in input sequence) we have

    sum(1<=i<=N) F. di = 10^8

    where

    sum(1<=i<=N) F = 10^6, or F=10^6/N and the normalized frequency will be p= F/10^=1/N

    The average number of bits will be -log2(1/P) = log2(N). Under these circumstances we should find a case that maximizes N. This happens if we have consecutive numbers for di starting from 0, or, di= i-1, therefore

    10^8=sum(1<=i<=N) F. di = sum(1<=i<=N) (10^6/N) (i-1) = (10^6/N) N (N-1)/2

    i.e.

    N <= 201. And for this case average number of bits is log2(201)=7.6511 which means we will need around 1 byte per input element for saving the sorted array. Note that this doesn't mean D in general cannot have more than 201 elements. It just sows that if elements of D are uniformly distributed, it cannot have more than 201 unique values.

提交回复
热议问题