Memoization Efficiency Problems (Collatz Hailstone Sequence)

Question


I was particularly interested over the last few days (more from an algorithmic than a mathematical perspective) in investigating the length of a given number's Hailstone sequence (Collatz conjecture). Implementing a recursive algorithm is probably the simplest way to calculate the length, but it seemed to me like an unnecessary waste of calculation time. Many sequences overlap; take for example 3's Hailstone sequence:

3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1

This has length 7; more specifically, it takes 7 operations to get to 1. If we then take 6:

6 -> 3 -> ...

We notice immediately that we've already calculated the rest of this sequence, so instead of running through all those numbers again we just add the single step from 6 to 3 onto 3's already-recorded sequence length, considerably reducing the number of operations required to calculate each number's sequence length.

I tried to implement this in Java using a HashMap (which seemed appropriate given its expected O(1) get/put complexity):

import java.util.HashMap;

/* NOTE: cache.put(1,0); is called in main to act as the
 * 'base case' of sorts. 
 */

private static HashMap<Long, Long> cache = new HashMap<>();

/* Returns the length of the sequence, pulling prerecorded values
 * from the cache whenever possible, and saving unrecorded values
 * to the cache.
 */
static long seqLen(long n) {
    long count = 0, m = n;
    while (true) {
        if (cache.containsKey(n)) {
            count += cache.get(n);
            cache.put(m, count);
            return count;
        }
        else if (n % 2 == 0) {
            n /= 2;
        }
        else {
            n = 3*n + 1;
        }
        count++;
    }
}

What seqLen essentially does is start at a given number and work through that number's Hailstone sequence until it comes across a number already in the cache, at which point it adds that number's cached length onto the current value of count, and then logs the starting value and its total sequence length in the HashMap as a (key, value) pair.

I also had the following fairly standard recursive algorithm for comparison:

static long recSeqLen(long n) {
    if (n == 1) {
        return 0;
    }
    else if (n % 2 == 0) {
        return 1 + recSeqLen(n / 2);
    }
    else return 1 + recSeqLen(3*n + 1);
}

The logging algorithm should, by all accounts, run quite a bit quicker than the naive recursive method. However, in most cases it doesn't run much faster at all, and for larger inputs it actually runs slower. Running the following code yields times that vary considerably as the size of n changes:

long n = ... // However many numbers I want to calculate sequence
             // lengths for.
cache.put(1L, 0L); // the 'base case' noted in the comment above

long st = System.nanoTime();
// Iterative logging algorithm
for (long i = 2; i < n; i++) {
    seqLen(i);
}
long et = System.nanoTime();
System.out.printf("HashMap algorithm: %d ms\n", (et - st) / 1000000);

st = System.nanoTime();
// Using recursion without logging values:
for (long i = 2; i < n; i++) {
    recSeqLen(i);
}
et = System.nanoTime();
System.out.printf("Recusive non-logging algorithm: %d ms\n",
                    (et - st) / 1000000);

  • n = 1,000: ~2ms for both algorithms
  • n = 100,000: ~65ms for iterative logging, ~75ms for recursive non-logging
  • n = 1,000,000: ~500ms and ~900ms
  • n = 10,000,000: ~14,000ms and ~10,000ms

At higher values I get memory errors, so I can't check if the pattern continues.

So my question is: why does the logging algorithm suddenly begin to take longer than the naive recursive algorithm for large values of n?


EDIT:

Scrapping HashMaps altogether and opting for a simple array structure (as well as removing part of the overhead of checking whether a value is in the array or not) produces the desired efficiency:

private static final int CACHE_SIZE = 80000000;
private static long[] cache = new long[CACHE_SIZE];

static long seqLen(long n) {
    int count = 0;
    long m = n;

    // Follow the sequence until it drops to a value no larger than the
    // starting value m. Assuming starting values are processed in
    // increasing order, cache[(int)n] already holds that smaller value's
    // length (and cache[1] is 0 by default, since Java zero-initialises arrays).
    do {
        if (n % 2 == 0) {
            n /= 2;
        }
        else {
            n = 3*n + 1;
        }
        count++;
    } while (n > m);

    count += cache[(int)n];
    cache[(int)m] = count;
    return count;
}

Iterating over the entire cache size (80 million) now takes a mere 3 seconds, as opposed to 93 seconds using the recursive algorithm. The HashMap algorithm throws a memory error at this scale, so it can't even be compared, but given its behaviour at lower values, I have a feeling it wouldn't compare well.
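
For reference, here is a sketch of the kind of driver loop this array version assumes; the loop below is my reconstruction, not part of the original post. The starting values must be visited in increasing order, so that any value the sequence drops down to has already been cached:

// Hypothetical driver for the array-based cache: visit starting values
// in increasing order so every smaller value is already cached.
long st = System.nanoTime();
for (long i = 2; i < CACHE_SIZE; i++) {
    seqLen(i);
}
long et = System.nanoTime();
System.out.printf("Array algorithm: %d ms\n", (et - st) / 1000000);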


Answer 1:


Off the cuff, I'd guess it's spending a lot of time reallocating the hash map. It sounds like you're starting it off empty and adding entries as you go. That means that as it grows, it periodically has to allocate a bigger table and rehash all existing entries, which is O(N). Try pre-allocating the capacity to what you expect to put in there. See https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html for more discussion.
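
A minimal sketch of that suggestion, assuming you have a rough idea of how many entries you will store (the figure below is an arbitrary placeholder); sizing the map up front via the standard HashMap(int initialCapacity) constructor means it never has to grow and rehash while you fill it:

import java.util.HashMap;

class PresizedCache {
    // Assumed workload: roughly how many sequence lengths will be stored.
    static final int EXPECTED_ENTRIES = 10_000_000;

    // With the default load factor of 0.75, this capacity is enough to hold
    // EXPECTED_ENTRIES entries without ever triggering a resize/rehash.
    static HashMap<Long, Long> cache =
            new HashMap<>((int) (EXPECTED_ENTRIES / 0.75f) + 1);
}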



Source: https://stackoverflow.com/questions/33404821/memoization-efficiency-problems-collatz-hailstone-sequence
