What is the complexity of the algorithm that is used to find the smallest snippet containing all the search keywords?
Here's a solution using Java 8.
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

static Map.Entry<Integer, Integer> documentSearch(Collection<String> document, Collection<String> query) {
    Queue<KeywordIndexPair> queue = new ArrayDeque<>(query.size());
    Set<String> words = new HashSet<>(query);
    AtomicInteger idx = new AtomicInteger();
    IndexPair interval = new IndexPair(0, Integer.MAX_VALUE);
    AtomicInteger size = new AtomicInteger();
    document.stream()
            .map(w -> new KeywordIndexPair(w, idx.getAndIncrement()))
            // Set.contains is O(1), while Queue.contains is O(n),
            // so we trade space for efficiency
            .filter(pair -> words.contains(pair.word))
            .forEach(pair -> {
                // Only the first and last elements are useful to the algorithm,
                // so we don't bother removing an element from any other index.
                // Note that removing an element by equality from an ArrayDeque is O(n).
                KeywordIndexPair first = queue.peek();
                if (pair.equals(first)) {
                    queue.remove();
                }
                queue.add(pair);
                first = queue.peek();
                int diff = pair.index - first.index;
                // Once every keyword has been seen, record the window
                // if it is the narrowest so far
                if (size.incrementAndGet() == words.size() && diff < interval.interval()) {
                    interval.begin = first.index;
                    interval.end = pair.index;
                    size.set(0);
                }
            });
    return new AbstractMap.SimpleImmutableEntry<>(interval.begin, interval.end);
}
There are two static nested classes, KeywordIndexPair and IndexPair, whose implementations should be apparent from their names. In a smarter programming language with tuple support, those classes wouldn't be necessary.
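For completeness, here is one possible sketch of those two classes. Note the assumption that KeywordIndexPair.equals compares words only and ignores the index; the pair.equals(first) check above only removes the stale head under that definition:

static class KeywordIndexPair {
    final String word;
    final int index;

    KeywordIndexPair(String word, int index) {
        this.word = word;
        this.index = index;
    }

    @Override
    public boolean equals(Object o) {
        // Assumption: equality is by word only, so a fresh occurrence of a
        // keyword matches an older occurrence at the head of the queue
        return o instanceof KeywordIndexPair
                && word.equals(((KeywordIndexPair) o).word);
    }

    @Override
    public int hashCode() {
        return word.hashCode();
    }
}

static class IndexPair {
    int begin;
    int end;

    IndexPair(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    // Width of the window; starting at (0, Integer.MAX_VALUE) makes the
    // first complete window always win the comparison
    int interval() {
        return end - begin;
    }
}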
Test:
Document: apple, banana, apple, apple, dog, cat, apple, dog, banana, apple, cat, dog
Query: banana, cat
Interval: 8, 10
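The smallest window covering both keywords spans indices 8 through 10 ("banana" at 8, "cat" at 10). A quick harness like the following should reproduce that result; the main method and the Arrays/List imports are just for illustration:

import java.util.Arrays;
import java.util.List;

public static void main(String[] args) {
    List<String> document = Arrays.asList(
            "apple", "banana", "apple", "apple", "dog", "cat",
            "apple", "dog", "banana", "apple", "cat", "dog");
    List<String> query = Arrays.asList("banana", "cat");
    // Prints 8=10, the SimpleImmutableEntry rendering of the interval
    System.out.println(documentSearch(document, query));
}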