Google search results: How to find the minimum window that contains all the search keywords?

Backend · Unresolved
Asked by 抹茶落季, 2020-12-12 21:10

What is the complexity of the algorithm used to find the smallest snippet that contains all the search keywords?
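For reference, "smallest snippet" can be made precise as the shortest contiguous range of word positions that contains every query keyword at least once. Below is a minimal brute-force baseline; the class and method names are hypothetical and do not come from any library. It tries each start position and extends the range until all keywords are covered, which costs O(n²) expected time for a document of n words. That makes it a point of comparison for faster approaches and a simple correctness check.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical brute-force baseline: try every start position and extend the
// window until all keywords are covered. O(n^2) expected time for n document words.
class BruteForceSnippet {
    /** Returns {begin, end} (zero-based, inclusive) of the shortest window, or null if none exists. */
    static int[] smallestWindow(List<String> document, Set<String> query) {
        if (query.isEmpty()) {
            return null; // nothing to cover
        }
        int[] best = null;
        for (int begin = 0; begin < document.size(); begin++) {
            Set<String> seen = new HashSet<>(); // keywords found in [begin, end]
            for (int end = begin; end < document.size(); end++) {
                String w = document.get(end);
                if (query.contains(w)) {
                    seen.add(w);
                }
                if (seen.size() == query.size()) {
                    // the first covering end is the shortest window that starts at begin
                    if (best == null || end - begin < best[1] - best[0]) {
                        best = new int[] {begin, end};
                    }
                    break;
                }
            }
        }
        return best;
    }
}
```

For the document apple, banana, apple, apple, dog, cat, apple, dog, banana, apple, cat, dog with the query {banana, cat}, this returns [8, 10].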

5 answers
  •  执笔经年
    2020-12-12 21:44

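    Here's a solution using Java 8. It makes a single pass over the document, so it runs in O(n) expected time (hash-based lookups) for a document of n words.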

    import java.util.AbstractMap;
    import java.util.Collection;
    import java.util.HashSet;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Set;
    
    static Map.Entry<Integer, Integer> documentSearch(Collection<String> document, Collection<String> query) {
        Set<String> words = new HashSet<>(query); // O(1) expected membership test per document word
    
        // Most recent occurrence of each keyword, in the order those occurrences appeared.
        // Removing and re-inserting a key moves it to the end, so the first entry is always
        // the left edge of the shortest window ending at the current word. Unlike ArrayDeque,
        // LinkedHashMap can drop an entry from the middle in O(1).
        LinkedHashMap<String, KeywordIndexPair> latest = new LinkedHashMap<>();
        IndexPair interval = new IndexPair(0, Integer.MAX_VALUE); // sentinel: no window found yet
    
        int idx = 0;
        for (String w : document) {
            if (words.contains(w)) {
                KeywordIndexPair pair = new KeywordIndexPair(w, idx);
                latest.remove(w);    // forget the older occurrence of this keyword, wherever it is
                latest.put(w, pair); // and append the new one at the end
                if (latest.size() == words.size()) {
                    // every keyword seen: candidate window is [oldest latest occurrence, current word]
                    KeywordIndexPair first = latest.values().iterator().next();
                    if (pair.index - first.index < interval.interval()) {
                        interval.begin = first.index;
                        interval.end = pair.index;
                    }
                }
            }
            idx++;
        }
        // (0, Integer.MAX_VALUE) is returned when the document does not contain every keyword
        return new AbstractMap.SimpleImmutableEntry<>(interval.begin, interval.end);
    }
    

    The code relies on two static nested classes, KeywordIndexPair and IndexPair, whose implementations should be apparent from their names. In a language with built-in tuples, those classes wouldn't be necessary.
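    The helper classes are not shown in the answer. One possible Java 8 reconstruction is sketched below, inferred only from how documentSearch uses them (the fields word, index, begin and end, and the method interval()); the constructors and comments are assumptions. They are written as top-level classes here so the snippet compiles on its own, whereas the answer nests them as static classes.

```java
// Hypothetical reconstruction of the two helpers used by documentSearch.
class KeywordIndexPair {
    final String word; // the keyword as it appears in the document
    final int index;   // zero-based position of that word in the document

    KeywordIndexPair(String word, int index) {
        this.word = word;
        this.index = index;
    }
}

class IndexPair {
    int begin; // start of the best window found so far (inclusive)
    int end;   // end of the best window found so far (inclusive)

    IndexPair(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    // Width of the window; starting from (0, Integer.MAX_VALUE) means "no window yet".
    int interval() {
        return end - begin;
    }
}
```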

    Test:

    Document: apple, banana, apple, apple, dog, cat, apple, dog, banana, apple, cat, dog

    Query: banana, cat

    Interval: 8, 10 (zero-based positions: the banana at index 8 and the cat at index 10)
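    For comparison, the standard two-pointer (sliding window) technique answers the complexity question directly; it is swapped in here and is not the method used above. The right pointer grows the window, a count map tracks how many copies of each keyword are inside it, and the left pointer advances while every keyword is still covered. Each pointer moves at most n times, so this sketch runs in O(n) expected time (hash lookups) with O(k) extra space for k distinct keywords. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Two-pointer sliding window (hypothetical names); O(n) expected time for n document words.
class SlidingWindowSnippet {
    /** Returns {begin, end} (zero-based, inclusive) of the shortest window, or null if none exists. */
    static int[] smallestWindow(List<String> document, Set<String> query) {
        if (query.isEmpty()) {
            return null; // nothing to cover
        }
        Map<String, Integer> counts = new HashMap<>(); // copies of each keyword inside [left, right]
        int covered = 0;                                // distinct keywords with count > 0
        int[] best = null;
        int left = 0;
        for (int right = 0; right < document.size(); right++) {
            String w = document.get(right);
            if (!query.contains(w)) {
                continue;
            }
            if (counts.merge(w, 1, Integer::sum) == 1) {
                covered++; // first copy of w inside the window
            }
            // Shrink from the left while the window still contains every keyword.
            while (covered == query.size()) {
                if (best == null || right - left < best[1] - best[0]) {
                    best = new int[] {left, right};
                }
                String out = document.get(left);
                if (query.contains(out) && counts.merge(out, -1, Integer::sum) == 0) {
                    covered--; // the last copy of out just left the window
                }
                left++;
            }
        }
        return best;
    }
}
```

    On the test above (query banana, cat) it also returns [8, 10].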
