Quickest way to return list of Strings by using wildcard from collection in Java

吃可爱长大的小学妹 提交于 2019-12-25 01:18:20

问题


I have set of 100000 String. And for example I want to get all strings starting with "JO" from that set. What would be the best solution for that?

I was thinking Aho-Corasick but the implementation I have does not support wild cards.


回答1:


If you want all the strings starting with a sequence you can add all the String into a NavigableSet like TreeSet and get the subSet(text, text+'\uFFFF') will give you all the entries starting with text This lookup is O(log n)


If you want all the Strings with end with a sequence, you can do a similar thing, except you have to reverse the String. In this case a TreeMap from reversed String to forward String would be a better structure.

If you want "x*z" you can do a search with the first set and take a union with the values of the Map.

if you want contains "x", you can use a Navigable<String, Set<String>> where the key is each String starting from the first, second, third char etc The value is a Set as you can get duplicates. You can do a search like the starts with structure.




回答2:


Here's a custom matcher class that does the matching without regular expressions (it only uses regex in the constructor, to put it more precisely) and supports wildcard matching:

public class WildCardMatcher {
    private Iterable<String> patternParts;
    private boolean openStart;
    private boolean openEnd;

    public WildCardMatcher(final String pattern) {
        final List<String> tmpList = new ArrayList<String>(
                                     Arrays.asList(pattern.split("\\*")));
        while (tmpList.remove("")) { /* remove empty Strings */ }
        // these last two lines can be made a lot simpler using a Guava Joiner
        if (tmpList.isEmpty())
            throw new IllegalArgumentException("Invalid pattern");
        patternParts = tmpList;
        openStart = pattern.startsWith("*");
        openEnd = pattern.endsWith("*");
    }

    public boolean matches(final String item) {
        int index = -1;
        int nextIndex = -1;
        final Iterator<String> it = patternParts.iterator();
        if (it.hasNext()) {
            String part = it.next();
            index = item.indexOf(part);
            if (index < 0 || (index > 0 && !openStart))
                return false;
            nextIndex = index + part.length();
            while (it.hasNext()) {
                part = it.next();
                index = item.indexOf(part, nextIndex);
                if (index < 0)
                    return false;
                nextIndex = index + part.length();
            }
            if (nextIndex < item.length())
                return openEnd;
        }
        return true;
    }

}

Here's some test code:

public static void main(final String[] args) throws Exception {
    testMatch("foo*bar", "foobar", "foo123bar", "foo*bar", "foobarandsomethingelse");
    testMatch("*.*", "somefile.doc", "somefile", ".doc", "somefile.");
    testMatch("pe*", "peter", "antipeter");
}

private static void testMatch(final String pattern, final String... words) {
    final WildCardMatcher matcher = new WildCardMatcher(pattern);
    for (final String word : words) {
        System.out.println("Pattern " + pattern + " matches word '"
                          + word + "': " + matcher.matches(word));
    }
}

Output:

Pattern foo*bar matches word 'foobar': true
Pattern foo*bar matches word 'foo123bar': true
Pattern foo*bar matches word 'foo*bar': true
Pattern foo*bar matches word 'foobarandsomethingelse': false
Pattern *.* matches word 'somefile.doc': true
Pattern *.* matches word 'somefile': false
Pattern *.* matches word '.doc': true
Pattern *.* matches word 'somefile.': true
Pattern pe* matches word 'peter': true
Pattern pe* matches word 'antipeter': false

While this is far from being production-ready, it should be fast enough and it supports multiple wild cards (including in the first and last place). But of course if your wildcards are only at the end, use Peter's answer (+1).



来源:https://stackoverflow.com/questions/5952553/quickest-way-to-return-list-of-strings-by-using-wildcard-from-collection-in-java

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!