Partially match strings in case of List.contains(String)

后端 未结 9 862
夕颜
夕颜 2020-12-10 11:23

I have a List

List list = new ArrayList();
list.add(\"ABCD\");
list.add(\"EFGH\");
list.add(\"IJ KL\")         


        
相关标签:
9条回答
  • 2020-12-10 11:58

    You could use IterableUtils from Apache Commons Collections.

    List<String> list = new ArrayList<String>();
    list.add("ABCD");
    list.add("EFGH");
    list.add("IJ KL");
    list.add("M NOP");
    list.add("UVW X");
    
    boolean hasString = IterableUtils.contains(list, "IJ", new Equator<String>() {
        @Override
        public boolean equate(String o1, String o2) {
            return o2.contains(o1);
        }
    
        @Override
        public int hash(String o) {
            return o.hashCode();
        }
    });
    
    System.out.println(hasString); // true
    
    0 讨论(0)
  • 2020-12-10 11:58

    Here's some code that uses a regex to shortcut the inner loop if none of the test Strings are found in the target String.

    public static void main(String[] args) throws Exception {
        List<String> haystack = Arrays.asList(new String[] { "ABCD", "EFGH", "IJ KL", "M NOP", "UVW X" });
        List<String> needles = Arrays.asList(new String[] { "IJ", "NOP" });
    
        // To cut down on iterations, create one big regex to check the whole haystack
        StringBuilder sb = new StringBuilder();
        sb.append(".*(");
        for (String needle : needles) {
            sb.append(needle).append('|');
        }
        sb.replace(sb.length() - 1, sb.length(), ").*");
        String regex = sb.toString();
    
        for (String target : haystack) {
            if (!target.matches(regex)) {
                System.out.println("Skipping " + target);
                continue;
            }
    
            for (String needle : needles) {
                if (target.contains(needle)) {
                    System.out.println(target + " contains " + needle);
                }
            }
        }
    }
    

    Output:

    Skipping ABCD
    Skipping EFGH
    IJ KL contains IJ
    M NOP contains NOP
    Skipping UVW X
    

    If you really want to get cute, you could bisect use a binary search to identify which segments of the target list matches, but it mightn't be worth it.

    It depends which is how likely it is that yo'll find a hit. Low hit rates will give a good result. High hit rates will perform not much better than the simple nested loop version. consider inverting the loops if some needles hit many targets, and other hit none.

    It's all about aborting a search path ASAP.

    0 讨论(0)
  • 2020-12-10 11:59

    As a second answer, upon rereading your question, you could also inherit from the interface List, specialize it for Strings only, and override the contains() method.

    public class PartialStringList extends ArrayList<String>
    {
        public boolean contains(Object o)
        {
            if(!(o instanceof String))
            {
                return false;
            }
            String s = (String)o;
            Iterator<String> iter = iterator();
            while(iter.hasNext())
            {
                String iStr = iter.next();
                if (iStr.contain(s))
                {
                    return true;
                }
            }
            return false;
        }
    }
    

    Judging by your earlier comments, this is maybe not the speed you're looking for, but is this more similar to what you were asking for?

    0 讨论(0)
  • 2020-12-10 11:59

    This is not a direct answer to the given problem. But I guess this answer will help someone to compare partially both given and the elements in a list using Apache Commons Collections.

    final Equator equator = new Equator<String>() {
            @Override
            public boolean equate(String o1, String o2) {
                final int i1 = o1.lastIndexOf(":");
                final int i2 = o2.lastIndexOf(":");
                return o1.substring(0, i1).equals(o2.substring(0, i2));
            }
    
            @Override
            public int hash(String o) {
                final int i1 = o.lastIndexOf(":");
                return o.substring(0, i1).hashCode();
            }
        };
        final List<String> list = Lists.newArrayList("a1:v1", "a2:v2");
        System.out.println(IteratorUtils.matchesAny(list.iterator(), new EqualPredicate("a2:v1", equator)));
    
    0 讨论(0)
  • 2020-12-10 12:04

    How about:

    java.util.List<String> list = new java.util.ArrayList<String>();
    list.add("ABCD");
    list.add("EFGH");
    list.add("IJ KL");
    list.add("M NOP");
    list.add("UVW X");
    java.util.regex.Pattern p = java.util.regex.Pattern.compile("IJ");
    java.util.regex.Matcher m = p.matcher("");
    for(String s : list)
    {
        m.reset(s);
        if(m.find()) System.out.println("Partially Matched");
    }
    
    0 讨论(0)
  • 2020-12-10 12:06

    Yes, you can! Sort of.

    What you are looking for, is often called fuzzy searching or approximate string matching and there are several solutions to this problem.

    With the FuzzyWuzzy lib, for example, you can have all your strings assigned a score based on how similar they are to a particular search term. The actual values seem to be integer percentages of the number of characters matching with regards to the search string length.

    After invoking FuzzySearch.extractAll, it is up to you to decide what the minimum score would be for a string to be considered a match.

    There are also other, similar libraries worth checking out, like google-diff-match-patch or the Apache Commons Text Similarity API, and so on.

    If you need something really heavy-duty, your best bet would probably be Lucene (as also mentioned by Ryan Shillington)

    0 讨论(0)
提交回复
热议问题