Wildcard matching in Java

渐次进展 2020-12-14 23:13

I'm writing a simple debugging program that takes as input simple strings that can contain stars to indicate a wildcard match-any

*.wav  // matches 

        
6 Answers
  • 2020-12-14 23:17

    You can also use the quotation escape sequences \Q and \E: everything between them is treated as a literal and is not interpreted as regex syntax. Thus this code should work:

        String input = "*.wav";
        String regex = "\\Q" + input.replace("*", "\\E.*?\\Q") + "\\E";
    
        // regex = "\\Q\\E.*?\\Q.wav\\E"
    

    Note that, depending on how you want the wildcard to behave, it may be better to match * only against word characters, using \w*? in place of .*? in the regex.
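
    To make the difference between the two wildcard behaviours concrete, here is a minimal sketch. The class name WildcardModes and the helper compile are illustrative names, not part of the original answer, and the sketch assumes the input never contains the sequence \E.

```java
import java.util.regex.Pattern;

class WildcardModes {
    // Quote the whole input, then splice a wildcard expression in place of each '*'.
    // wildcardExpr is ".*" for "any characters" or "\\w*" for "word characters only".
    // Assumption: input does not contain "\E", which would end the quoting early.
    static Pattern compile(String input, String wildcardExpr) {
        String regex = "\\Q" + input.replace("*", "\\E" + wildcardExpr + "\\Q") + "\\E";
        return Pattern.compile(regex);
    }

    public static void main(String[] args) {
        Pattern any = compile("*.wav", ".*");
        Pattern word = compile("*.wav", "\\w*");
        System.out.println(any.matcher("my song.wav").matches());  // true
        System.out.println(word.matcher("my song.wav").matches()); // false (space is not a word character)
        System.out.println(word.matcher("my_song.wav").matches()); // true
    }
}
```

    With \w* the star stops at spaces, dots and other punctuation, which may or may not be what a file-name wildcard should do.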

  • 2020-12-14 23:20

    Just escape everything - no harm will come of it.

        String input = "*.wav";
        String regex = ("\\Q" + input + "\\E").replace("*", "\\E.*\\Q");
        System.out.println(regex); // \Q\E.*\Q.wav\E
        System.out.println("abcd.wav".matches(regex)); // true
    

    Or you can use character classes:

        String input = "*.wav";
        String regex = input.replaceAll(".", "[$0]").replace("[*]", ".*");
        System.out.println(regex); // .*[.][w][a][v]
        System.out.println("abcd.wav".matches(regex)); // true
    

    It's easier to "escape" the characters by putting them in a character class, as almost all characters lose any special meaning when in a character class. Unless you're expecting weird file names (for example, names containing \, ^, [ or ], which stay special inside a character class), this will work.
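
    If such file names do need to be supported, one alternative is to quote each literal piece with the standard Pattern.quote method. This is a sketch under that assumption, not the approach shown above; the class name QuotedWildcard and the method toRegex are illustrative.

```java
import java.util.regex.Pattern;

class QuotedWildcard {
    // Split on '*' (limit -1 keeps trailing empty pieces), quote each literal piece,
    // and join the pieces with ".*".
    static String toRegex(String input) {
        String[] parts = input.split("\\*", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(".*");
            if (!parts[i].isEmpty()) sb.append(Pattern.quote(parts[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRegex("*.wav"));                                       // .*\Q.wav\E
        System.out.println("abcd.wav".matches(toRegex("*.wav")));                   // true
        System.out.println("[draft]^1 take.wav".matches(toRegex("[draft]^1*.wav"))); // true
    }
}
```

    Pattern.quote also copes with input that itself contains \E, so no character needs special-casing.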

  • 2020-12-14 23:33

    Regex While Accommodating A DOS/Windows Path

    Using the quotation escape sequences \Q and \E is probably the best approach. However, since a backslash is typically used as a DOS/Windows file separator, a "\E" sequence within the path could break the pairing of \Q and \E. While accounting for the * and ? wildcard tokens, this backslash problem could be addressed in this manner:

    Search: [^*?\\]+|(\*)|(\?)|(\\)

    Two new lines would be added in the replace function of the "Using A Simple Regex" example to accommodate the new search pattern. The code would still be "Linux-friendly". As a method, it could be written like this:

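    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public String wildcardToRegex(String wildcardStr) {
        // Literal runs match with no group set; *, ? and \ are captured in groups 1-3.
        Pattern regex = Pattern.compile("[^*?\\\\]+|(\\*)|(\\?)|(\\\\)");
        Matcher m = regex.matcher(wildcardStr);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            if (m.group(1) != null) m.appendReplacement(sb, ".*");            // * -> any run of characters
            else if (m.group(2) != null) m.appendReplacement(sb, ".");        // ? -> exactly one character
            else if (m.group(3) != null) m.appendReplacement(sb, "\\\\\\\\"); // \ -> escaped backslash
            // quoteReplacement keeps a '$' in the literal run from being read as a group reference
            else m.appendReplacement(sb, Matcher.quoteReplacement("\\Q" + m.group(0) + "\\E"));
        }
        m.appendTail(sb);
        return sb.toString();
    }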
    

    Code to demonstrate the implementation of this method could be written like this:

    String s = "C:\\Temp\\Extra\\audio??2012*.wav";
    System.out.println("Input: "+s);
    System.out.println("Output: "+wildcardToRegex(s));
    

    This would be the generated output:

    Input: C:\Temp\Extra\audio??2012*.wav
    Output: \QC:\E\\\QTemp\E\\\QExtra\E\\\Qaudio\E..\Q2012\E.*\Q.wav\E
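
    The \Q and \E pairing problem described at the start of this answer can be seen in a small check. This is an illustrative sketch: the class and method names are made up, and Pattern.quote is shown only as a standard-library point of comparison, not as part of the method above.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class QuotePairingDemo {
    // Naive approach: wrap the whole string in \Q...\E and hope it stays paired.
    static boolean naiveLiteralMatch(String literal, String candidate) {
        try {
            return Pattern.compile("\\Q" + literal + "\\E").matcher(candidate).matches();
        } catch (PatternSyntaxException e) {
            return false; // the leftover text after the early \E may not even compile
        }
    }

    // Pattern.quote produces a literal pattern even when the input contains \E.
    static boolean quotedLiteralMatch(String literal, String candidate) {
        return Pattern.compile(Pattern.quote(literal)).matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String path = "C:\\Temp\\Extra\\a.wav"; // contains the characters \E (from \Extra)
        System.out.println(naiveLiteralMatch(path, path));  // false
        System.out.println(quotedLiteralMatch(path, path)); // true
    }
}
```

    The naive version stops quoting at the \E in \Extra, so the path no longer matches itself.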
    
  • 2020-12-14 23:36

    There is a small utility method in the Apache Commons IO library, org.apache.commons.io.FilenameUtils#wildcardMatch(), which you can use without the intricacies of regular expressions.

    The API documentation can be found at: https://commons.apache.org/proper/commons-io/javadocs/api-2.5/org/apache/commons/io/FilenameUtils.html#wildcardMatch(java.lang.String,%20java.lang.String)
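
    With the library on the classpath, a call looks like FilenameUtils.wildcardMatch("abcd.wav", "*.wav"), with the file name first and the wildcard second. For readers who cannot add the dependency, the sketch below shows one way a regex-free matcher for ? and * can work in plain Java. It is not the Commons IO implementation; the class name SimpleWildcard and its method are hypothetical.

```java
class SimpleWildcard {
    // '*' matches any run of characters (including none), '?' matches exactly one.
    // Backtracks to the most recent '*' when a later literal fails to match.
    static boolean matches(String text, String pattern) {
        int t = 0, p = 0;
        int starP = -1; // index of the last '*' seen in the pattern, or -1
        int starT = 0;  // text index that '*' is currently assumed to cover up to
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                starP = p++;   // remember the star and first try matching it against nothing
                starT = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                t++;
                p++;
            } else if (starP >= 0) {
                p = starP + 1; // let the last star swallow one more character and retry
                t = ++starT;
            } else {
                return false;
            }
        }
        // Only trailing stars may remain in the pattern.
        while (p < pattern.length() && pattern.charAt(p) == '*') p++;
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("abcd.wav", "*.wav"));                    // true
        System.out.println(matches("audio122012x.wav", "audio??2012*.wav")); // true
        System.out.println(matches("audio1.wav", "audio??2012*.wav"));       // false
    }
}
```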

  • 2020-12-14 23:41

    Lucene has classes that provide this capability, with additional support for backslash as an escape character. ? matches a single character, * matches 0 or more characters, and \ escapes the following character. It supports Unicode code points. It is supposed to be fast, but I haven't tested that.

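    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.WildcardQuery;
    import org.apache.lucene.util.automaton.CharacterRunAutomaton;

    CharacterRunAutomaton characterRunAutomaton;
    boolean matches;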
    characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Walmart")));
    matches = characterRunAutomaton.run("Walmart"); // true
    matches = characterRunAutomaton.run("Wal*mart"); // false
    matches = characterRunAutomaton.run("Wal\\*mart"); // false
    matches = characterRunAutomaton.run("Waldomart"); // false
    characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Wal*mart")));
    matches = characterRunAutomaton.run("Walmart"); // true
    matches = characterRunAutomaton.run("Wal*mart"); // true
    matches = characterRunAutomaton.run("Wal\\*mart"); // true
    matches = characterRunAutomaton.run("Waldomart"); // true
    characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Wal\\*mart")));
    matches = characterRunAutomaton.run("Walmart"); // false
    matches = characterRunAutomaton.run("Wal*mart"); // true
    matches = characterRunAutomaton.run("Wal\\*mart"); // false
    matches = characterRunAutomaton.run("Waldomart"); // false
    
  • 2020-12-14 23:42

    Using A Simple Regex

    One of this method's benefits is that we can easily add tokens besides * (see Adding Tokens at the bottom).

    Search: [^*]+|(\*)

    • The left side of the | matches any chars that are not a star
    • The right side captures all stars to Group 1
    • If Group 1 is not set: replace with \Q + Match + \E
    • If Group 1 is set: replace with .*

    Here is some working code.

    Input: audio*2012*.wav

    Output: \Qaudio\E.*\Q2012\E.*\Q.wav\E

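    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String subject = "audio*2012*.wav";
    Pattern regex = Pattern.compile("[^*]+|(\\*)");
    Matcher m = regex.matcher(subject);
    StringBuffer b = new StringBuffer();
    while (m.find()) {
        if (m.group(1) != null) m.appendReplacement(b, ".*"); // star -> .*
        // Literal run: wrap in \Q...\E; quoteReplacement protects '$' and '\' in the replacement
        else m.appendReplacement(b, Matcher.quoteReplacement("\\Q" + m.group(0) + "\\E"));
    }
    m.appendTail(b);
    String replaced = b.toString();
    System.out.println(replaced); // \Qaudio\E.*\Q2012\E.*\Q.wav\E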
    

    Adding Tokens

    Suppose we also want to convert the wildcard ?, which stands for a single character, to a dot. We just add a capture group to the regex, and exclude it from the match-all on the left:

    Search: [^*?]+|(\*)|(\?)

    In the replace function we then add something like:

    else if(m.group(2) != null) m.appendReplacement(b, "."); 
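
    Putting both tokens together, a complete converter might look like the sketch below. The class name WildcardToRegex and the method convert are illustrative, and the sketch swaps the hand-written \Q...\E wrapping for Pattern.quote so that literal runs containing \E or $ are also handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WildcardToRegex {
    // Literal runs match with no group set; '*' is group 1, '?' is group 2.
    private static final Pattern TOKENS = Pattern.compile("[^*?]+|(\\*)|(\\?)");

    static String convert(String wildcard) {
        Matcher m = TOKENS.matcher(wildcard);
        StringBuffer b = new StringBuffer();
        while (m.find()) {
            if (m.group(1) != null) m.appendReplacement(b, ".*");
            else if (m.group(2) != null) m.appendReplacement(b, ".");
            // Pattern.quote makes the run literal; quoteReplacement protects its backslashes
            else m.appendReplacement(b, Matcher.quoteReplacement(Pattern.quote(m.group(0))));
        }
        m.appendTail(b);
        return b.toString();
    }

    public static void main(String[] args) {
        String regex = convert("audio??2012*.wav");
        System.out.println(regex);                                  // \Qaudio\E..\Q2012\E.*\Q.wav\E
        System.out.println("audio012012-final.wav".matches(regex)); // true
        System.out.println("audio2012.wav".matches(regex));         // false
    }
}
```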
    