Question
I'm writing a simple debugging program that takes simple strings as input. These strings can contain stars, each acting as a match-anything wildcard:
*.wav // matches <anything>.wav
(*, a) // matches (<anything>, a)
I thought I would simply take that pattern, escape any regular expression special characters in it, then replace each escaped star (\*) with .* and hand the result to a regular expression matcher.
But I can't find any Java function that escapes a regular expression. The closest match I could find is Pattern.quote, which, however, just puts \Q and \E at the beginning and end of the string.
Is there anything in Java that allows you to simply do that wildcard matching without you having to implement the algorithm from scratch?
Answer 1:
Using A Simple Regex
One benefit of this method is that we can easily add tokens besides * (see Adding Tokens at the bottom).
Search: [^*]+|(\*)
- The left side of the | matches any run of characters that are not a star
- The right side captures a star to Group 1
- If Group 1 is empty: replace with \Q + Match + \E
- If Group 1 is set: replace with .*
Here is some working code.
Input: audio*2012*.wav
Output: \Qaudio\E.*\Q2012\E.*\Q.wav\E
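import java.util.regex.Matcher;
import java.util.regex.Pattern;

String subject = "audio*2012*.wav";
Pattern regex = Pattern.compile("[^*]+|(\\*)");
Matcher m = regex.matcher(subject);
StringBuffer b = new StringBuffer();
while (m.find()) {
    // Group 1 is set only when the current token is a star
    if (m.group(1) != null) m.appendReplacement(b, ".*");
    // quoteReplacement keeps any \ or $ in the literal text from being read as replacement syntax
    else m.appendReplacement(b, Matcher.quoteReplacement("\\Q" + m.group(0) + "\\E"));
}
m.appendTail(b);
String replaced = b.toString();
System.out.println(replaced);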
Adding Tokens
Suppose we also want to convert the wildcard ?, which stands for a single character, to a dot. We just add a capture group to the regex and exclude it from the match-all on the left:
Search: [^*?]+|(\*)|(\?)
In the replace function we then add something like:
else if(m.group(2) != null) m.appendReplacement(b, ".");
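Putting the two tokens together, a minimal sketch of the whole conversion might look like the following. The class name WildcardTokens and the method toRegex are hypothetical, and the sketch swaps appendReplacement for plain StringBuilder.append with Pattern.quote. That is possible because the search regex consumes every character of the input, so no text is ever skipped between matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class WildcardTokens {
    // Literal run | star (group 1) | question mark (group 2)
    private static final Pattern TOKENS = Pattern.compile("[^*?]+|(\\*)|(\\?)");

    static String toRegex(String wildcard) {
        Matcher m = TOKENS.matcher(wildcard);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            if (m.group(1) != null) {
                sb.append(".*");                       // * -> any run of characters
            } else if (m.group(2) != null) {
                sb.append('.');                        // ? -> exactly one character
            } else {
                sb.append(Pattern.quote(m.group(0)));  // literal text, quoted by the JDK
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String regex = toRegex("audio??2012*.wav");
        System.out.println(regex);
        System.out.println("audio-_2012-final.wav".matches(regex)); // true
        System.out.println("audio2012.wav".matches(regex));         // false
    }
}
```

Appending directly avoids the replacement-string escaping that appendReplacement requires for \ and $.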
Answer 2:
Just escape everything - no harm will come of it.
String input = "*.wav";
String regex = ("\\Q" + input + "\\E").replace("*", "\\E.*\\Q");
System.out.println(regex); // \Q\E.*\Q.wav\E
System.out.println("abcd.wav".matches(regex)); // true
Or you can use character classes:
String input = "*.wav";
String regex = input.replaceAll(".", "[$0]").replace("[*]", ".*");
System.out.println(regex); // .*[.][w][a][v]
System.out.println("abcd.wav".matches(regex)); // true
It's easier to "escape" the characters by putting them in a character class, as almost all characters lose their special meaning there. The exceptions include ^ and \, which would need separate handling. Unless you're expecting weird file names, this will work.
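If the input might contain a \E sequence or one of the characters that stay special inside a character class, one possible variation is to split the pattern on * and let Pattern.quote handle each piece. This is only a sketch under that assumption; the class name QuotedSegments and the method toRegex are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class QuotedSegments {
    static String toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        // A limit of -1 keeps trailing empty segments, so a trailing * is not lost
        String[] parts = wildcard.split("\\*", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(".*");                 // one .* per star between segments
            if (!parts[i].isEmpty()) sb.append(Pattern.quote(parts[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRegex("*.wav"));                           // .*\Q.wav\E
        System.out.println("abcd.wav".matches(toRegex("*.wav")));       // true
        System.out.println("(x, a)".matches(toRegex("(*, a)")));        // true
        System.out.println("a\\Eb.txt".matches(toRegex("a\\Eb*")));     // true
    }
}
```

Pattern.quote takes care of a \E that appears inside a segment, which the hand-built \Q...\E string does not.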
Answer 3:
There is a small utility method in the Apache Commons IO library, org.apache.commons.io.FilenameUtils#wildcardMatch(), which you can use without dealing with the intricacies of regular expressions.
API documentation can be found at: https://commons.apache.org/proper/commons-io/javadocs/api-2.5/org/apache/commons/io/FilenameUtils.html#wildcardMatch(java.lang.String,%20java.lang.String)
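For comparison, when adding Commons IO is not an option, the JDK's own glob support can serve a similar purpose for file names. This swaps in a different technique from the one in this answer: java.nio.file.PathMatcher with the "glob:" syntax. The class name GlobDemo and the method globMatches are hypothetical. Note that glob syntax gives extra meaning to characters such as [, {, and \, and that * does not cross directory separators, so this is a sketch for simple file-name patterns only.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper; uses the JDK glob matcher instead of Commons IO.
class GlobDemo {
    static boolean globMatches(String glob, String fileName) {
        // "glob:" selects glob syntax; the matcher is specific to the default file system
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        System.out.println(globMatches("*.wav", "abcd.wav")); // true
        System.out.println(globMatches("*.wav", "abcd.mp3")); // false
    }
}
```

Because the input is converted to a Path, strings that are not valid paths on the current platform cannot be tested this way.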
Answer 4:
You can also use the quotation escape sequences \Q and \E: everything between them is treated as literal text and not as part of the regex to be evaluated. Thus this code should work:
String input = "*.wav";
String regex = "\\Q" + input.replace("*", "\\E.*?\\Q") + "\\E";
// regex = "\\Q\\E.*?\\Q.wav\\E"
Note that, depending on how you want the wildcard to behave, your * might be better matched only against word characters using \w.
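To make that choice concrete, here is a small sketch in which the regex fragment substituted for * is a parameter. The class name WildcardScope and the method toRegex are hypothetical, and the sketch assumes the input contains no \E sequence. When the whole string is tested with String.matches, .* and .*? accept the same inputs, so the visible difference is between "any character" and "word characters only".

```java
// Hypothetical helper, not part of the original answer.
class WildcardScope {
    // star is the regex fragment substituted for each * in the wildcard
    static String toRegex(String wildcard, String star) {
        return "\\Q" + wildcard.replace("*", "\\E" + star + "\\Q") + "\\E";
    }

    public static void main(String[] args) {
        String anyChars = toRegex("*.wav", ".*?");
        String wordChars = toRegex("*.wav", "\\w*");
        System.out.println("my song.wav".matches(anyChars));  // true
        System.out.println("my song.wav".matches(wordChars)); // false: the space is not a word character
        System.out.println("abcd.wav".matches(wordChars));    // true
    }
}
```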
Answer 5:
Lucene has classes that provide this capability, with additional support for backslash as an escape character. ? matches a single character, * matches 0 or more characters, and \ escapes the following character. Unicode code points are supported. It is supposed to be fast, but I haven't tested it.
CharacterRunAutomaton characterRunAutomaton;
boolean matches;
characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Walmart")));
matches = characterRunAutomaton.run("Walmart"); // true
matches = characterRunAutomaton.run("Wal*mart"); // false
matches = characterRunAutomaton.run("Wal\\*mart"); // false
matches = characterRunAutomaton.run("Waldomart"); // false
characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Wal*mart")));
matches = characterRunAutomaton.run("Walmart"); // true
matches = characterRunAutomaton.run("Wal*mart"); // true
matches = characterRunAutomaton.run("Wal\\*mart"); // true
matches = characterRunAutomaton.run("Waldomart"); // true
characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Wal\\*mart")));
matches = characterRunAutomaton.run("Walmart"); // false
matches = characterRunAutomaton.run("Wal*mart"); // true
matches = characterRunAutomaton.run("Wal\\*mart"); // false
matches = characterRunAutomaton.run("Waldomart"); // false
Answer 6:
Regex While Accommodating A DOS/Windows Path
Implementing the quotation escape sequences \Q and \E is probably the best approach. However, since a backslash is typically used as a DOS/Windows file separator, a "\E" sequence within the path could affect the pairing of \Q and \E. While accounting for the * and ? wildcard tokens, the backslash can be handled like this:
Search: [^*?\\]+|(\*)|(\?)|(\\)
Two new lines would be added in the replace function of the "Using A Simple Regex" example to accommodate the new search pattern. The code would still be "Linux-friendly". As a method, it could be written like this:
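public String wildcardToRegex(String wildcardStr) {
    // Literal run | * (group 1) | ? (group 2) | backslash (group 3)
    Pattern regex = Pattern.compile("[^*?\\\\]+|(\\*)|(\\?)|(\\\\)");
    Matcher m = regex.matcher(wildcardStr);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        if (m.group(1) != null) m.appendReplacement(sb, ".*");
        else if (m.group(2) != null) m.appendReplacement(sb, ".");
        // Emits \\ into the regex, which matches one literal backslash
        else if (m.group(3) != null) m.appendReplacement(sb, "\\\\\\\\");
        // quoteReplacement keeps a $ in the literal text from being read as a group reference
        else m.appendReplacement(sb, Matcher.quoteReplacement("\\Q" + m.group(0) + "\\E"));
    }
    m.appendTail(sb);
    return sb.toString();
}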
Code to demonstrate the implementation of this method could be written like this:
String s = "C:\\Temp\\Extra\\audio??2012*.wav";
System.out.println("Input: "+s);
System.out.println("Output: "+wildcardToRegex(s));
These would be the generated results:
Input: C:\Temp\Extra\audio??2012*.wav
Output: \QC:\E\\\QTemp\E\\\QExtra\E\\\Qaudio\E..\Q2012\E.*\Q.wav\E
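To go from the generated regex to actual matching, the result can be compiled once and reused. The following self-contained sketch repeats the conversion in a hypothetical class PathWildcard with a hypothetical method compile, and wraps every replacement in Matcher.quoteReplacement so that \ and $ are always taken literally by appendReplacement. Note that * becomes .*, which also matches backslashes, so a star can span directory levels.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built on the search pattern from this answer.
class PathWildcard {
    private static final Pattern TOKENS =
            Pattern.compile("[^*?\\\\]+|(\\*)|(\\?)|(\\\\)");

    static Pattern compile(String wildcard) {
        Matcher m = TOKENS.matcher(wildcard);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String piece;
            if (m.group(1) != null) piece = ".*";
            else if (m.group(2) != null) piece = ".";
            else if (m.group(3) != null) piece = "\\\\";      // regex for one literal backslash
            else piece = "\\Q" + m.group(0) + "\\E";          // quoted literal run
            // quoteReplacement stops appendReplacement from interpreting \ and $
            m.appendReplacement(sb, Matcher.quoteReplacement(piece));
        }
        m.appendTail(sb);
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        Pattern p = compile("C:\\Temp\\Extra\\audio??2012*.wav");
        System.out.println(p.matcher("C:\\Temp\\Extra\\audio_A2012-final.wav").matches()); // true
        System.out.println(p.matcher("C:\\Temp\\Other\\audio_A2012.wav").matches());       // false
    }
}
```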
Source: https://stackoverflow.com/questions/24337657/wildcard-matching-in-java