I\'m trying to build an efficient string matching algorithm. This will execute in a high-volume environment, so performance is critical.
Here are my requirements:
Assuming the rules are as you said: literal or start with a *.
Java:
public static boolean matches(String candidate, List rules) {
for(String rule : rules) {
if (rule.startsWith("*")) {
rule = rule.substring(2);
}
if (candidate.endsWith(rule)) {
return true;
}
}
return false;
}
This scales to the number of rules you have.
EDIT:
Just to be clear here.
When I say "sort the rules", I really mean create a tree out of the rule characters.
Then you use the match string to try and walk the tree (i.e. if I have a string of xyz, I start with the x character, and see if it has a y branch, and then a z child).
For the "wildcards" I'd use the same concept, but populate it "backwards", and walk it with the back of the match candidate.
If you have a LOT (LOT LOT) of rules I would sort the rules.
For non wildcard matches, you iterate for each character to narrow the possible rules (i.e. if it starts with "w", then you work with the "w" rules, etc.)
If it IS a wildcard match, you do the exact same thing, but you work against a list of "backwards rules", and simply match form the end of the string against the end of the rule.