I am comparing two lists of strings to find possible matches. Example:
public class Tester {
public static void main(String[] args) {
List
Instead of
s.matches(".*" + s2 + ".*")
you can use
s.contains(s2)
or
s.indexOf(s2) > -1
I tested both; each is about 35x faster than matches.
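To make the comparison concrete, here is a minimal sketch of the three checks side by side. The class name, method names, and sample strings are hypothetical (the question's actual lists are not shown). It also illustrates one caveat: matches interprets s2 as a regex, so the three checks only agree when s2 contains no regex metacharacters.

```java
class SubstringCheckDemo {
    // Regex-based check from the question: the pattern is recompiled on every call.
    static boolean viaMatches(String s, String s2) {
        return s.matches(".*" + s2 + ".*");
    }

    // Plain substring check, no regex involved.
    static boolean viaContains(String s, String s2) {
        return s.contains(s2);
    }

    // Equivalent check via the index of the first occurrence.
    static boolean viaIndexOf(String s, String s2) {
        return s.indexOf(s2) > -1;
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not taken from the question.
        String s = "stackoverflow";
        String s2 = "over";
        System.out.println(viaMatches(s, s2));   // true
        System.out.println(viaContains(s, s2));  // true
        System.out.println(viaIndexOf(s, s2));   // true

        // With a metacharacter, the regex version treats "." as "any character".
        System.out.println(viaMatches("abc", "a.c"));   // true
        System.out.println(viaContains("abc", "a.c"));  // false
    }
}
```

If s2 may contain characters such as "." or "*", the plain substring checks are probably what you want anyway, since they compare the text literally.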
I don't think you should use a regex for that. String#contains (see its Javadoc entry) should give you better performance ;)
For example, your code could be:
for (final String s2 : test2) {
    for (final String s : test) {
        if (s.contains(s2)) {
            System.out.println("Match");
        }
    }
}
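For completeness, a self-contained version of that loop might look like the sketch below. The list names test and test2 come from the snippet above; their contents, the class name, and the countMatches helper are assumptions made up for illustration, since the question's data is not shown.

```java
import java.util.Arrays;
import java.util.List;

class ContainsLoopDemo {
    // Counts how many (s, s2) pairs satisfy s.contains(s2).
    static int countMatches(List<String> test, List<String> test2) {
        int count = 0;
        for (final String s2 : test2) {
            for (final String s : test) {
                if (s.contains(s2)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the question.
        List<String> test = Arrays.asList("apple pie", "banana split", "cherry tart");
        List<String> test2 = Arrays.asList("apple", "an", "xyz");
        System.out.println("Matches: " + countMatches(test, test2));
    }
}
```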
You absolutely should be creating a single Matcher object in this situation, and using that single object in every loop iteration. You are currently creating a new matcher (and compiling a new Pattern) in each loop iteration.
Before the loop over s (once for each s2, since the pattern depends on it), do this:
//"": Unused to-search string, so the matcher object can be reused
Matcher mtchr = Pattern.compile(".*" + s2 + ".*").matcher("");
Then in your loop, do this:
if(mtchr.reset(s).matches()) {
...
But I'll agree with @maaartinus here, and say that, given your requirements, you don't need regex at all, and can instead use indexOf(s2), or even better, contains(s2), as you don't seem to need the resulting index.
Regardless, this concept of reusing a matcher is invaluable.
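Put together, matcher reuse could look roughly like the sketch below. Because the pattern depends on s2, it is compiled once per s2 and the resulting Matcher is reset for each s. The class name, the countMatches helper, and the sample data are hypothetical, and s2 is still interpreted as a regex here, exactly as in the original expression.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherReuseDemo {
    // Counts pairs where s matches ".*" + s2 + ".*", compiling each pattern only once.
    static int countMatches(List<String> test, List<String> test2) {
        int count = 0;
        for (String s2 : test2) {
            // "": unused to-search string, replaced by reset(s) below.
            Matcher mtchr = Pattern.compile(".*" + s2 + ".*").matcher("");
            for (String s : test) {
                // reset(s) points the same Matcher at a new input and returns it.
                if (mtchr.reset(s).matches()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the question.
        List<String> test = Arrays.asList("apple pie", "banana split", "cherry tart");
        List<String> test2 = Arrays.asList("apple", "an", "xyz");
        System.out.println("Matches: " + countMatches(test, test2));
    }
}
```

Note that a Matcher is not thread-safe, so this kind of reuse assumes the loop runs on a single thread.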
IMHO methods like String.matches(String) should be forbidden. Maybe you need a regex match, maybe not, but what happens here is that your string gets compiled into a regex... again and again.
So do yourself a favor: convert them all into regexes via Pattern.compile and reuse them.
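As a rough sketch of that idea, assuming you really do have a list of regex strings: compile each one once up front, keep the Pattern objects, and reuse them for every input. The class name, helper names, and sample regexes below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PrecompiledPatternsDemo {
    // Compiles every regex string exactly once.
    static List<Pattern> compileAll(List<String> regexes) {
        List<Pattern> patterns = new ArrayList<>();
        for (String regex : regexes) {
            patterns.add(Pattern.compile(regex));
        }
        return patterns;
    }

    // Counts (input, pattern) pairs where the whole input matches the pattern.
    static int countMatches(List<String> inputs, List<Pattern> patterns) {
        int count = 0;
        for (Pattern p : patterns) {
            for (String s : inputs) {
                if (p.matcher(s).matches()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the question.
        List<Pattern> patterns = compileAll(Arrays.asList(".*\\d+.*", "[a-z]+"));
        List<String> inputs = Arrays.asList("abc", "abc123", "XYZ");
        System.out.println("Matches: " + countMatches(inputs, patterns));
    }
}
```

Unlike a Matcher, a compiled Pattern is immutable, so the same list can be shared freely across calls.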
Looking at your ".*" + s2 + ".*", I'd bet you need no regex at all. Simply use String.contains and enjoy the speed.