How to check if a String contains another String in a case insensitive manner in Java?

渐次进展 2020-11-22 03:20

Say I have two strings,

String s1 = "AbBaCca";
String s2 = "bac";

I want to perform a check returning that s2 is contained within s1, ignoring case.

19 Answers
  •  迷失自我
    2020-11-22 04:09

    A Faster Implementation: Utilizing String.regionMatches()

    Using regexp can be relatively slow. That does not matter if you only need a single check, but if you have an array or a collection of thousands or hundreds of thousands of strings, things can get pretty slow.

    The solution presented below uses neither regular expressions nor toLowerCase() (which is also slow because it creates new strings and throws them away right after the check).

    The solution builds on the String.regionMatches() method, which seems to be little known. It checks whether two String regions match, and, importantly, it has an overload with a handy ignoreCase parameter.

    public static boolean containsIgnoreCase(String src, String what) {
        final int length = what.length();
        if (length == 0)
            return true; // Empty string is contained
    
        final char firstLo = Character.toLowerCase(what.charAt(0));
        final char firstUp = Character.toUpperCase(what.charAt(0));
    
        for (int i = src.length() - length; i >= 0; i--) {
            // Quick check before calling the more expensive regionMatches() method:
            final char ch = src.charAt(i);
            if (ch != firstLo && ch != firstUp)
                continue;
    
            if (src.regionMatches(true, i, what, 0, length))
                return true;
        }
    
        return false;
    }
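
To see what regionMatches() does at a single offset, here is a minimal sketch using the two strings from the question. The class name RegionMatchesDemo and the helper matchesAt() are made up for illustration and are not part of the answer's code.

```java
class RegionMatchesDemo {

    // True if 'what' matches 'src' starting at index 'offset', ignoring case.
    // (Hypothetical helper, only for this illustration.)
    static boolean matchesAt(String src, String what, int offset) {
        return src.regionMatches(true, offset, what, 0, what.length());
    }

    public static void main(String[] args) {
        String s1 = "AbBaCca";
        String s2 = "bac";

        // Try every offset at which s2 could still fit into s1.
        for (int i = 0; i + s2.length() <= s1.length(); i++)
            System.out.println("offset " + i + ": " + matchesAt(s1, s2, i));

        // The same offset compared case-sensitively, for contrast.
        System.out.println("case-sensitive at 2: "
                + s1.regionMatches(false, 2, s2, 0, s2.length()));
    }
}
```

Only offset 2 (the region "BaC") matches when case is ignored; the case-sensitive comparison at the same offset fails. containsIgnoreCase() performs this per-offset check in a loop, after a cheap comparison of the first character.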
    

    Speed Analysis

    This speed analysis is not meant to be rocket science; it only gives a rough picture of how fast the different methods are.

    I compare 5 methods.

    1. Our containsIgnoreCase() method.
    2. Converting both strings to lower case and calling String.contains().
    3. Converting the source string to lower case and calling String.contains() with the pre-cached, lower-cased substring. This solution is less flexible because it tests a predefined substring.
    4. Using a regular expression (the accepted answer, Pattern.compile().matcher().find()...)
    5. Using a regular expression with a pre-created and cached Pattern. This solution is also less flexible because it tests a predefined substring.

    Results (by calling the method 10 million times):

    1. Our method: 670 ms
    2. 2x toLowerCase() and contains(): 2829 ms
    3. 1x toLowerCase() and contains() with cached substring: 2446 ms
    4. Regexp: 7180 ms
    5. Regexp with cached Pattern: 1845 ms

    Results in a table:

                                                RELATIVE SPEED   1/RELATIVE SPEED
     METHOD                          EXEC TIME    TO SLOWEST      TO FASTEST (#1)
    ------------------------------------------------------------------------------
     1. Using regionMatches()          670 ms       10.7x            1.0x
     2. 2x lowercase+contains         2829 ms        2.5x            4.2x
     3. 1x lowercase+contains cache   2446 ms        2.9x            3.7x
     4. Regexp                        7180 ms        1.0x           10.7x
     5. Regexp+cached pattern         1845 ms        3.9x            2.8x
    

    Our method is about 4x faster than lowercasing and using contains(), about 10x faster than using regular expressions, and still about 3x faster than a pre-cached Pattern (which also gives up the flexibility of checking for an arbitrary substring).
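
The two ratio columns of the table follow directly from the measured times: the first divides the slowest time by each method's time, the second divides each method's time by the fastest time. A small sketch that reproduces them from the numbers above (the class name RelativeSpeed and the helper ratio() are illustrative only):

```java
import java.util.Locale;

class RelativeSpeed {

    // Ratio of two times, rounded to one decimal place as in the table.
    static double ratio(long numerator, long denominator) {
        return Math.round(10.0 * numerator / denominator) / 10.0;
    }

    public static void main(String[] args) {
        String[] names = { "regionMatches()", "2x lowercase+contains",
                "1x lowercase+contains cache", "Regexp", "Regexp+cached pattern" };
        long[] ms = { 670, 2829, 2446, 7180, 1845 }; // times reported above

        // Find the slowest and the fastest measured time.
        long slowest = ms[0], fastest = ms[0];
        for (long t : ms) {
            slowest = Math.max(slowest, t);
            fastest = Math.min(fastest, t);
        }

        for (int i = 0; i < ms.length; i++)
            System.out.printf(Locale.ROOT, "%-28s %5d ms  %5.1fx  %5.1fx%n",
                    names[i], ms[i], ratio(slowest, ms[i]), ratio(ms[i], fastest));
    }
}
```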


    Analysis Test Code

    If you're interested in how the analysis was performed, here is the complete runnable application:

    import java.util.regex.Pattern;
    
    public class ContainsAnalysis {
    
        // Case 1 utilizing String.regionMatches()
        public static boolean containsIgnoreCase(String src, String what) {
            final int length = what.length();
            if (length == 0)
                return true; // Empty string is contained
    
            final char firstLo = Character.toLowerCase(what.charAt(0));
            final char firstUp = Character.toUpperCase(what.charAt(0));
    
            for (int i = src.length() - length; i >= 0; i--) {
                // Quick check before calling the more expensive regionMatches()
                // method:
                final char ch = src.charAt(i);
                if (ch != firstLo && ch != firstUp)
                    continue;
    
                if (src.regionMatches(true, i, what, 0, length))
                    return true;
            }
    
            return false;
        }
    
        // Case 2 with 2x toLowerCase() and contains()
        public static boolean containsConverting(String src, String what) {
            return src.toLowerCase().contains(what.toLowerCase());
        }
    
        // The cached substring for case 3
        private static final String S = "i am".toLowerCase();
    
        // Case 3 with pre-cached substring and 1x toLowerCase() and contains()
        public static boolean containsConverting(String src) {
            return src.toLowerCase().contains(S);
        }
    
        // Case 4 with regexp
        public static boolean containsIgnoreCaseRegexp(String src, String what) {
            return Pattern.compile(Pattern.quote(what), Pattern.CASE_INSENSITIVE)
                        .matcher(src).find();
        }
    
        // The cached pattern for case 5
        private static final Pattern P = Pattern.compile(
                Pattern.quote("i am"), Pattern.CASE_INSENSITIVE);
    
        // Case 5 with pre-cached Pattern
        public static boolean containsIgnoreCaseRegexp(String src) {
            return P.matcher(src).find();
        }
    
        // Main method: performs speed analysis on different contains methods
        // (case ignored)
        public static void main(String[] args) throws Exception {
            final String src = "Hi, I am Adam";
            final String what = "i am";
    
            long start, end;
            final int N = 10_000_000;
    
            start = System.nanoTime();
            for (int i = 0; i < N; i++)
                containsIgnoreCase(src, what);
            end = System.nanoTime();
            System.out.println("Case 1 took " + ((end - start) / 1000000) + "ms");
    
            start = System.nanoTime();
            for (int i = 0; i < N; i++)
                containsConverting(src, what);
            end = System.nanoTime();
            System.out.println("Case 2 took " + ((end - start) / 1000000) + "ms");
    
            start = System.nanoTime();
            for (int i = 0; i < N; i++)
                containsConverting(src);
            end = System.nanoTime();
            System.out.println("Case 3 took " + ((end - start) / 1000000) + "ms");
    
            start = System.nanoTime();
            for (int i = 0; i < N; i++)
                containsIgnoreCaseRegexp(src, what);
            end = System.nanoTime();
            System.out.println("Case 4 took " + ((end - start) / 1000000) + "ms");
    
            start = System.nanoTime();
            for (int i = 0; i < N; i++)
                containsIgnoreCaseRegexp(src);
            end = System.nanoTime();
            System.out.println("Case 5 took " + ((end - start) / 1000000) + "ms");
        }
    
    }
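
The loops in main() discard the result of every call and time the very first iterations together with the rest. As a possible variation, and not the method used to obtain the numbers above, the sketch below runs a warm-up pass first and counts the matches so that each result is actually used. The names TimingSketch, countHits() and timeMillis() are assumptions made for this illustration, and the lambda stands in for any of the five methods.

```java
import java.util.function.BiPredicate;

class TimingSketch {

    // Keeps the last hit count reachable so the calls cannot be treated as unused.
    static volatile int sink;

    // Runs the check n times and returns how many calls returned true.
    static int countHits(BiPredicate<String, String> check,
                         String src, String what, int n) {
        int hits = 0;
        for (int i = 0; i < n; i++)
            if (check.test(src, what))
                hits++;
        return hits;
    }

    // Warm-up pass followed by a timed pass; returns elapsed milliseconds.
    static long timeMillis(BiPredicate<String, String> check,
                           String src, String what, int n) {
        sink = countHits(check, src, what, n / 10); // warm-up, not timed
        long start = System.nanoTime();
        sink = countHits(check, src, what, n);      // timed pass
        long end = System.nanoTime();
        return (end - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Stand-in for case 2 (2x toLowerCase() and contains()).
        BiPredicate<String, String> lower =
                (s, w) -> s.toLowerCase().contains(w.toLowerCase());
        System.out.println("Case 2 took "
                + timeMillis(lower, "Hi, I am Adam", "i am", 1_000_000) + " ms");
    }
}
```

This is still only a rough measurement; the absolute numbers will vary by machine and JVM.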
    
