How to use beginning-of-line and end-of-line markers in regex for a Java String?

没有蜡笔的小新 2021-01-04 19:51

Why doesn't the following change the text for me in Android?

String content = "test\n=test=\ntest";
content = content.replaceAll("^=(.+)=$", "<size:large>$1</size:large>");

        
5 Answers
  • 2021-01-04 20:27

    Okay, I think I have an answer... this seems to work:

    content = content.replaceAll("(?<!.)=(.+)=(?!.)", "<size:large>$1</size:large>");
    

    Doesn't it?

    Edit: The above doesn't work. For some reason it copies the original text and places it after the substituted text. Here's what finally did work:

    content = content.replaceAll("(?<=(\n|^))=(.+)=(?=(\n|$))", "<size:large>$2</size:large>");
    
  • 2021-01-04 20:35

    By default, Java's regex engine doesn't treat \n as the beginning or the end of a line; it's just a line feed character. To match it, you need to change your code to

    String content = "test\n=test=\ntest";
    content = content.replaceAll("\n=(.+)=\n", "\n<size:large>$1</size:large>\n");
    

    Without any flags, ^ and $ match the beginning and the end of the String itself.
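To see the default behaviour in isolation, here is a minimal sketch. The class and method names are made up for illustration; only String.replaceAll and the question's pattern come from the thread.

```java
class DefaultAnchors {
    // Applies the question's pattern with no flags set.
    static String withoutFlags(String content) {
        return content.replaceAll("^=(.+)=$", "<size:large>$1</size:large>");
    }

    public static void main(String[] args) {
        String multiLine = "test\n=test=\ntest";
        // The marked-up line sits in the middle of the input, so ^ and $
        // never line up with it and the text comes back unchanged.
        System.out.println(withoutFlags(multiLine).equals(multiLine));

        // When the whole input is one line, the same pattern matches.
        System.out.println(withoutFlags("=test="));
    }
}
```

The first call leaves the three-line input untouched, while the single-line input is rewritten, which is the behaviour the question ran into.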

    If you're reading from a file, the line break could be a CRLF sequence (\r\n), in which case you want to match \r too. In that case you can use a regex like this, which captures each line break and puts it back unchanged

    content = content.replaceAll("(\r?\n)=(.+)=(\r?\n)", "$1<size:large>$2</size:large>$3");
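CRLF input needs some care, because a single character class such as [\n\r] matches only one character of each \r\n pair. The sketch below compares that with a variant that captures each line break and re-emits it. The class, the method names and the show helper are hypothetical and exist only for this comparison.

```java
class CrlfLineEndings {
    // A character class consumes one character per line break, so on CRLF
    // input the \r before the match and the \n after it are left behind.
    static String withCharacterClass(String content) {
        return content.replaceAll("[\n\r]=(.+)=[\n\r]", "\n<size:large>$1</size:large>\n");
    }

    // Groups 1 and 3 hold the original line breaks ("\n" or "\r\n");
    // group 2 is the text between the = signs.
    static String keepingLineBreaks(String content) {
        return content.replaceAll("(\r?\n)=(.+)=(\r?\n)", "$1<size:large>$2</size:large>$3");
    }

    // Makes line breaks visible when printing.
    static String show(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String crlf = "test\r\n=test=\r\ntest";
        System.out.println(show(withCharacterClass(crlf)));
        System.out.println(show(keepingLineBreaks(crlf)));
    }
}
```

With the character class, the CRLF input ends up with mixed line endings and an extra empty line after the replaced text; the capturing variant returns the same line breaks it was given.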
    

    If you need the match to work for several occurrences on several 'lines' of the same String, you should first split the String into lines.

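    String content = "test\n=test=\n=test=\ntest";
    // Split on LF or CRLF so Windows line endings don't produce empty pieces.
    String[] pieces = content.split("\r?\n");
    StringBuilder replaced = new StringBuilder();
    
    for (String piece : pieces) {
        // Each piece is a single line, so ^ and $ line up with its ends.
        replaced.append(piece.replaceAll("^=(.+)=$", "<size:large>$1</size:large>"));
        // Note: this also adds a line break after the last line.
        replaced.append('\n');
    }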
    
  • 2021-01-04 20:39

    The best way to deal with this is to set Pattern.MULTILINE. In MULTILINE mode, ^ and $ match at the start and end of every line, including lines separated only by \n, as well as at the beginning and the end of the input.

    With String.replaceAll you have to set the flag inside the pattern, using the embedded flag expression (?m) for MULTILINE:

    content = content.replaceAll("(?m)^=(.+)=$", "<size:large>$1</size:large>");
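If you would rather not embed the flag in the pattern text, the same flag can be passed to Pattern.compile. This is a minimal sketch; the class name, the HEADING constant and the markHeadings method are made up for illustration.

```java
import java.util.regex.Pattern;

class MultilineFlag {
    // Same pattern as the inline (?m) version, with the flag given to compile().
    private static final Pattern HEADING =
            Pattern.compile("^=(.+)=$", Pattern.MULTILINE);

    static String markHeadings(String content) {
        // Matcher.replaceAll uses the same $1 group syntax as String.replaceAll.
        return HEADING.matcher(content).replaceAll("<size:large>$1</size:large>");
    }

    public static void main(String[] args) {
        System.out.println(markHeadings("test\n=test=\ntest"));
    }
}
```

Compiling the pattern once also avoids recompiling it on every call, which String.replaceAll does internally.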
    

    If you don't use MULTILINE, you need a positive lookbehind and lookahead for the \n. The regex gets complicated because it also has to match the first line, and the last line when there is no \n at the end, e.g. if our input is: =test=\n=test=\n=test=\n=test=.

    String pattern = "(?<=(^|\n))=(.+)=(?=(\n|$))";
    content = content.replaceAll(pattern, "<size:large>$2</size:large>");
    

    In this pattern we're supplying alternatives for the lookbehind: beginning of input or \n, (^|\n); and for the lookahead: \n or end of input, (\n|$). Notice that we need to use $2 as the captured group reference in the replacement, because the first alternation introduces a capturing group of its own.

    We can avoid that extra group, at the cost of a slightly longer pattern, by putting the alternatives in the lookahead/lookbehind into non-capturing groups, which look like (?:...):

    String pattern = "(?<=(?:^|\n))=(.+)=(?=(?:\n|$))";
    content = content.replaceAll(pattern, "<size:large>$1</size:large>");
    

    Now we're back to using $1 as the captured group in the replacement.
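As a quick check that the two lookaround patterns are interchangeable once the group number is adjusted, here is a small sketch that runs both on the four-line input mentioned above. The class, constant and method names are hypothetical.

```java
class LookaroundGroups {
    // In the first pattern the lookbehind's (^|\n) is group 1, so the text is group 2.
    static final String CAPTURING = "(?<=(^|\n))=(.+)=(?=(\n|$))";
    // With (?:...) the lookarounds add no groups, so the text is group 1.
    static final String NON_CAPTURING = "(?<=(?:^|\n))=(.+)=(?=(?:\n|$))";

    static String withCapturing(String content) {
        return content.replaceAll(CAPTURING, "<size:large>$2</size:large>");
    }

    static String withNonCapturing(String content) {
        return content.replaceAll(NON_CAPTURING, "<size:large>$1</size:large>");
    }

    public static void main(String[] args) {
        String input = "=test=\n=test=\n=test=\n=test=";
        System.out.println(withCapturing(input));
        System.out.println(withCapturing(input).equals(withNonCapturing(input)));
    }
}
```

Because lookarounds do not consume the line breaks, every one of the four lines is replaced, including the first and the last.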

  • 2021-01-04 20:40

    Well, as you already said, it comes down to the begin/end markers of the regex. By default, ^ is not the beginning of a line but of the entire string; likewise, $ is the end of the entire string.

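    Try changing the expression to ^.*=(.+)=.*$ (this also needs the (?s) flag so that . can cross line breaks, and it then replaces the entire string), or leave those markers out: =(.+)=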

    To match the beginning of the line, you could use this: (?:^|\n)=(.*)=. From my tests in plain Java, it seems as if the trailing line break is not recognized, but that might be different in Android.
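Both alternatives have side effects that are worth checking before relying on them. The sketch below, with hypothetical class and method names, shows that the unanchored pattern also matches in the middle of a line, and that (?:^|\n) makes the line break part of the match, so it disappears unless the replacement puts it back.

```java
class UnanchoredPitfalls {
    // No anchors: any pair of = signs on one line matches, wherever it is.
    static String unanchored(String s) {
        return s.replaceAll("=(.+)=", "<size:large>$1</size:large>");
    }

    // The leading \n is consumed by the match and is not in the replacement.
    static String leadingNewline(String s) {
        return s.replaceAll("(?:^|\n)=(.*)=", "<size:large>$1</size:large>");
    }

    public static void main(String[] args) {
        System.out.println(unanchored("x = 1, y = 2"));
        System.out.println(leadingNewline("test\n=test=\ntest"));
    }
}
```

In the second call the replaced text ends up on the same line as the first "test", because the line break before it was swallowed.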

  • 2021-01-04 20:45

    Your original regex works fine if you turn on multiline mode, using (?m):

    content = content.replaceAll("(?m)^=(.+)=$", "<size:large>$1</size:large>");
    

    Now ^ and $ do indeed match at line boundaries.
