Question
I have a regex, [\\.|\\;|\\?|\\!][\\s], which I use to split a string.
However, I don't want it to split on . ; ? ! when they appear inside quotes.
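For illustration, here is a minimal sketch of the behaviour described above. The class name SplitProblemDemo, the helper naiveSplit, and the sample sentence are made up for this example; only the regex comes from the question.

```java
import java.util.Arrays;

public class SplitProblemDemo {
    // Splits with the regex from the question, unchanged.
    static String[] naiveSplit(String text) {
        return text.split("[\\.|\\;|\\?|\\!][\\s]");
    }

    public static void main(String[] args) {
        // Hypothetical sample: the "! " inside the quotes should not be a split point.
        String text = "He said \"Stop! Now.\" and left. Bye";
        for (String piece : naiveSplit(text)) {
            System.out.println("> " + piece);
        }
        // prints:
        // > He said "Stop
        // > Now." and left
        // > Bye
    }
}
```

The first two pieces show the unwanted split inside the quoted text.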
Answer 1:
I wouldn't use split here; I'd use Pattern and Matcher instead.
A demo:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        String text = "start. \"in quotes!\"; foo? \"more \\\" words\"; bar";
        String simpleToken = "[^.;?!\\s\"]+";
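        String quotedToken =
                "(?x)             # enable inline comments and ignore white spaces in the regex         \n" +
                "\"               # match a double quote                                                \n" +
                "(                # open group 1                                                        \n" +
                "  \\\\.          #   match a backslash followed by any char (other than line breaks)   \n" +
                "  |              #   OR                                                                \n" +
                // \\r and \\n are written as regex escapes: in (?x) mode Java also ignores literal
                // whitespace inside a character class, so raw \r\n characters would be dropped here
                "  [^\\\\\\r\\n\"] #  any character other than a backslash, line breaks or double quote \n" +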
                ")                # close group 1                                                       \n" +
                "*                # repeat group 1 zero or more times                                   \n" +
                "\"               # match a double quote                                                \n";
        String regex = quotedToken + "|" + simpleToken;
        Matcher m = Pattern.compile(regex).matcher(text);
        while(m.find()) {
            System.out.println("> " + m.group());
        }
    }
}
which produces:
> start
> "in quotes!"
> foo
> "more \" words"
> bar
As you can see, it can also handle escaped quotes inside quoted tokens.
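If you want sentence-like chunks rather than individual tokens, the same idea can be stretched a little. The following is only a sketch under my own assumptions: the class name QuotedChunks, the method chunks, and the sample text are hypothetical, and the pattern is a variation on the one above rather than a tested drop-in. A chunk is taken to be a run of quoted tokens, non-delimiter characters, and delimiters that are not followed by whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedChunks {
    private static final Pattern CHUNK = Pattern.compile(
            "(?:\"(?:\\\\.|[^\"\\\\\\r\\n])*\""  // a double-quoted token, backslash escapes allowed
            + "|[^.;?!\"]"                        // any character that is neither a delimiter nor a quote
            + "|[.;?!](?!\\s)"                    // a delimiter that is NOT followed by whitespace
            + ")+");

    static List<String> chunks(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = CHUNK.matcher(text);
        while (m.find()) {
            String chunk = m.group().trim(); // drop the whitespace left over after a delimiter
            if (!chunk.isEmpty()) {
                result.add(chunk);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "He said \"Stop! Now.\" and left. Bye; see \"a; b\"? ok";
        for (String chunk : chunks(text)) {
            System.out.println("> " + chunk);
        }
        // prints:
        // > He said "Stop! Now." and left
        // > Bye
        // > see "a; b"
        // > ok
    }
}
```

Note that an unbalanced double quote matches none of the alternatives, so it ends the current chunk; handle that case separately if your input can contain it.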
Answer 2:
Here is what I do to skip over quoted text when matching. Put this prefix in front of the pattern you actually want to find, and don't use anything greedy like .* in the rest of your regex:
(?:[^\"\']|(?:\".*?\")|(?:\'.*?\'))*?
To adapt this to your regex, you could use:
(?:[^\"\']|(?:\".*?\")|(?:\'.*?\'))*?[.;?!]\s*
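A rough sketch of how this pattern could be driven from Java follows. Two things here are my additions, not part of the pattern above: the leading \G, which forces each match to start exactly where the previous one ended (without it, find() can restart inside a quoted section), and capture group 1, which holds the text in front of the delimiter. The class name QuoteAwareSplit and the method split are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteAwareSplit {
    // \G and the capturing group are assumptions of this sketch; the rest is the pattern above.
    private static final Pattern PIECE = Pattern.compile(
            "\\G((?:[^\"']|\".*?\"|'.*?')*?)[.;?!]\\s*");

    static List<String> split(String text) {
        List<String> pieces = new ArrayList<>();
        Matcher m = PIECE.matcher(text);
        int end = 0;
        while (m.find()) {
            pieces.add(m.group(1)); // text before the delimiter, quotes kept intact
            end = m.end();
        }
        if (end < text.length()) {
            pieces.add(text.substring(end)); // whatever follows the last delimiter
        }
        return pieces;
    }

    public static void main(String[] args) {
        String text = "start. \"in quotes!\"; foo? 'a; b'. end";
        for (String piece : split(text)) {
            System.out.println("> " + piece);
        }
        // prints:
        // > start
        // > "in quotes!"
        // > foo
        // > 'a; b'
        // > end
    }
}
```

Be aware that the pattern treats every ' as the start of a quoted section, so an apostrophe as in don't stops the matching at that point and the rest of the input comes back as one piece. It also does not handle escaped quotes.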
Source: https://stackoverflow.com/questions/4917932/regex-to-ignore-text-between-quotes