Java: splitting a comma-separated string but ignoring commas in quotes

广开言路 2020-11-21 05:16

I have a string vaguely like this:

foo,bar,c;qual="baz,blurb",d;junk="quux,syzygy"

that I want to split by commas, but I need to ignore commas in quotes.

11 answers
  •  耶瑟儿~
    2020-11-21 06:04

    The simplest approach is not to match the delimiters (the commas) and then add complex extra logic to exclude false delimiters. Instead, match the intended data (which might be quoted strings) in the first place.
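
    A minimal sketch of that idea, assuming simple input without empty cells: String.split(",") cuts the quoted item apart, while a find() loop over the illustrative pattern "[^"]*"|[^,]+ keeps it whole. The class and method names (MatchDataDemo, matchItems) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchDataDemo {
    // Matches the data itself: a quoted string, or a run of non-comma characters.
    static final Pattern ITEM = Pattern.compile("\"[^\"]*\"|[^,]+");

    static List<String> matchItems(String input) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        while (m.find()) {
            items.add(m.group());
        }
        return items;
    }

    public static void main(String[] args) {
        String input = "a,\"b,c\",d";
        // Splitting on every comma also cuts the quoted item in two (4 parts)
        System.out.println(Arrays.asList(input.split(",")));
        // Matching the items keeps the quoted one whole (3 items)
        System.out.println(matchItems(input));
    }
}
```

    This simple pattern silently skips empty cells (a,,b yields [a, b]). That is why the pattern described next makes the unquoted item optional and anchors each match with \G.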

    The pattern consists of two alternatives: a quoted string ("[^"]*" or ".*?") or everything up to the next comma ([^,]+). To support empty cells, we have to allow the unquoted item to be empty, let each match consume the following comma, if any, and use the \G anchor, which only lets a match start where the previous match ended:

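    // (?:^|(?<=,)) lets a cell start only at the input start or right after a comma, so no extra empty token follows the last cell
    Pattern p = Pattern.compile("\\G(?:^|(?<=,))(?:\"(.*?)\"|([^,]*)),?");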
    

    The pattern also contains two capturing groups: one for the quoted string's content and one for the plain content.
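
    To see which group carries the token for each match, here is a small sketch. It uses the pattern with a (?:^|(?<=,)) guard right after \G, which lets a match start only at the beginning of the input or directly after a comma. Without such a guard, Java's find() also reports an empty match at the very end of any input that does not end with a comma. GroupDemo and describe are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Group 1: content of a quoted item; group 2: an unquoted (possibly empty) item.
    static final Pattern CELL =
        Pattern.compile("\\G(?:^|(?<=,))(?:\"(.*?)\"|([^,]*)),?");

    // Describes each match as "<group>:<text>;" to show which alternative matched.
    static String describe(String input) {
        StringBuilder sb = new StringBuilder();
        for (Matcher m = CELL.matcher(input); m.find(); ) {
            // start(1) is -1 when the quoted alternative did not participate
            int g = m.start(1) < 0 ? 2 : 1;
            sb.append(g).append(':').append(m.group(g)).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("a,\"b,c\",,d"));
    }
}
```

    For the input a,"b,c",,d this yields 2:a;1:b,c;2:;2:d;. The empty cell is reported through group 2 as an empty string, and the quoted cell through group 1 without its quotes.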

    Then, with Java 9, we can get an array as

    String[] a = p.matcher(input).results()
        .map(m -> m.group(m.start(1)<0? 2: 1))
        .toArray(String[]::new);
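
    Wrapped into a complete (hypothetical) helper, the Java 9 variant could look like this. The sketch assumes Java 9 or later for Matcher.results(), and it assumes the pattern variant with the (?:^|(?<=,)) guard after \G, so that no empty token is reported after the last cell.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class ResultsDemo {
    static final Pattern CELL =
        Pattern.compile("\\G(?:^|(?<=,))(?:\"(.*?)\"|([^,]*)),?");

    // Java 9+: Matcher.results() streams every successful find() as a MatchResult.
    static String[] split(String input) {
        return CELL.matcher(input).results()
            .map(m -> m.group(m.start(1) < 0 ? 2 : 1)) // quoted content or plain content
            .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String input = "foo,bar,\"baz,blurb\",,quux";
        System.out.println(Arrays.toString(split(input)));
    }
}
```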
    

    whereas older Java versions need a loop like

    for(Matcher m = p.matcher(input); m.find(); ) {
        String token = m.group(m.start(1)<0? 2: 1);
        System.out.println("found: "+token);
    }
    

    Adding the items to a List or an array is left as an exercise for the reader.
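
    One possible take on that exercise, as a sketch that also runs on Java 8: the classic find() loop collects the tokens into a List, which toArray can turn into an array. It assumes the pattern variant with the (?:^|(?<=,)) guard after \G; ListDemo and split are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListDemo {
    static final Pattern CELL =
        Pattern.compile("\\G(?:^|(?<=,))(?:\"(.*?)\"|([^,]*)),?");

    // Collects each token (quoted content or plain content) into a List.
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        for (Matcher m = CELL.matcher(input); m.find(); ) {
            tokens.add(m.group(m.start(1) < 0 ? 2 : 1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = split("x,\"y,z\",");
        // A List converts to an array with toArray
        String[] array = tokens.toArray(new String[0]);
        System.out.println(tokens + " / " + array.length);
    }
}
```

    A trailing comma produces a final empty token (x,"y,z", gives x, y,z and an empty string), and an empty input yields a single empty token.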

    For Java 8, you can use the results() implementation from this answer to do it the same way as the Java 9 solution.
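
    If that linked implementation is not at hand, a minimal stand-in is easy to sketch. This is not the linked code. It is a hypothetical results(Matcher) helper built on Spliterators.AbstractSpliterator, in which each tryAdvance call performs one find() and hands out a toMatchResult() snapshot. The pattern again assumes the (?:^|(?<=,)) guard after \G.

```java
import java.util.Arrays;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class Java8Results {
    // Hypothetical helper: a minimal stand-in for Java 9's Matcher.results().
    static Stream<MatchResult> results(Matcher m) {
        Spliterator<MatchResult> sp = new Spliterators.AbstractSpliterator<MatchResult>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super MatchResult> action) {
                if (!m.find()) return false;
                // toMatchResult() snapshots the current match, since m itself is reused
                action.accept(m.toMatchResult());
                return true;
            }
        };
        return StreamSupport.stream(sp, false);
    }

    static final Pattern CELL =
        Pattern.compile("\\G(?:^|(?<=,))(?:\"(.*?)\"|([^,]*)),?");

    static String[] split(String input) {
        return results(CELL.matcher(input))
            .map(r -> r.group(r.start(1) < 0 ? 2 : 1))
            .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a,\"b,c\",d")));
    }
}
```

    The stream is sequential on purpose. A Matcher is stateful and not thread-safe, so the snapshots have to be produced one find() at a time.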

    For mixed content with embedded strings, like in the question, you can simply use

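    // (?:^|(?<=,)) lets an item start only at the input start or right after a comma, so no empty token follows the last item
    Pattern p = Pattern.compile("\\G(?:^|(?<=,))((\"(.*?)\"|[^,])*),?");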
    

    In that case, however, the embedded strings are kept in their quoted form, including the quote characters.
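
    Applied to the input from the question, a sketch of the mixed-content variant could look like this. Group 1 captures the whole item, and the (?:^|(?<=,)) guard after \G is assumed so that no empty token follows the last item. MixedDemo and split are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MixedDemo {
    // Group 1 is the whole item: any run of quoted strings and non-comma characters.
    static final Pattern ITEM =
        Pattern.compile("\\G(?:^|(?<=,))((\"(.*?)\"|[^,])*),?");

    static List<String> split(String input) {
        List<String> items = new ArrayList<>();
        for (Matcher m = ITEM.matcher(input); m.find(); ) {
            items.add(m.group(1)); // quotes inside the item are kept as-is
        }
        return items;
    }

    public static void main(String[] args) {
        String input = "foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\"";
        split(input).forEach(System.out::println);
    }
}
```

    This prints four items: foo, bar, c;qual="baz,blurb" and d;junk="quux,syzygy". The commas inside the quoted parts are not treated as separators.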
