split a string at comma but avoid escaped comma and backslash

Submitted by 为君一笑 on 2019-12-02 18:06:50

Question


I'd like to split a string at each comma ",". The string contains escaped commas "\," and escaped backslashes "\\". Commas at the beginning and end, as well as several commas in a row, should produce empty strings.

So ",,\,\\,," should become "", "", "\,\\", "", ""

Note that my example strings show backslash as single "\". Java strings would have them doubled.

I tried with several packages but had no success. My last idea would be to write my own parser.


Answer 1:


In this case a custom function sounds better to me. Try this:

public String[] splitEscapedString(String s) {
    // Separator that must not appear in the input.
    // If you are reading lines, "\n" works since a line never contains it.
    String c = "\n";
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); ++i) {
        if (s.charAt(i) == '\\' && i + 1 < s.length()) {
            // Escaped character: drop the backslash and copy the next character
            // verbatim, so "\," becomes "," and "\\" becomes "\" in the output.
            sb.append(s.charAt(++i));
        } else if (s.charAt(i) == ',') {
            // Unescaped comma: mark the split position.
            sb.append(c);
        } else {
            // Ordinary character (or a dangling backslash at the very end).
            sb.append(s.charAt(i));
        }
    }
    // A negative limit keeps trailing empty strings, which split(c) would drop.
    return sb.toString().split(c, -1);
}
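
Note that the function above removes the backslashes from the returned parts. If the escape sequences should stay in place, as in the question's expected output, a direct scanner without a placeholder character might look like the following. This is only a sketch; the names EscapedSplit and splitKeepingEscapes are made up for illustration, and a lone backslash at the very end is assumed to be kept literally.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapedSplit {
    // Hypothetical helper: splits at unescaped commas and keeps "\," and "\\" unchanged.
    public static List<String> splitKeepingEscapes(String s) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '\\' && i + 1 < s.length()) {
                // Copy the backslash and the character it escapes as a pair.
                current.append(ch).append(s.charAt(++i));
            } else if (ch == ',') {
                // Unescaped comma: close the current part, even if it is empty.
                parts.add(current.toString());
                current.setLength(0);
            } else {
                // Ordinary character (assumption: a trailing lone backslash is kept).
                current.append(ch);
            }
        }
        // The text after the last comma is always a part, possibly empty.
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        // The question's example, written as a Java literal.
        System.out.println(splitKeepingEscapes(",,\\,\\\\,,").size());
    }
}
```

For the question's example this yields five parts, with the third one still containing its backslashes.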



Answer 2:


Don't use .split() but find all matches between (unescaped) commas:

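// Needs java.util.ArrayList, java.util.List, java.util.regex.Matcher, java.util.regex.Pattern.
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile(
    "(?:         # Start of group\n" +
    " \\\\.      # Match either an escaped character\n" +
    "|           # or\n" +
    " [^\\\\,]++ # Match one or more characters except comma/backslash\n" +
    ")*          # Do this any number of times", 
    Pattern.COMMENTS);
Matcher regexMatcher = regex.matcher(subjectString); // subjectString: the input to split
int previousEnd = -1;
while (regexMatcher.find()) {
    // The pattern can match the empty string, so after a non-empty match find()
    // reports one more empty match at the same position, just before the comma.
    // Skip it; otherwise every non-empty token is followed by a spurious "".
    if (regexMatcher.start() != previousEnd) {
        matchList.add(regexMatcher.group());
    }
    previousEnd = regexMatcher.end();
}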

Result for the example input (written as Java string literals): ["", "", "\\,\\\\", "", ""]

I used a possessive quantifier (++) in order to avoid excessive backtracking due to the nested quantifiers.




Answer 3:


While a dedicated library is certainly a good idea, the following will work:

    public static String[] splitValues(final String input) {
        final ArrayList<String> result = new ArrayList<String>();
        // (?:\\\\)* matches any number of \-pairs
        // (?<!\\) ensures that the \-pairs aren't preceded by a single \
        final Pattern pattern = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");
        final Matcher matcher = pattern.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        result.add(input.substring(previous, input.length()));
        return result.toArray(new String[result.size()]);
    }

The idea is to find a , preceded by zero or an even number of \ (i.e. an unescaped ,). Since the , is the last character of the pattern, each piece is cut at end() - 1, which is the position just before the ,.

The function is tested against most odd cases I can think of, except for null input. If you prefer handling a List<String>, you can of course change the return type; I just adopted the pattern implemented in split() to handle escapes.
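
The null-input case mentioned above could be covered with a small guard. The sketch below repeats the same matching logic and, as an assumption not taken from the answer, returns an empty array for null; the names SplitValuesCheck and splitValuesNullSafe are hypothetical. Its main method runs a few edge cases so the behaviour can be inspected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitValuesCheck {
    // A comma preceded by zero or an even number of backslashes.
    private static final Pattern UNESCAPED_COMMA =
            Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");

    // Hypothetical variant of splitValues; assumed policy: null yields an empty array.
    public static String[] splitValuesNullSafe(String input) {
        if (input == null) {
            return new String[0];
        }
        List<String> result = new ArrayList<>();
        Matcher matcher = UNESCAPED_COMMA.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            // The comma is the last matched character, so cut just before it.
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        // Whatever follows the last unescaped comma, possibly empty.
        result.add(input.substring(previous));
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Edge cases: empty input, no comma, escaped comma, escaped backslash, null.
        String[] inputs = { "", "a", "a,b", "a\\,b", "a\\\\,b", ",,\\,\\\\,,", null };
        for (String in : inputs) {
            System.out.println(in + " => " + Arrays.toString(splitValuesNullSafe(in)));
        }
    }
}
```

Whether null should map to an empty array, a single empty string, or an exception depends on the caller; the guard is the only behavioural difference from the function above.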

Example class utilizing this function:

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Print {
    public static void main(final String[] args) {
        String input = ",,\\,\\\\,,";
        final String[] strings = splitValues(input);
        System.out.print("\""+input+"\" => ");
        printQuoted(strings);
    }

    public static String[] splitValues(final String input) {
        final ArrayList<String> result = new ArrayList<String>();
        // (?:\\\\)* matches any number of \-pairs
        // (?<!\\) ensures that the \-pairs aren't preceded by a single \
        final Pattern pattern = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");
        final Matcher matcher = pattern.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        result.add(input.substring(previous, input.length()));
        return result.toArray(new String[result.size()]);
    }

    public static void printQuoted(final String[] strings) {
        if (strings.length > 0) {
            System.out.print("[\"");
            System.out.print(strings[0]);
            for(int i = 1; i < strings.length; i++) {
                System.out.print("\", \"");
                System.out.print(strings[i]);
            }
            System.out.println("\"]");
        } else {
            System.out.println("[]");
        }
    }
}


Source: https://stackoverflow.com/questions/21698185/split-a-string-at-comma-but-avoid-escaped-comma-and-backslash
