Question
I'd like to split a string at each comma ",". The string contains escaped commas "\," and escaped backslashes "\\". Commas at the beginning and end, as well as several commas in a row, should produce empty strings.
So ",,\,\\,," should become "", "", "\,\\", "", "".
Note that my example strings show a backslash as a single "\". Java string literals would have them doubled.
I tried with several packages but had no success. My last idea would be to write my own parser.
Answer 1:
In this case a custom function sounds better to me. Try this:
public String[] splitEscapedString(String s) {
    // Separator that must not appear in the input.
    // If you are reading lines, "\n" works because a line never contains it.
    String c = "\n";
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); ++i) {
        if (s.charAt(i) == '\\' && i + 1 < s.length()) {
            // Escaped character: drop the backslash and copy the character itself.
            // The bounds check guards against a lone backslash at the end of the input.
            sb.append(s.charAt(++i));
        } else if (s.charAt(i) == ',') {
            // Unescaped comma: mark the split point.
            sb.append(c);
        } else {
            sb.append(s.charAt(i));
        }
    }
    // A limit of -1 keeps trailing empty strings, which split(c) alone would discard.
    return sb.toString().split(c, -1);
}
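The function above removes the backslashes from the tokens, whereas the expected output in the question keeps them ("\,\\"). As a minimal sketch of the "write my own parser" idea that leaves the escape sequences untouched, something like the following might do. The class name EscapedSplitter and the method splitKeepingEscapes are hypothetical names chosen for illustration, and a lone backslash at the end of the input is assumed to be an ordinary character.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapedSplitter {
    // Hypothetical helper, not part of the answer above: splits at unescaped
    // commas in one pass and leaves the escape sequences in the tokens.
    public static List<String> splitKeepingEscapes(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '\\' && i + 1 < s.length()) {
                // Copy the backslash and the escaped character verbatim.
                current.append(ch).append(s.charAt(++i));
            } else if (ch == ',') {
                // Unescaped comma: close the current token, even if it is empty.
                tokens.add(current.toString());
                current.setLength(0);
            } else {
                // Ordinary character (assumption: a lone trailing backslash is kept as-is).
                current.append(ch);
            }
        }
        tokens.add(current.toString()); // the token after the last comma
        return tokens;
    }

    public static void main(String[] args) {
        // Java literal for the question's input ,,\,\\,,
        List<String> tokens = splitKeepingEscapes(",,\\,\\\\,,");
        System.out.println(tokens.size()); // prints 5
    }
}
```

Collecting the tokens directly in a list avoids the need for a separator character that must never occur in the input.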
Answer 2:
Don't use .split(); instead, find all matches between (unescaped) commas:
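List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile(
    "(?: # Start of group\n" +
    " \\\\. # Match either an escaped character\n" +
    "| # or\n" +
    " [^\\\\,]++ # Match one or more characters except comma/backslash\n" +
    ")* # Do this any number of times",
    Pattern.COMMENTS);
Matcher regexMatcher = regex.matcher(subjectString);
int pos = 0;
while (true) {
    // Match one token starting exactly at pos. The pattern can match the empty
    // string, so lookingAt() always succeeds here. A plain find() loop would
    // report an extra empty match after every non-empty token.
    regexMatcher.region(pos, subjectString.length());
    regexMatcher.lookingAt();
    matchList.add(regexMatcher.group());
    if (regexMatcher.end() == subjectString.length()) {
        break;
    }
    pos = regexMatcher.end() + 1; // skip the unescaped comma that ended the token
}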
Result (in Java string-literal notation): ["", "", "\\,\\\\", "", ""]
I used a possessive quantifier (++) in order to avoid excessive backtracking due to the nested quantifiers.
Answer 3:
While a dedicated library is certainly a good idea, the following will work:
public static String[] splitValues(final String input) {
    final ArrayList<String> result = new ArrayList<String>();
    // (?:\\\\)* matches any number of \-pairs
    // (?<!\\) ensures that the \-pairs aren't preceded by a single \
    final Pattern pattern = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");
    final Matcher matcher = pattern.matcher(input);
    int previous = 0;
    while (matcher.find()) {
        result.add(input.substring(previous, matcher.end() - 1));
        previous = matcher.end();
    }
    result.add(input.substring(previous, input.length()));
    return result.toArray(new String[result.size()]);
}
The idea is to find a , preceded by zero or an even number of \ (i.e. an unescaped ,). Because the , is the last part of the pattern, the cut is made at end()-1, which is just before the ,.
The function is tested against most odd cases I can think of, except for null input. If you prefer handling a List<String>, you can of course change the return type; I just adopted the pattern implemented in split() to handle escapes.
Example class utilizing this function:
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Print {
    public static void main(final String[] args) {
        String input = ",,\\,\\\\,,";
        final String[] strings = splitValues(input);
        System.out.print("\"" + input + "\" => ");
        printQuoted(strings);
    }

    public static String[] splitValues(final String input) {
        final ArrayList<String> result = new ArrayList<String>();
        // (?:\\\\)* matches any number of \-pairs
        // (?<!\\) ensures that the \-pairs aren't preceded by a single \
        final Pattern pattern = Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");
        final Matcher matcher = pattern.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        result.add(input.substring(previous, input.length()));
        return result.toArray(new String[result.size()]);
    }

    public static void printQuoted(final String[] strings) {
        if (strings.length > 0) {
            System.out.print("[\"");
            System.out.print(strings[0]);
            for (int i = 1; i < strings.length; i++) {
                System.out.print("\", \"");
                System.out.print(strings[i]);
            }
            System.out.println("\"]");
        } else {
            System.out.println("[]");
        }
    }
}
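Answer 3 mentions that the return type could be changed to List<String>. A minimal sketch of that variant might look like the following. The class name SplitToList, the method splitToList, and the constant UNESCAPED_COMMA are hypothetical names introduced for illustration, and compiling the pattern once in a static field is a design choice of this sketch, not something the answer prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitToList {
    // Same expression as in splitValues: a comma preceded by an even number
    // (possibly zero) of backslashes. Compiled once and reused across calls.
    private static final Pattern UNESCAPED_COMMA =
            Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*,");

    // Hypothetical variant of splitValues that returns a List<String>.
    public static List<String> splitToList(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = UNESCAPED_COMMA.matcher(input);
        int previous = 0;
        while (matcher.find()) {
            // The comma is the last character of the match, so cut just before it.
            result.add(input.substring(previous, matcher.end() - 1));
            previous = matcher.end();
        }
        result.add(input.substring(previous)); // remainder after the last comma
        return result;
    }

    public static void main(String[] args) {
        // Java literal for a\,b,c\\,d which has two unescaped commas.
        System.out.println(splitToList("a\\,b,c\\\\,d").size()); // prints 3
    }
}
```

As in the original function, the tokens keep their escape sequences, and null input is not handled.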
Source: https://stackoverflow.com/questions/21698185/split-a-string-at-comma-but-avoid-escaped-comma-and-backslash