I want to split a string with a delimiter white space. but it should handle quoted strings intelligently. E.g. for a string like
\"John Smith\" Ted Barry
<
This is my own version, clean up from http://pastebin.com/aZngu65y (posted in the comment). It can take care of Unicode. It will clean up all excessive spaces (even in quote) - this can be good or bad depending on the need. No support for escaped quote.
private static String[] parse(String param) {
String[] output;
param = param.replaceAll("\"", " \" ").trim();
String[] fragments = param.split("\\s+");
int curr = 0;
boolean matched = fragments[curr].matches("[^\"]*");
if (matched) curr++;
for (int i = 1; i < fragments.length; i++) {
if (!matched)
fragments[curr] = fragments[curr] + " " + fragments[i];
if (!fragments[curr].matches("(\"[^\"]*\"|[^\"]*)"))
matched = false;
else {
matched = true;
if (fragments[curr].matches("\"[^\"]*\""))
fragments[curr] = fragments[curr].substring(1, fragments[curr].length() - 1).trim();
if (fragments[curr].length() != 0)
curr++;
if (i + 1 < fragments.length)
fragments[curr] = fragments[i + 1];
}
}
if (matched) {
return Arrays.copyOf(fragments, curr);
}
return null; // Parameter failure (double-quotes do not match up properly).
}
Sample input for comparison:
"sdfskjf" sdfjkhsd "hfrif ehref" "fksdfj sdkfj fkdsjf" sdf sfssd
asjdhj sdf ffhj "fdsf fsdjh"
日本語 中文 "Tiếng Việt" "English"
dsfsd
sdf " s dfs fsd f " sd f fs df fdssf "日本語 中文"
"" "" ""
" sdfsfds " "f fsdf
(2nd line is empty, 3rd line is spaces, last line is malformed). Please judge with your own expected output, since it may varies, but the baseline is that, the 1st case should return [sdfskjf, sdfjkhsd, hfrif ehref, fksdfj sdkfj fkdsjf, sdf, sfssd].