Question
I have an instruction like:
db.insert( {
_id:3,
cost:{_0:11},
description:"This is a description.\nCool, isn\'t it?"
});
The Eclipse plugin I am using, MonjaDB, splits the instruction on newlines, so each line is treated as a separate instruction, which is bad. I fixed that by splitting on ;(\r|\n)+ instead, which now captures the entire instruction. However, when I then strip the newlines between the parts of the JSON, the \n and \r inside the JSON strings are stripped as well.
How do I avoid removing \t, \r and \n from within JSON strings, which are, of course, delimited by "" or ''?
Answer 1:
You need to arrange to ignore whitespace when it appears within quotes. As suggested by one of the commenters:
\s+ | ( " (?: [^"\\] | \\ . ) * " ) // White-space inserted for readability
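This matches either a run of Java whitespace or a double-quoted string, where a string consists of " followed by any number of items that are either a character other than a quote or backslash, or a backslash followed by any character, and then a closing ". Because a quoted string is consumed as a single match, whitespace inside strings is never matched by the \s+ alternative.
Then replace each match with $1 if $1 is not null, and with nothing otherwise.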
Pattern clean = Pattern.compile(" \\s+ | ( \" (?: [^\"\\\\] | \\\\ . ) * \" ) ", Pattern.COMMENTS | Pattern.DOTALL);
StringBuffer sb = new StringBuffer();
Matcher m = clean.matcher( json );
while (m.find()) {
m.appendReplacement(sb, "" );
// Don't put m.group(1) in the appendReplacement because if it happens to contain $1 or $2 you'll get an error.
if ( m.group(1) != null )
sb.append( m.group(1) );
}
m.appendTail(sb);
String cleanJson = sb.toString();
This is totally off the top of my head but I'm pretty sure it's close to what you want.
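As a side note on the $1/$2 caveat in the comment above: on Java 9 or later the same replacement can be written with Matcher.replaceAll and a function, using Matcher.quoteReplacement so that any $ or \ inside a quoted string is copied literally. This is only a sketch of an alternative, not a change to the approach; the class name ReplaceAllSketch and the method name clean are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; assumes Java 9+ for replaceAll(Function).
final class ReplaceAllSketch {
    // Same pattern as above: whitespace, or a captured double-quoted string.
    static final Pattern CLEAN = Pattern.compile(
        " \\s+ | ( \" (?: [^\"\\\\] | \\\\ . ) * \" ) ",
        Pattern.COMMENTS | Pattern.DOTALL);

    static String clean(String json) {
        // Whitespace matches (group 1 is null) become "", quoted strings are
        // kept; quoteReplacement stops "$1" or "\" in the string from being
        // interpreted as replacement syntax.
        return CLEAN.matcher(json).replaceAll(mr ->
            mr.group(1) == null ? "" : Matcher.quoteReplacement(mr.group(1)));
    }

    public static void main(String[] args) {
        System.out.println(clean("{ a : \"costs $1 \\\\ more\" }"));
    }
}
```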
Edit: I've just got access to a Java IDE and tried out my solution. I had made a couple of mistakes in my code, including using \. instead of . in the Pattern. I have fixed those and run it on a variation of your sample:
db.insert( {
_id:3,
cost:{_0:11},
description:"This is a \"description\" with an embedded newline: \"\n\".\nCool, isn\'t it?"
});
The code:
String json = "db.insert( {\n" +
" _id:3,\n" +
" cost:{_0:11},\n" +
" description:\"This is a \\\"description\\\" with an embedded newline: \\\"\\n\\\".\\nCool, isn\\'t it?\"\n" +
"});";
// insert above code
System.out.println(cleanJson);
This produces:
db.insert({_id:3,cost:{_0:11},description:"This is a \"description\" with an embedded newline: \"\n\".\nCool, isn\'t it?"});
which is the same JSON expression with all whitespace removed outside quoted strings, and with whitespace and newlines retained inside quoted strings.
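The pattern above only treats double-quoted strings as protected, while the question also mentions strings delimited by ''. A minimal sketch of one way to cover both, assuming single-quoted strings use the same backslash escaping, is to add a second alternative inside the capturing group. The class name WhitespaceStripper and the method name stripOutsideStrings are hypothetical, and I have only reasoned through this variant on small inputs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of MonjaDB or any library.
final class WhitespaceStripper {
    // Whitespace, or a captured string in double quotes or single quotes.
    // Assumption: both quote styles escape with a backslash.
    private static final Pattern CLEAN = Pattern.compile(
        "\\s+|(\"(?:[^\"\\\\]|\\\\.)*\"|'(?:[^'\\\\]|\\\\.)*')",
        Pattern.DOTALL);

    static String stripOutsideStrings(String text) {
        StringBuilder out = new StringBuilder();
        Matcher m = CLEAN.matcher(text);
        int last = 0;
        while (m.find()) {
            // Copy the unmatched text between the previous match and this one.
            out.append(text, last, m.start());
            // Keep quoted strings verbatim; drop whitespace matches.
            if (m.group(1) != null) {
                out.append(m.group(1));
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "db.insert( {\n  _id:3,\n  note:'a  b',\n"
            + "  description:\"x \\\"y\\\"\\n z\"\n});";
        System.out.println(stripOutsideStrings(input));
    }
}
```

An apostrophe inside a double-quoted string (as in isn\'t) is harmless here, because the double-quoted alternative consumes the whole string first. An unbalanced quote outside any string, however, is not matched as a string, so the whitespace after it would still be stripped.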
Source: https://stackoverflow.com/questions/18203676/how-do-i-regex-remove-whitespace-and-newlines-from-a-text-except-for-when-they