How do I regex remove whitespace and newlines from a text, except for when they are in a json's string?

三世轮回 提交于 2021-01-29 11:11:10

问题


I have an instruction like:

db.insert( {
    _id:3,
    cost:{_0:11},
    description:"This is a description.\nCool, isn\'t it?"
});

The Eclipse plugin I am using, called MonjaDB splits the instruction by newline and I get each line as a separate instruction, which is bad. I fixed it using ;(\r|\n)+ which now includes the entire instruction, however, when sanitizing the newlines between the parts of the JSON, it also sanitizes the \n and \r within string in the json itself.

How do I avoid removing \t, \r, \n from within JSON strings? which are, of course, delimited by "" or ''.


回答1:


You need to arrange to ignore whitespace when it appears within quotes,. So as suggested by one of the commenters:

\s+ | ( "  (?: [^"\\]  |  \\ . ) * " )              // White-space inserted for readability

Match java whitespace or a double-quoted string where a string consists of " followed by any non-escape, non-quote or an escape + plus any character, then a final ". This way, whitespaces inside strings are not matched.

and replace with $1 if $1 is not null.

    Pattern clean = Pattern.compile(" \\s+ | ( \" (?: [^\"\\\\] | \\\\ . ) * \" ) ", Pattern.COMMENTS | Pattern.DOTALL);

StringBuffer sb = new StringBuffer();
Matcher m = clean.matcher( json );
while (m.find()) {
    m.appendReplacement(sb, "" );
    // Don't put m.group(1) in the appendReplacement because if it happens to contain $1 or $2 you'll get an error.
    if ( m.group(1) != null )
        sb.append( m.group(1) );
}
m.appendTail(sb);

String cleanJson = sb.toString();

This is totally off the top of my head but I'm pretty sure it's close to what you want.

Edit: I've just got access to a Java IDE and tried out my solution. I had made a couple of mistakes with my code including using \. instead of . in the Pattern. So I have fixed that up and run it on a variation of your sample:

db.insert( {
    _id:3,
    cost:{_0:11},
    description:"This is a \"description\" with an embedded newline: \"\n\".\nCool, isn\'t it?"
});

The code:

    String json = "db.insert( {\n" +
            "    _id:3,\n" +
            "    cost:{_0:11},\n" +
            "    description:\"This is a \\\"description\\\" with an embedded newline: \\\"\\n\\\".\\nCool, isn\\'t it?\"\n" +
            "});";

        // insert above code

        System.out.println(cleanJson);

This produces:

db.insert({_id:3,cost:{_0:11},description:"This is a \"description\" with an embedded newline: \"\n\".\nCool, isn\'t it?"});

which is the same json expression with all whitespace removed outside quoted strings and whitespace and newlines retained inside quoted strings.



来源:https://stackoverflow.com/questions/18203676/how-do-i-regex-remove-whitespace-and-newlines-from-a-text-except-for-when-they

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!