Simple JSON string escape for C++?

一生所求 2020-12-13 02:28

I have a very simple program that outputs a simple JSON string, which I manually concatenate together and write to the std::cout stream (the output really is that sim

3 Answers
  • 2020-12-13 02:37

    You didn't say exactly where those strings you're cobbling together are coming from, originally, so this may not be of any use. But if they all happen to live in the code, as @isnullxbh mentioned in this comment to an answer on a different question, another option is to leverage a lovely C++11 feature: Raw string literals.

    I won't quote cppreference's long-winded, standards-based explanation; you can read it yourself there. Basically, though, R-strings bring to C++ the same sort of programmer-delimited literals, with almost no restrictions on content, that you get from here-docs in the shell, and which languages like Perl use so effectively. (Prefixed quoting using curly braces may be Perl's single greatest invention:)

    my $qstring = q{Quoted 'string'!};
    my $qqstring = qq{Double "quoted" 'string'!};
    my $replacedstring = q{Regexps that /totally/! get eaten by your parser.};
    $replacedstring =~ s{/totally/!}{(won't!)};
    # Heh. I see the syntax highlighter isn't quite up to the challenge, though.
    

    In C++11 or later, a raw string literal is prefixed with a capital R before the opening double quote. Inside the quotes, the content is preceded by an optional delimiter of your choosing (up to 16 characters, excluding spaces, parentheses and backslashes) followed by an opening paren.

    From there on, you can safely write literally anything other than a closing paren followed by your chosen delimiter and a double quote. That sequence terminates the raw literal, and you then have a string literal whose contents are taken verbatim, with no escape-sequence processing, which you can store in a std::string.
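
    As a small sketch of the delimiter rule (the function names and sample text are made up for illustration): the first literal uses no delimiter, so its content must not contain a closing paren followed by a double quote; the second uses the delimiter json, so that sequence can appear inside the content.

```cpp
#include <string>

// Hypothetical helper names, used only for this illustration.

// No delimiter: the literal ends at the first )" sequence,
// so the content must not contain one.
std::string plain_literal() {
    return R"({"quote": "\"hi\""})";
}

// Delimiter "json": only )json" ends the literal, so )" may appear inside.
std::string delimited_literal() {
    return R"json({"call": "f(\"x\")"})json";
}
```

    In both cases the backslashes are kept verbatim, so the JSON escapes inside the literal reach the output exactly as written.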

    Nothing is reinterpreted in later manipulations, either: once stored, the characters are ordinary std::string contents. So, borrowing from the chapter list for Crockford's How JavaScript Works, this is completely valid:

    std::string ch0_to_4 = R"json(
    [
        {"number": 0, "chapter": "Read Me First!"},
        {"number": 1, "chapter": "How Names Work"},
        {"number": 2, "chapter": "How Numbers Work"},
        {"number": 3, "chapter": "How Big Integers Work"},
        {"number": 4, "chapter": "How Big Floating Point Works"},)json";
    
    std::string ch5_and_6 = R"json(
        {"number": 5, "chapter": "How Big Rationals Work"},
        {"number": 6, "chapter": "How Booleans Work"})json";
    
    std::string chapters = ch0_to_4 + ch5_and_6 + "\n]";
    std::cout << chapters;
    

    The string 'chapters' will emerge from std::cout completely intact:

    [
        {"number": 0, "chapter": "Read Me First!"},
        {"number": 1, "chapter": "How Names Work"},
        {"number": 2, "chapter": "How Numbers Work"},
        {"number": 3, "chapter": "How Big Integers Work"},
        {"number": 4, "chapter": "How Big Floating Point Works"},
        {"number": 5, "chapter": "How Big Rationals Work"},
        {"number": 6, "chapter": "How Booleans Work"}
    ]
    
  • 2020-12-13 02:41

    I have written simple JSON escape and unescape functions. The code is publicly available on GitHub. For anyone interested, here it is:

    enum State {ESCAPED, UNESCAPED};
    
    std::string escapeJSON(const std::string& input)
    {
        std::string output;
        output.reserve(input.length());
    
        for (std::string::size_type i = 0; i < input.length(); ++i)
        {
            switch (input[i]) {
                case '"':
                    output += "\\\"";
                    break;
                case '/':
                    output += "\\/";
                    break;
                case '\b':
                    output += "\\b";
                    break;
                case '\f':
                    output += "\\f";
                    break;
                case '\n':
                    output += "\\n";
                    break;
                case '\r':
                    output += "\\r";
                    break;
                case '\t':
                    output += "\\t";
                    break;
                case '\\':
                    output += "\\\\";
                    break;
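                default:
                    // Note: any other control character (below 0x20) is copied
                    // unchanged here, although JSON requires such characters
                    // to be escaped.
                    output += input[i];
                    break;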
            }
    
        }
    
        return output;
    }
    
    std::string unescapeJSON(const std::string& input)
    {
        State s = UNESCAPED;
        std::string output;
        output.reserve(input.length());
    
        for (std::string::size_type i = 0; i < input.length(); ++i)
        {
            switch(s)
            {
                case ESCAPED:
                    {
                        switch(input[i])
                        {
                            case '"':
                                output += '\"';
                                break;
                            case '/':
                                output += '/';
                                break;
                            case 'b':
                                output += '\b';
                                break;
                            case 'f':
                                output += '\f';
                                break;
                            case 'n':
                                output += '\n';
                                break;
                            case 'r':
                                output += '\r';
                                break;
                            case 't':
                                output += '\t';
                                break;
                            case '\\':
                                output += '\\';
                                break;
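                            default:
                                // Unrecognized escapes land here, including the
                                // four-hex-digit Unicode form: only the character
                                // after the backslash is copied.
                                output += input[i];
                                break;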
                        }
    
                        s = UNESCAPED;
                        break;
                    }
                case UNESCAPED:
                    {
                        switch(input[i])
                        {
                            case '\\':
                                s = ESCAPED;
                                break;
                            default:
                                output += input[i];
                                break;
                        }
                    }
            }
        }
        return output;
    }
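
    One case the unescapeJSON function above does not cover is the Unicode escape form (backslash, u, four hex digits): its default branch copies the 'u' and leaves the hex digits in place. As a rough sketch of how that case could be decoded, assuming UTF-8 output and no support for surrogate pairs, a hypothetical helper (decodeUnicodeEscape is an invented name, not part of the code above) might look like this:

```cpp
#include <string>

// Hypothetical helper, not part of the answer's code.
// Reads the four hex digits starting at input[pos] (the position just after
// the backslash and the 'u') and appends the code point to output as UTF-8.
// Returns false, leaving output untouched, if four hex digits are not
// available or if the value is a surrogate (U+D800..U+DFFF); a complete
// decoder would have to combine surrogate pairs instead.
bool decodeUnicodeEscape(const std::string& input, std::string::size_type pos,
                         std::string& output)
{
    if (pos + 4 > input.length())
        return false;

    unsigned int cp = 0;
    for (std::string::size_type i = pos; i < pos + 4; ++i) {
        const char c = input[i];
        cp <<= 4;
        if (c >= '0' && c <= '9')      cp |= static_cast<unsigned int>(c - '0');
        else if (c >= 'a' && c <= 'f') cp |= static_cast<unsigned int>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') cp |= static_cast<unsigned int>(c - 'A' + 10);
        else return false;
    }

    if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;                      // surrogate halves are not handled

    if (cp < 0x80) {                       // 1 byte:  0xxxxxxx
        output += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        output += static_cast<char>(0xC0 | (cp >> 6));
        output += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        output += static_cast<char>(0xE0 | (cp >> 12));
        output += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        output += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}
```

    In the ESCAPED branch this could be called from an added case 'u', advancing i by four when it returns true; how to treat a false return (skip, copy verbatim, or report an error) is a design decision left open here.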
    
  • 2020-12-13 02:49

    Caveat

    Whatever solution you choose, keep in mind that the JSON standard requires you to escape all control characters. A common misconception is that only a handful of them need escaping, and many developers get this wrong.

    All control characters means everything from '\x00' to '\x1f', not just those with a short representation such as '\x0a' (also known as '\n'). For example, you must escape the '\x02' character as \u0002.

    See also: ECMA-404 The JSON Data Interchange Format, Page 10

    Simple solution

    If you know for sure that your input string is UTF-8 encoded, you can keep things simple.

    Since JSON allows you to escape everything via \uXXXX, even " and \, a simple solution is:

    #include <sstream>
    #include <iomanip>
    
    std::string escape_json(const std::string &s) {
        std::ostringstream o;
        for (auto c = s.cbegin(); c != s.cend(); c++) {
            if (*c == '"' || *c == '\\' || ('\x00' <= *c && *c <= '\x1f')) {
                o << "\\u"
                  << std::hex << std::setw(4) << std::setfill('0') << (int)*c;
            } else {
                o << *c;
            }
        }
        return o.str();
    }
    

    Shortest representation

    For the shortest representation you may use JSON shortcuts, such as \" instead of \u0022. The following function produces the shortest JSON representation of a UTF-8 encoded string s:

    #include <sstream>
    #include <iomanip>
    
    std::string escape_json(const std::string &s) {
        std::ostringstream o;
        for (auto c = s.cbegin(); c != s.cend(); c++) {
            switch (*c) {
            case '"': o << "\\\""; break;
            case '\\': o << "\\\\"; break;
            case '\b': o << "\\b"; break;
            case '\f': o << "\\f"; break;
            case '\n': o << "\\n"; break;
            case '\r': o << "\\r"; break;
            case '\t': o << "\\t"; break;
            default:
                if ('\x00' <= *c && *c <= '\x1f') {
                    o << "\\u"
                      << std::hex << std::setw(4) << std::setfill('0') << (int)*c;
                } else {
                    o << *c;
                }
            }
        }
        return o.str();
    }
    

    Pure switch statement

    It is also possible to get along with a pure switch statement, that is, without if and <iomanip>. While this is quite cumbersome, it may be preferable from a "security by simplicity and purity" point of view:

    #include <sstream>
    
    std::string escape_json(const std::string &s) {
        std::ostringstream o;
        for (auto c = s.cbegin(); c != s.cend(); c++) {
            switch (*c) {
            case '\x00': o << "\\u0000"; break;
            case '\x01': o << "\\u0001"; break;
            ...
            case '\x0a': o << "\\n"; break;
            ...
            case '\x1f': o << "\\u001f"; break;
            case '\x22': o << "\\\""; break;
            case '\x5c': o << "\\\\"; break;
            default: o << *c;
            }
        }
        return o.str();
    }
    

    Using a library

    You might want to have a look at https://github.com/nlohmann/json, which is an efficient header-only C++ library (MIT License) that seems to be very well-tested.

    You can either call their escape_string() method directly, if the version you use exposes it (it may not be part of the public interface), or take their implementation of escape_string() as a starting point for your own:

    https://github.com/nlohmann/json/blob/ec7a1d834773f9fee90d8ae908a0c9933c5646fc/src/json.hpp#L4604-L4697
