Simple JSON string escape for C++?

一生所求 2020-12-13 02:28

I have a very simple program that outputs a simple JSON string, which I manually concatenate together and write to the std::cout stream (the output really is that sim

3 answers
  •  余生分开走
    2020-12-13 02:49

    Caveat

    Whatever solution you choose, keep in mind that the JSON standard requires you to escape all control characters. A common misconception is that only a handful of them need escaping, and many developers get this wrong.

    All control characters means everything from '\x00' to '\x1f', not just those with a short representation such as '\x0a' (also known as '\n'). For example, you must escape the '\x02' character as \u0002.

    See also: ECMA-404 The JSON Data Interchange Format, Page 10
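
    To make the rule concrete, here is a minimal sketch of a predicate for the bytes that must be escaped inside a JSON string. The names must_escape_in_json and is_json_safe are made up for this illustration, and the input is assumed to be UTF-8, so bytes of 0x80 and above are passed through untouched.

```cpp
#include <string>

// Hypothetical helper: true if byte c must be escaped inside a JSON string,
// i.e. the quotation mark, the backslash, or anything in 0x00..0x1f.
bool must_escape_in_json(char c) {
    // Compare as unsigned so the result does not depend on whether
    // plain char is signed on this platform.
    const unsigned char u = static_cast<unsigned char>(c);
    return u == '"' || u == '\\' || u <= 0x1f;
}

// Hypothetical helper: true if s can be placed between quotes as-is.
bool is_json_safe(const std::string &s) {
    for (char c : s) {
        if (must_escape_in_json(c)) {
            return false;
        }
    }
    return true;
}
```

    Note that '\x7f' (DEL) is not in the mandatory range, so this predicate leaves it alone.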

    Simple solution

    If you know for sure that your input string is UTF-8 encoded, you can keep things simple.

    Since JSON allows you to escape everything via \uXXXX, even " and \, a simple solution is:

    #include <iomanip>
    #include <sstream>
    #include <string>
    
    std::string escape_json(const std::string &s) {
        std::ostringstream o;
        for (auto c = s.cbegin(); c != s.cend(); c++) {
            if (*c == '"' || *c == '\\' || ('\x00' <= *c && *c <= '\x1f')) {
                o << "\\u"
                  << std::hex << std::setw(4) << std::setfill('0') << (int)*c;
            } else {
                o << *c;
            }
        }
        return o.str();
    }
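
    If you would rather avoid stream manipulators, the same idea can be sketched with a small hex digit table. This is a variant, not the code above; the name escape_json_u is made up for the example, and it again assumes UTF-8 input.

```cpp
#include <string>

// Hypothetical stream-free variant: every byte that must be escaped
// becomes \u00XX, everything else is copied unchanged.
std::string escape_json_u(const std::string &s) {
    static const char hex[] = "0123456789abcdef";
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (u == '"' || u == '\\' || u <= 0x1f) {
            // All escaped bytes are below 0x100, so the first two
            // hex digits are always "00".
            out += "\\u00";
            out += hex[u >> 4];
            out += hex[u & 0x0f];
        } else {
            out += c;
        }
    }
    return out;
}
```

    With this sketch, a double quote comes out as \u0022 and a newline as \u000a.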
    

    Shortest representation

    For the shortest representation you may use JSON shortcuts, such as \" instead of \u0022. The following function produces the shortest JSON representation of a UTF-8 encoded string s:

    #include <iomanip>
    #include <sstream>
    #include <string>
    
    std::string escape_json(const std::string &s) {
        std::ostringstream o;
        for (auto c = s.cbegin(); c != s.cend(); c++) {
            switch (*c) {
            case '"': o << "\\\""; break;
            case '\\': o << "\\\\"; break;
            case '\b': o << "\\b"; break;
            case '\f': o << "\\f"; break;
            case '\n': o << "\\n"; break;
            case '\r': o << "\\r"; break;
            case '\t': o << "\\t"; break;
            default:
                if ('\x00' <= *c && *c <= '\x1f') {
                    o << "\\u"
                      << std::hex << std::setw(4) << std::setfill('0') << (int)*c;
                } else {
                    o << *c;
                }
            }
        }
        return o.str();
    }
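
    Since the question is about writing to std::cout, the same switch can also write straight to an output stream. The sketch below assumes a hypothetical helper named write_json_string that also emits the surrounding quotes. One thing to watch for: std::hex and the fill character stay set on a stream, so the sketch saves and restores them to keep later numeric output unaffected.

```cpp
#include <iomanip>
#include <ostream>
#include <string>

// Hypothetical helper: writes s to out as a quoted JSON string literal.
void write_json_string(std::ostream &out, const std::string &s) {
    // Remember the caller's formatting state; std::hex and the fill
    // character would otherwise stay set on the stream after we return.
    const std::ios_base::fmtflags saved_flags = out.flags();
    const char saved_fill = out.fill();
    out << std::hex << std::right << std::noshowbase << std::setfill('0');

    out << '"';
    for (char c : s) {
        switch (c) {
        case '"':  out << "\\\""; break;
        case '\\': out << "\\\\"; break;
        case '\b': out << "\\b"; break;
        case '\f': out << "\\f"; break;
        case '\n': out << "\\n"; break;
        case '\r': out << "\\r"; break;
        case '\t': out << "\\t"; break;
        default:
            if (static_cast<unsigned char>(c) <= 0x1f) {
                // setw applies only to the next item, the integer.
                out << "\\u" << std::setw(4) << static_cast<int>(c);
            } else {
                out << c;
            }
        }
    }
    out << '"';

    // Restore the caller's formatting state.
    out.flags(saved_flags);
    out.fill(saved_fill);
}
```

    A call such as write_json_string(std::cout, name) could then stand in for manually concatenating quotes around escape_json(name), where name is whatever string you want to emit.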
    

    Pure switch statement

    It is also possible to get by with a pure switch statement, that is, without if and <iomanip>. While this is quite cumbersome, it may be preferable from a "security by simplicity and purity" point of view:

    #include <sstream>
    #include <string>
    
    std::string escape_json(const std::string &s) {
        std::ostringstream o;
        for (auto c = s.cbegin(); c != s.cend(); c++) {
            switch (*c) {
            case '\x00': o << "\\u0000"; break;
            case '\x01': o << "\\u0001"; break;
            ...
            case '\x0a': o << "\\n"; break;
            ...
            case '\x1f': o << "\\u001f"; break;
            case '\x22': o << "\\\""; break;
            case '\x5c': o << "\\\\"; break;
            default: o << *c;
            }
        }
        return o.str();
    }
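
    Because a hand-written list of cases is easy to get wrong, it may be worth checking that none is missing. The following is only a sketch of such a check; escapes_everything_required is a made-up name, and it accepts any callable that maps a std::string to its escaped form, such as escape_json above.

```cpp
#include <string>

// Hypothetical checker: returns true if escape() turns each of the 32
// control characters, the quotation mark, and the backslash into an
// escape sequence (output that starts with a backslash and has at
// least two characters).
template <typename Escape>
bool escapes_everything_required(Escape escape) {
    auto is_escaped = [&](char c) {
        const std::string out = escape(std::string(1, c));
        return out.size() >= 2 && out[0] == '\\';
    };
    for (int i = 0x00; i <= 0x1f; ++i) {
        if (!is_escaped(static_cast<char>(i))) {
            return false;
        }
    }
    return is_escaped('"') && is_escaped('\\');
}
```

    This only checks that an escape sequence is produced, not that it is the right one, so it complements rather than replaces comparing a few known outputs.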
    

    Using a library

    You might want to have a look at https://github.com/nlohmann/json, which is an efficient header-only C++ library (MIT License) that seems to be very well-tested.

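    You can either call their escape_string() function directly (depending on the library version it may not be part of the public interface, in which case serializing a string value with dump() also yields an escaped JSON string), or you can take their implementation of escape_string() as a starting point for your own implementation: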

    https://github.com/nlohmann/json/blob/ec7a1d834773f9fee90d8ae908a0c9933c5646fc/src/json.hpp#L4604-L4697
