How to parse a CSV-like escaped string with Boost Spirit?

Submitted by 柔情痞子 on 2020-05-15 11:11:16

Question


For my expression parser project I would like to use CSV-like escaping, where "" escapes a single ".

Examples:

 "\"hello\"",
 "   \"  hello \"  ",
 "  \"  hello \"\"stranger\"\" \"  ",

Online compile & try: https://wandbox.org/permlink/5uchQM8guIN1k7aR

My current parsing rule only parses the first two tests:

qi::rule<std::string::const_iterator, qi::blank_type, utree()> double_quoted_string
    = '"' >> qi::no_skip[+~qi::char_('"')] >> '"';

I've found this Stack Overflow question, and one of its answers uses Spirit:

How can I read and parse CSV files in C++?

start       = field % ',';
field       = escaped | non_escaped;
escaped     = lexeme['"' >> *( char_ -(char_('"') | ',') | COMMA | DDQUOTE)  >> '"'];
non_escaped = lexeme[       *( char_ -(char_('"') | ',')                  )        ];
DDQUOTE     = lit("\"\"")       [_val = '"'];
COMMA       = lit(",")          [_val = ','];

(I don't know how to link answers, so if you are interested, search for "You gotta feel proud when you use something so beautiful as boost::spirit".)

Sadly, it does not compile for me, and even years of analysing C++ error messages didn't prepare me for Spirit's floods of error messages :) Also, if I understand it correctly, the rule waits for , as a string delimiter, which is maybe not the right thing for my expression parser project:

expression = "strlen( \"hello \"\"you\"\" \" )+1";
expression = "\"hello \"";
expression = "strlen(concat(\"hello\",\"you\")+3";

Or does the rule need to optionally wait for , and ) in this case?

I hope I'm not asking too many silly questions, but the answers help me a lot in getting into Spirit. The expression parser itself is nearly working, except for string escaping.

Thanks for any help.

UPDATE: this seems to work for me. At least it parses the strings, but it removes the escaped " from the result. Is there a better debug output available for strings? " " " " "h" "e" "l" "l" "o" " " "s" "t" "r" "a" "n" "g" "e" "r" " " isn't really that readable.

qi::rule<std::string::const_iterator, utree()> double_quoted_string
  = qi::lexeme['"' >> *(qi::char_ - (qi::char_('"')) | qi::lit("\"\"")) >> '"'];

Answer 1:


You can simplify the question down to this: how do you make a double-quoted string accept "double double quotes" to escape an embedded double-quote character?

A simple string parser without escapes:

qi::rule<It, std::string()> s = '"' >> *~qi::char_('"') >> '"';

Now, to also accept an escaped "" (standing for a single ") as desired, simply add an alternative:

s = '"' >> *("\"\"" >> qi::attr('"') | ~qi::char_('"')) >> '"';

Other notes:

  • in your online example the use of no_skip is sloppy: it would parse "foo bar" and " foo bar " both to foo bar (trimming the whitespace). Instead, drop the skipper from the rule to make it implicitly lexeme (again).
  • Your parser did not accept empty strings (this might be what you want, but that's not certain)
  • using utree is likely complicating your life more than you want

Simplified:


#define BOOST_SPIRIT_DEBUG
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

int main()
{
    auto tests = std::vector<std::string>{
         R"( "hello" )",
         R"(    "  hello " )",
         R"(  "  hello ""escaped"" "  )",
    };
    for (const std::string& str : tests) {
        auto iter = str.begin(), end = str.end();

        qi::rule<std::string::const_iterator, std::string()> double_quoted_string
            = '"' >> *("\"\"" >> qi::attr('"') | ~qi::char_('"')) >> '"';

        std::string ut;
        bool r = qi::phrase_parse(iter, end, double_quoted_string >> qi::eoi, qi::blank, ut);

        std::cout << str << " ";
        if (r) {
            std::cout << "OK: " << std::quoted(ut, '\'') << "\n";
        }
        else {
            std::cout << "Failed\n";
        }
        if (iter != end) {
            std::cout << "Remaining unparsed: " << std::quoted(std::string(iter, end)) << "\n";
        }
        std::cout << "----\n";
    }
}

Prints

 "hello"  OK: 'hello'
----
    "  hello "  OK: '  hello '
----
  "  hello ""escaped"" "   OK: '  hello "escaped" '
----


Source: https://stackoverflow.com/questions/60826588/how-to-parse-an-csv-like-escaped-string-with-boost-spirit
