Parsing Selector struct with alternating tokens using Boost Spirit X3

Submitted by 故事扮演 on 2021-02-05 08:28:06

Question


I am trying to parse the following struct:

struct Selector {
    std::string element;
    std::string id;
    std::vector<std::string> classes;
};

This struct holds selectors of the form element#id.class1.class2.classn. Such a selector starts with zero or one element name, contains zero or one id, and contains zero to n classes.

This gets more complicated because the classes and the id can appear in any order, so all of the following selectors are valid: element#id.class1, .class1#id.class2.class3, #id.class1.class2, .class1.class2#id. For this reason I have not been able to use the hold[] or at<T>() approaches described here, nor BOOST_FUSION_ADAPT_STRUCT.

The only way I have been able to synthesize this struct is with the following rules:

auto element = [](auto& ctx){x3::_val(ctx).element = x3::_attr(ctx);};
auto id = [](auto& ctx){x3::_val(ctx).id = x3::_attr(ctx);};
auto empty = [](auto& ctx){x3::_val(ctx) = "";};
auto classes = [](auto& ctx){x3::_val(ctx).classes.insert(x3::_val(ctx).classes.end(), x3::_attr(ctx).begin(), x3::_attr(ctx).end());};

auto elementRule = x3::rule<class EmptyIdClass, std::string>() = +x3::char_("a-zA-Z") | x3::attr("");
auto idRule = x3::rule<class EmptyIdClass, std::string>() = ("#" >> +x3::char_("a-zA-Z")) | x3::attr("");
auto classesRule = x3::rule<class ClassesClass, std::vector<std::string>>() = *("." >> +x3::char_("a-zA-Z"));
auto selectorRule = x3::rule<class TestClass, Selector>() = elementRule[element] >> classesRule[classes] >> idRule[id] >> classesRule[classes];

What would be the best way to parse this struct? Is it possible to synthesize this selector struct naturally, using BOOST_FUSION_ADAPT_STRUCT, and without semantic actions?

It seems like every time I think I am getting the hang of Spirit X3, I stumble upon a new challenge. In this particular case, I learned about issues with backtracking, about an issue with using at<T>() that was introduced in Boost 1.70 here, and also that hold[] is not supported by X3.


Answer 1:


I've written similar answers before:

  • Parsing CSS with Boost.Spirit X3 (a treasure trove for more complete CSS parsing in both Qi and X3)
  • Using boost::spirit to parse named parameters in any order (Qi and X3 in the comments)
  • Boost Spirit x3: parse into structs
  • Combining rules at runtime and returning rules

I don't think you can fusion-adapt this directly. That said, if you are very motivated (e.g. you already have the adapted structs), you could build some generic helpers on top of that.

To be fair, with a little restructuring your code already looks pretty nice to me. Here's my effort to make it more elegant and convenient: I'll introduce a helper macro similar to BOOST_FUSION_ADAPT_XXX, but one that does not require Boost Fusion at all.

Let's Start With The AST

As always, I like to start with the basics. Understanding the goal is half the battle:

namespace Ast {
    using boost::optional;

    struct Selector {
        // These selectors always 
        //  - start with 1 or no elements, 
        //  - could contain 1 or no ids, and
        //  - could contain 0 to n classes.
        optional<std::string> element;
        optional<std::string> id;
        std::vector<std::string> classes;

        friend std::ostream& operator<<(std::ostream& os, Selector const&s) {
            if  (s.element.has_value()) os << s.element.value();
            if  (s.id.has_value())      os << "#" << s.id.value();
            for (auto& c : s.classes)   os << "." << c;
            return os;
        }
    };
}

Note that I made element and id optional, to reflect that real selectors may omit them.

You could also use that optionality to detect repeated initialization of the element/id fields.

Magic Sauce (see below)

#include "propagate.hpp"
DEF_PROPAGATOR(Selector, id, element, classes)

We'll dig into this later. Suffice it to say it generates the semantic actions that you had to tediously write.

Main dish

Now, we can simplify the parser rules a lot, and run the tests:

int main() {
    auto name        = as<std::string>[x3::alpha >> *x3::alnum];
    auto idRule      = "#" >> name;
    auto classesRule = +("." >> name);

    auto selectorRule
        = x3::rule<class TestClass, Ast::Selector>{"selectorRule"}
        = +( name        [ Selector.element ]
           | idRule      [ Selector.id ]
           | classesRule [ Selector.classes ]
           )
        ;

    for (std::string const& input : {
            "element#id.class1.class2.classn",
            "element#id.class1",
            ".class1#id.class2.class3",
            "#id.class1.class2",
            ".class1.class2#id",
        })
    {
        Ast::Selector sel;
        std::cout << std::quoted(input) << " -->\n";
        if (x3::parse(begin(input), end(input), selectorRule >> x3::eoi, sel)) {
            std::cout << "\tSuccess: " << sel << "\n";
        } else {
            std::cout << "\tFailed\n";
        }
    }
}

See it Live On Wandbox, printing:

"element#id.class1.class2.classn" -->
    Success: element#id.class1.class2.classn
"element#id.class1" -->
    Success: element#id.class1
".class1#id.class2.class3" -->
    Success: #id.class1.class2.class3
"#id.class1.class2" -->
    Success: #id.class1.class2
".class1.class2#id" -->
    Success: #id.class1.class2

The Magic

Now, how did I generate those actions? Using a little bit of Boost Preprocessor:

#define MEM_PROPAGATOR(_, T, member) \
    Propagators::Prop<decltype(std::mem_fn(&T::member))> member { std::mem_fn(&T::member) };

#define DEF_PROPAGATOR(type, ...) \
    struct type##S { \
        using T = Ast::type; \
        BOOST_PP_SEQ_FOR_EACH(MEM_PROPAGATOR, T, BOOST_PP_VARIADIC_TO_SEQ(__VA_ARGS__)) \
    } static const type {};

Note that it defines a static const variable named after the Ast type (here, Selector), with one propagator member per listed field.

You're free to call this macro in another namespace, say namespace Actions { }
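To make the macro less opaque, here is roughly what DEF_PROPAGATOR(Selector, id, element, classes) expands to, written out by hand. The Prop below is a hypothetical stand-in that only stores the accessor, so the snippet compiles without Spirit; the point is that std::mem_fn applied to a pointer to data member yields a callable returning a reference to that member:

```cpp
#include <functional>
#include <string>
#include <vector>

namespace Ast {
    struct Selector {
        std::string element, id;
        std::vector<std::string> classes;
    };
}

namespace Propagators {
    // Hypothetical stand-in for the answer's Prop<F>: it only keeps the
    // member accessor (no operator()), so this compiles without Spirit.
    template <typename F> struct Prop { F f; };
}

// Hand-written equivalent of DEF_PROPAGATOR(Selector, id, element, classes):
struct SelectorS {
    using T = Ast::Selector;
    Propagators::Prop<decltype(std::mem_fn(&T::id))>      id      { std::mem_fn(&T::id) };
    Propagators::Prop<decltype(std::mem_fn(&T::element))> element { std::mem_fn(&T::element) };
    Propagators::Prop<decltype(std::mem_fn(&T::classes))> classes { std::mem_fn(&T::classes) };
} static const Selector {};
```

So Selector.id.f(sel) is simply a reference to sel.id, which is what the real Prop::operator() obtains via f(x3::_val(ctx)) before dispatching.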

The real magic is Propagators::Prop<F>, which does a bit of tag dispatch to handle container attributes and container members. Otherwise it just relays to x3::traits::move_to:

namespace Propagators {
    template <typename F>
    struct Prop {
        F f;
        template <typename Ctx>
        auto operator()(Ctx& ctx) const {
            return dispatch(x3::_attr(ctx), f(x3::_val(ctx)));
        }
      private:
        template <typename Attr, typename Dest>
        static inline void dispatch(Attr& attr, Dest& dest) {
            call(attr, dest, is_container(attr), is_container(dest));
        }

        template <typename T>
        static auto is_container(T const&)           { return x3::traits::is_container<T>{}; }
        static auto is_container(std::string const&) { return boost::mpl::false_{}; }

        // tags for dispatch
        using attr_is_container = boost::mpl::true_;
        using attr_is_scalar    = boost::mpl::false_;
        using dest_is_container = boost::mpl::true_;
        using dest_is_scalar    = boost::mpl::false_;

        template <typename Attr, typename Dest>
        static inline void call(Attr& attr, Dest& dest, attr_is_scalar, dest_is_scalar) {
            x3::traits::move_to(attr, dest);
        }
        template <typename Attr, typename Dest>
        static inline void call(Attr& attr, Dest& dest, attr_is_scalar, dest_is_container) {
            dest.insert(dest.end(), attr);
        }
        template <typename Attr, typename Dest>
        static inline void call(Attr& attr, Dest& dest, attr_is_container, dest_is_container) {
            dest.insert(dest.end(), attr.begin(), attr.end());
        }
    };
}
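For comparison, the same three-way dispatch can be expressed with C++17 if constexpr instead of overloads on mpl::true_/false_ tags. This is a sketch, not the answer's code: is_container here is a hypothetical stand-in for x3::traits::is_container (anything with begin()/end(), except std::string), and plain assignment replaces x3::traits::move_to:

```cpp
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait: anything with begin()/end() counts as a container,
// except std::string, which is treated as a scalar (as in the answer).
template <typename T, typename = void>
struct is_container : std::false_type {};

template <typename T>
struct is_container<T, std::void_t<decltype(std::declval<T&>().begin()),
                                   decltype(std::declval<T&>().end())>>
    : std::true_type {};

template <>
struct is_container<std::string> : std::false_type {};

// Same cases as the tag-dispatched call() overloads, selected at compile time.
template <typename Attr, typename Dest>
void propagate(Attr& attr, Dest& dest) {
    if constexpr (!is_container<Dest>::value) {
        dest = std::move(attr);                            // scalar destination: assign
    } else if constexpr (!is_container<Attr>::value) {
        dest.insert(dest.end(), std::move(attr));          // scalar into container: append one
    } else {
        dest.insert(dest.end(), attr.begin(), attr.end()); // container into container: append range
    }
}
```

Unlike the overload set above, a container attribute with a scalar destination falls into the plain-assignment branch here instead of failing to compile.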

BONUS

A lot of the complexity in the propagator type is from handling container attributes. However, you don't actually need any of that:

auto name = as<std::string>[x3::alpha >> *x3::alnum];

auto selectorRule
    = x3::rule<class selector_, Ast::Selector>{"selectorRule"}
    = +( name        [ Selector.element ]
       | '#' >> name [ Selector.id ]
       | '.' >> name [ Selector.classes ]
       )
    ;

This is more than enough, and the propagation helper can then be simplified to:

namespace Propagators {
    template <typename F> struct Prop {
        F f;
        template <typename Ctx>
        auto operator()(Ctx& ctx) const {
            return call(x3::_attr(ctx), f(x3::_val(ctx)));
        }
      private:
        template <typename Attr, typename Dest>
        static inline void call(Attr& attr, Dest& dest) {
            x3::traits::move_to(attr, dest);
        }
        template <typename Attr, typename Elem>
        static inline void call(Attr& attr, std::vector<Elem>& dest) {
            dest.insert(dest.end(), attr);
        }
    };
}

As you can see, getting rid of the tag dispatch makes the helper considerably simpler.

See the simplified version Live On Wandbox again.

FULL LISTING

For posterity on this site:

  • test.cpp

    //#define BOOST_SPIRIT_X3_DEBUG
    #include <boost/spirit/home/x3.hpp>
    #include <iostream>
    #include <iomanip>
    
    namespace x3 = boost::spirit::x3;
    
    namespace Ast {
        using boost::optional;
    
        struct Selector {
            // These selectors always 
            //  - start with 1 or no elements, 
            //  - could contain 1 or no ids, and
            //  - could contain 0 to n classes.
            optional<std::string> element;
            optional<std::string> id;
            std::vector<std::string> classes;
    
            friend std::ostream& operator<<(std::ostream& os, Selector const&s) {
                if  (s.element.has_value()) os << s.element.value();
                if  (s.id.has_value())      os << "#" << s.id.value();
                for (auto& c : s.classes)   os << "." << c;
                return os;
            }
        };
    }
    
    #include "propagate.hpp"
    DEF_PROPAGATOR(Selector, id, element, classes)
    
    #include "as.hpp"
    int main() {
        auto name = as<std::string>[x3::alpha >> *x3::alnum];
    
        auto selectorRule
            = x3::rule<class selector_, Ast::Selector>{"selectorRule"}
            = +( name        [ Selector.element ]
               | '#' >> name [ Selector.id ]
               | '.' >> name [ Selector.classes ]
               )
            ;
    
        for (std::string const& input : {
                "element#id.class1.class2.classn",
                "element#id.class1",
                ".class1#id.class2.class3",
                "#id.class1.class2",
                ".class1.class2#id",
            })
        {
            Ast::Selector sel;
            std::cout << std::quoted(input) << " -->\n";
            if (x3::parse(begin(input), end(input), selectorRule >> x3::eoi, sel)) {
                std::cout << "\tSuccess: " << sel << "\n";
            } else {
                std::cout << "\tFailed\n";
            }
        }
    }
    
  • propagate.hpp

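    #pragma once
    #include <boost/preprocessor/cat.hpp>
    #include <boost/preprocessor/seq/for_each.hpp>
    #include <boost/preprocessor/variadic/to_seq.hpp> // BOOST_PP_VARIADIC_TO_SEQ
    #include <functional>                             // std::mem_fn
    #include <vector>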
    
    namespace Propagators {
        template <typename F> struct Prop {
            F f;
            template <typename Ctx>
            auto operator()(Ctx& ctx) const {
                return call(x3::_attr(ctx), f(x3::_val(ctx)));
            }
          private:
            template <typename Attr, typename Dest>
            static inline void call(Attr& attr, Dest& dest) {
                x3::traits::move_to(attr, dest);
            }
            template <typename Attr, typename Elem>
            static inline void call(Attr& attr, std::vector<Elem>& dest) {
                dest.insert(dest.end(), attr);
            }
        };
    }
    
    #define MEM_PROPAGATOR(_, T, member) \
        Propagators::Prop<decltype(std::mem_fn(&T::member))> member { std::mem_fn(&T::member) };
    
    #define DEF_PROPAGATOR(type, ...) \
        struct type##S { \
            using T = Ast::type; \
            BOOST_PP_SEQ_FOR_EACH(MEM_PROPAGATOR, T, BOOST_PP_VARIADIC_TO_SEQ(__VA_ARGS__)) \
        } static const type {};
    
  • as.hpp

    #pragma once
    #include <boost/spirit/home/x3.hpp>
    
    namespace {
        template <typename T>
        struct as_type {
            template <typename...> struct tag{};
            template <typename P>
            auto operator[](P p) const {
                return boost::spirit::x3::rule<tag<T,P>, T> {"as"}
                       = p;
            }
        };
    
        template <typename T>
            static inline const as_type<T> as = {};
    }
    



Answer 2:


Maybe this is not what you are looking for (if so, please let me know and I will delete the answer), but for such a relatively simple parsing task you need neither Boost nor Spirit.

A simple regex is enough to split the given string into tokens. We can observe the following:

  • An "element" name starts at the beginning of the line and is a string of alphanumeric characters.
  • The "id" always starts with a hash (#).
  • The class names always start with a dot (.).

So, we can form a single regex to match those 3 types of tokens.

((^\w+)|[\.#]\w+)

You may look here for an explanation of the regex.
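To see what the regex produces on its own, here is a minimal tokenizer using the same pattern (the function name tokenize is only for illustration). Because std::regex_iterator searches with match_prev_avail after the first match, the ^ alternative, under ECMAScript rules without multiline mode, can only match at the very start of the string:

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits one selector into tokens with the answer's regex: a leading word
// (the element), or a '.'/'#' followed by a word (a class or the id).
inline std::vector<std::string> tokenize(const std::string& line) {
    static const std::regex re{R"(((^\w+)|[\.#]\w+))"};
    std::sregex_token_iterator first(line.begin(), line.end(), re), last;
    return std::vector<std::string>(first, last);
}
```

A side effect is that a bare word which neither starts the string nor follows a . or # (for example after a stray comma) is silently skipped rather than reported.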

Then we can write a simple program that reads selectors, splits each of them into tokens, and assigns those to the Selector struct.

Please see the following example. This should give you an idea on how it could be done.

#include <iostream>
#include <vector>
#include <regex>
#include <sstream>
#include <string>
#include <iterator>
#include <cctype>

struct Selector {
    std::string element;
    std::string id;
    std::vector<std::string> classes;
};

std::stringstream inputFileStream{ R"(element1#id1.class11.class12.class13.class14
element2#id2.class21.class22
#id3.class31.class32.class33.class34.class35
.class41.class42.class43#id4
.class51#id5.class52.class53.class54.class55.class56
)"};

//std::regex re{R"(([\.#]?\w+))"};
std::regex re{ R"(((^\w+)|[\.#]\w+))" };

int main() {

    std::vector<Selector> selectors{};

    // Read all lines of the source file
    for (std::string line{}; std::getline(inputFileStream, line); ) {

        // Split the line with selector string into tokens
        std::vector<std::string> tokens(std::sregex_token_iterator(line.begin(), line.end(), re), {});

        // Here we will store the one single selector
        Selector tempSelector{};

        // Go through all tokens and check their type
        for (const std::string& token : tokens) {

            // Depending on the structure element type, add it to the correct structure element field
            if (token[0] == '#') tempSelector.id = token.substr(1);
            else if (token[0] == '.') tempSelector.classes.emplace_back(token.substr(1));
            // Cast to unsigned char: passing a negative char to std::isalnum is undefined behavior
            else if (std::isalnum(static_cast<unsigned char>(token[0]))) tempSelector.element = token;
            else std::cerr << "\n*** Error: Invalid token found: " << token << "\n";
        }
        // Add the new selector to the vector of selectors
        selectors.push_back(std::move(tempSelector));
    }


    // Show debug output
    for (const Selector& s : selectors) {
        std::cout << "\n\nSelector\n\tElement:\t" << s.element << "\n\tID:\t\t" << s.id << "\n\tClasses:\t";
        for (const std::string& c : s.classes)
            std::cout << c << " ";
    }
    std::cout << "\n\n";

    return 0;
}

Of course, we could write a more sophisticated regex with some additional checking.
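One possible stricter variant, assuming names must start with a letter (as in the Spirit answer's alpha >> *alnum) and that at most one id is allowed, is to validate the whole line with std::regex_match before tokenizing. The function name is_valid_selector is hypothetical:

```cpp
#include <regex>
#include <string>

// Whole-string check: optional element, then classes, at most one #id,
// then more classes. Every name must start with a letter.
inline bool is_valid_selector(const std::string& s) {
    static const std::regex strict{
        R"(([A-Za-z]\w*)?(\.[A-Za-z]\w*)*(#[A-Za-z]\w*)?(\.[A-Za-z]\w*)*)"};
    return !s.empty() && std::regex_match(s, strict);
}
```

Allowing classes on both sides of the optional #id group is what lets the id appear in any position while still rejecting a second id or a stray separator such as a comma.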



Source: https://stackoverflow.com/questions/61315488/parsing-selector-struct-with-alternating-tokens-using-boost-spirit-x3
