Grammatical inference of regular expressions for a given finite list of representative strings?

一生所求 2020-12-03 01:30

I'm working on analyzing a large public dataset with lots of verbose human-readable strings that were clearly generated by some regular (in the formal language theory sense

2 answers
  •  盖世英雄少女心
    2020-12-03 01:57

    Yes, this does exist. What is required is known academically as a DFA learning algorithm, examples of which include:

    • Angluin's L*
    • L* (adding counter-examples to columns)
    • Kearns / Vazirani
    • Rivest / Schapire
    • NL*
    • Regular positive negative inference (RPNI)
    • DeLeTe2
    • Biermann & Feldman's algorithm
    • Biermann & Feldman's algorithm (using SAT-solving)

    Source for the above is libalf, an open-source automata learning algorithm framework in C++; descriptions of at least some of these algorithms can be found in this textbook, among others. There are also implementations of grammatical inference algorithms (including DFA learning) in gitoolbox for MATLAB.
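    To give a feel for what a state-merging learner such as RPNI does, here is a minimal Python sketch. It is a simplified illustration written for this answer, not libalf's or gitoolbox's API: all function names (build_pta, merge, rpni, accepts) are hypothetical, and it assumes the positive and negative sample sets are disjoint. It builds a prefix tree acceptor from the positive strings, then tries to merge each state into an earlier one, keeping a merge only if no negative string becomes accepted.

```python
def find(parent, x):
    """Representative (smallest member) of the block containing state x."""
    while parent[x] != x:
        x = parent[x]
    return x


def union(parent, x, y):
    """Join the blocks of x and y; the smaller index stays representative."""
    x, y = find(parent, x), find(parent, y)
    if x != y:
        parent[max(x, y)] = min(x, y)


def build_pta(positives):
    """Prefix tree acceptor: one state per prefix, numbered by (length, text)."""
    prefixes = sorted({s[:i] for s in positives for i in range(len(s) + 1)},
                      key=lambda p: (len(p), p))
    index = {p: i for i, p in enumerate(prefixes)}
    trans = {(index[p[:-1]], p[-1]): index[p] for p in prefixes if p}
    accepting = {index[s] for s in positives}
    return len(prefixes), trans, accepting


def quotient(trans, parent):
    """Merged transition table, or the two targets that make it nondeterministic."""
    out = {}
    for (s, c), t in trans.items():
        key, t = (find(parent, s), c), find(parent, t)
        if out.setdefault(key, t) != t:
            return None, (out[key], t)
    return out, None


def merge(trans, parent, a, b):
    """Merge a and b on a copy, then keep merging until deterministic (folding)."""
    parent = parent[:]
    union(parent, a, b)
    while True:
        table, conflict = quotient(trans, parent)
        if conflict is None:
            return parent, table
        union(parent, *conflict)


def accepts(table, finals, word, start=0):
    """Run word through the DFA; a missing transition means rejection."""
    state = start
    for c in word:
        if (state, c) not in table:
            return False
        state = table[(state, c)]
    return state in finals


def rpni(positives, negatives):
    """Simplified RPNI-style learner: returns (transition table, final states)."""
    n, trans, accepting = build_pta(positives)
    parent, table, red = list(range(n)), dict(trans), [0]
    for q in range(1, n):
        if find(parent, q) != q:        # already absorbed by an earlier merge
            continue
        for r in red:
            if find(parent, r) != r:
                continue
            cand_parent, cand_table = merge(trans, parent, r, q)
            finals = {find(cand_parent, s) for s in accepting}
            if not any(accepts(cand_table, finals, w) for w in negatives):
                parent, table = cand_parent, cand_table   # merge is consistent
                break
        else:
            red.append(q)               # no consistent merge: keep q as a state
    return table, {find(parent, s) for s in accepting}


# Example: samples drawn from the language (ab)*
table, finals = rpni(["", "ab", "abab"], ["a", "b", "ba", "aba", "abb", "aa"])
print(accepts(table, finals, "ababab"))  # True
print(accepts(table, finals, "aba"))     # False
```

    With these samples the sketch collapses the five-state prefix tree into a two-state automaton for (ab)*. The result is only as good as the samples: with too few negative strings, a learner of this kind tends to over-generalize, and the sketch makes no attempt to turn the resulting DFA back into a regular expression.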

    Since this question has come up before and has not been satisfactorily answered in the past, I am in the process of evaluating these algorithms and will update with more information about how useful they are, unless someone with more expertise in the area does so first (which is preferable).

    NOTE: I am accepting my own answer for now but will gladly accept a better one if someone can provide one.

    FURTHER NOTE: I've decided to go with the route of using custom code, since using a generic algorithm turns out to be a bit overkill for the data I'm working with. I'm leaving this answer here in case someone else needs it, and will update if I ever do evaluate these.
