Regular expression on a stream instead of a string?

深忆病人 2020-12-19 08:25

Suppose you want to do a regular expression search and extract over a pipe, but the pattern may span multiple lines. How would you do it? Maybe a regular expression library work for

3 answers
  •  执笔经年
    2020-12-19 09:02

    I solved a similar problem for searching a stream using classic pattern matching. You may want to subclass the Matcher class of my solution, streamsearch-py, and perform regex matching in the buffer. Check out the included kmp_example.py below for a template. If it turns out that classic Knuth-Morris-Pratt matching is all you need, then this little open source library solves your problem right away :-)

    #!/usr/bin/env python
    
    # Copyright 2014-2015 @gitagon. For alternative licenses contact the author.
    # 
    # This file is part of streamsearch-py.
    # streamsearch-py is free software: you can redistribute it and/or modify
    # it under the terms of the GNU Affero General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.
    # 
    # streamsearch-py is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU Affero General Public License for more details.
    # You should have received a copy of the GNU Affero General Public License
    # along with streamsearch-py.  If not, see <http://www.gnu.org/licenses/>.
    
    
    from streamsearch.matcher_kmp import MatcherKMP
    from streamsearch.buffer_reader import BufferReader
    
    class StringReader():
        """for providing an example read() from string required by BufferReader"""
        def __init__(self, string):
            self.s = string
            self.i = 0
    
        def read(self, buf, cnt):
            if self.i >= len(self.s): return -1
            r = self.s[self.i]
            buf[0] = r
            result = 1
            print "read @%s" % self.i, chr(r), "->", result
            self.i+=1
            return result
    
    def main():
    
        w = bytearray("abbab")
        print "pattern of length %i:" % len(w), w
        s = bytearray("aabbaabbabababbbc")
        print "text:", s
        m = MatcherKMP(w)
        r = StringReader(s)
        b = BufferReader(r.read, 200)
        m.find(b)
        print "found:%s, pos=%s " % (m.found(), m.get_index())
    
    
    if __name__ == '__main__':
        main()
    

    The output is:

    pattern of length 5: abbab
    text: aabbaabbabababbbc
    read @0 a -> 1
    read @1 a -> 1
    read @2 b -> 1
    read @3 b -> 1
    read @4 a -> 1
    read @5 a -> 1
    read @6 b -> 1
    read @7 b -> 1
    read @8 a -> 1
    read @9 b -> 1
    found:True, pos=5 
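
For the regex case the answer mentions (matching inside a buffer), here is a minimal sketch of one way to do it. It is written in Python 3 (the example above is Python 2) and does not use streamsearch-py; the function name `iter_stream_matches` and its parameters are hypothetical. It assumes that no match is longer than `max_match_len` characters, that matches are non-empty, and that the pattern does not depend on anchors or lookaround, because text outside the buffer is not visible to the regex engine.

```python
import io
import re


def iter_stream_matches(stream, pattern, chunk_size=4096, max_match_len=1024):
    """Yield (absolute_start, match) for each regex match in a text stream.

    Assumptions (not from the original answer): no match is longer than
    max_match_len characters, matches are non-empty, and the pattern does
    not rely on anchors or lookaround across the buffer boundaries.
    """
    regex = re.compile(pattern)
    buf = ""     # unconsumed tail of the stream
    offset = 0   # absolute stream position of buf[0]
    eof = False
    while not eof:
        chunk = stream.read(chunk_size)
        eof = not chunk          # read() returns "" at end of input
        buf += chunk
        # A match starting at index s is final once s + max_match_len
        # characters are buffered, or once the input has ended.
        limit = len(buf) if eof else len(buf) - max_match_len
        last_end = 0
        for m in regex.finditer(buf):
            if m.start() > limit:
                break            # could still change when more data arrives
            yield offset + m.start(), m
            last_end = m.end()
        # Drop everything already decided; keep the undecided tail.
        cut = max(last_end, limit, 0)
        buf = buf[cut:]
        offset += cut


# io.StringIO stands in for a pipe such as sys.stdin; the tiny chunk size
# forces every match to span several reads and several lines.
text = "noise BEGIN\nalpha\nEND more\nBEGIN\nbeta\nEND tail"
found = [
    (pos, m.group(1))
    for pos, m in iter_stream_matches(
        io.StringIO(text), r"BEGIN\n(\w+)\nEND", chunk_size=4, max_match_len=32
    )
]
print(found)  # prints [(6, 'alpha'), (27, 'beta')]
```

The buffer never grows much beyond `chunk_size + max_match_len` characters plus any pending match, so memory stays bounded as long as the length assumption holds. If matches can be arbitrarily long, this sketch would miss or truncate them, and a streaming matcher along the lines of the answer above may be a better fit.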
    
