How to find all possible regex matches in Python?

醉梦人生 2020-12-05 11:47

I am trying to find all possible word/tag pairs, as well as other nested combinations, using Python regular expressions.

sent = '(NP (NNP Hoi) (NN Hallo) (NN H


        
2 answers
  •  天命终不由人
    2020-12-05 12:30

    Regular expressions as used in modern languages DO NOT represent only regular languages. zmo is right in saying that regular languages in language theory are represented by finite state automata, but the regular expressions used in modern languages run on backtracking engines and support features beyond the classical operators, such as capturing groups with backreferences and lookarounds. Patterns that rely on backreferences in particular CANNOT be represented by the FSAs known from language theory. How would you represent a pattern like (\w+)\1 with a DFA, or even an NFA?
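To make the point about (\w+)\1 concrete, here is a minimal Python sketch. The helper name `is_doubled` and the sample strings are illustrative assumptions, not part of the original answer. The pattern accepts exactly those strings that consist of some word repeated twice, which requires remembering an arbitrarily long prefix, something a finite automaton cannot do.

```python
import re

# (\w+)\1 : a run of word characters immediately followed by the same run.
# The backreference \1 is what takes this pattern beyond regular languages.
DOUBLED = re.compile(r'(\w+)\1')

def is_doubled(text):
    """Return True if the whole text is some word written twice in a row."""
    return DOUBLED.fullmatch(text) is not None

print(is_doubled('abab'))  # 'ab' followed by 'ab'
print(is_doubled('abba'))  # no prefix is repeated exactly
```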

    The regular expression you are looking for could be something like this (it only matches up to two levels of nesting):

    (?=(\((?:[^\)\(]*\([^\)]*\)|[^\)\(])*?\)))
    

    I tested this on http://regexhero.net/tester/
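For Python specifically, the same pattern can be tried with `re.finditer`. This is a sketch under assumptions: the sample string below is reconstructed for illustration because the question's original input is truncated, and the helper name `overlapping_matches` is hypothetical. Since the whole pattern is a lookahead, each match is zero-width and the text of interest sits in group 1, which is what lets overlapping and nested matches be reported.

```python
import re

# The lookahead pattern from the answer, unchanged.
PATTERN = re.compile(r'(?=(\((?:[^\)\(]*\([^\)]*\)|[^\)\(])*?\)))')

# Assumed sample input (the string in the question is cut off in the source).
sent = '(NP (NNP Hoi) (NN Hallo) (NN Hey) (NNP (NN Ciao) (NN Adios)))'

def overlapping_matches(text):
    """Collect group 1 of every zero-width lookahead match, left to right."""
    return [m.group(1) for m in PATTERN.finditer(text)]

for match in overlapping_matches(sent):
    print(match)
```

Because the pattern handles only two levels, the outermost match stops one closing parenthesis short of the full three-level string.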

    The matches are in the captured groups:

    1: (NP (NNP Hoi) (NN Hallo) (NN Hey) (NNP (NN Ciao) (NN Adios))

    1: (NNP Hoi)

    1: (NN Hallo)

    1: (NN Hey)

    1: (NNP (NN Ciao) (NN Adios))

    1: (NN Ciao)

    1: (NN Adios)
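If deeper nesting is needed, one alternative is to drop the regex and track parentheses with a stack. This is a different technique from the answer's pattern, shown only as a sketch; the function name `nested_groups` and the sample input are assumptions made for illustration.

```python
def nested_groups(text):
    """Return every balanced parenthesised substring, ordered by start index."""
    open_positions = []  # stack holding the index of each unmatched '('
    spans = []
    for index, char in enumerate(text):
        if char == '(':
            open_positions.append(index)
        elif char == ')' and open_positions:
            # Pair this ')' with the most recent unmatched '('.
            start = open_positions.pop()
            spans.append((start, index + 1))
    spans.sort()
    return [text[start:end] for start, end in spans]

# Hypothetical input with four levels of nesting.
for group in nested_groups('(A (B (C (D x))))'):
    print(group)
```

Unlike the two-level regex, this returns the complete outermost group at any depth, and unmatched closing parentheses are simply ignored.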
