Here's the deal. Is there a way to have strings tokenized in a line based on multiple regexes?
One example:
I have to get all href tags, their corresponding
Have you looked at PyParsing?
From their homepage:
Here is a program to parse "Hello, World!" (or any greeting of the form "<salutation>, <addressee>!"):
from pyparsing import Word, alphas

greet = Word(alphas) + "," + Word(alphas) + "!"  # <-- grammar defined here
hello = "Hello, World!"
print(hello, "->", greet.parseString(hello))
The program outputs the following:
Hello, World! -> ['Hello', ',', 'World', '!']
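If you would rather stay with the standard library, the "multiple regexes" idea can also be sketched with Python's re module. You join the individual patterns into one alternation, give each one its own named group, and scan the line with finditer. This is the same approach as the tokenizer example in the re documentation; it is not pyparsing. The token names (LINK, TAG, TEXT, MISMATCH) and the HTML-ish patterns below are hypothetical, chosen only for illustration. Regexes like these are also not a robust HTML parser, so treat this as a sketch.

```python
import re
from typing import Iterator, Tuple

# Hypothetical token types: each (name, regex) pair is one of the "multiple
# regexes". Alternatives are tried left to right at each position, so the
# more specific LINK pattern must come before the generic TAG pattern.
TOKEN_SPEC = [
    ("LINK", r'<a\s[^>]*?href="(?P<href>[^"]*)"[^>]*>'),
    ("TAG", r"<[^>]*>"),
    ("TEXT", r"[^<]+"),
    ("MISMATCH", r"."),  # anything else, e.g. a stray '<' with no closing '>'
]

# One master regex: every pattern wrapped in a named group of its own.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(line: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, value) pairs for each token found in line."""
    for m in MASTER.finditer(line):
        # Find which top-level alternative matched.
        kind = next(name for name, _ in TOKEN_SPEC if m.group(name) is not None)
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {m.group()!r} at {m.start()}")
        # For links, report just the href value rather than the whole tag.
        value = m.group("href") if kind == "LINK" else m.group()
        yield kind, value
```

For example, list(tokenize('<p>See <a href="http://example.com/">the docs</a> here.</p>')) gives [('TAG', '<p>'), ('TEXT', 'See '), ('LINK', 'http://example.com/'), ('TEXT', 'the docs'), ('TAG', '</a>'), ('TEXT', ' here.'), ('TAG', '</p>')]. Each href comes out as a LINK token followed by the TEXT token that holds the link text. Once a grammar grows beyond a handful of flat patterns, a parser library such as pyparsing is usually easier to maintain.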