Question
I am trying to write a regular expression to find C-style comment headers in Java source files. For now I am experimenting with it in Python.
Here is my source code:
import re

# Sample header comment to search
text = """/*
 * Copyright blah blah blha blah
 * blah blah blah blah
 * 2008 blah blah blah @ org
 */"""
print()
print("I guess the program printed the correct thing.")
pattern = re.compile("^/.+/$")
print("-----------")
print(pattern)

# Report every match, resuming the search after the previous one
pos = 0
while True:
    match = pattern.search(text, pos)
    if not match:
        break
    s = match.start()
    e = match.end()
    print(' %2d : %2d = "%s"' % (s, e-1, text[s:e]))
    pos = e
I am trying to write a simple expression that just looks for anything between one forward slash and another. I can make the regular expression more complicated later.
Does anyone know where I am going wrong? I am using a forward slash, the dot meta-character, the plus symbol for one or more things, and the dollar symbol for the end.
Answer 1:
I don't think you should anchor the match (using '^' and '$').
Secondly, I think the regex should be r"/[^/]*/", which matches a portion of a string that starts with a slash, is followed by zero or more non-slash characters, and ends with a slash.
To wit:
>>> import re
>>> text = """foo bar baz
... /*
...  * Copyright blah blah blha blah
...  * blah blah blah blah
...  * 2008 blah blah blah @ org
...  */"""
>>> rx = re.compile(r"/[^/]*/", re.DOTALL)
>>> mo = rx.search(text)
>>> text[mo.start(): mo.end()]
'/*\n * Copyright blah blah blha blah \n * blah blah blah blah \n * 2008 blah blah blah @ org\n */'
Note that the comment does not start at the start of the string, but the regex still finds it.
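One caveat with [^/]: it stops at the first slash inside the comment, so a header containing a URL or a path would be cut short. A possible refinement, offered only as a rough sketch, is to match the literal "/*" and "*/" delimiters with a non-greedy body. The names COMMENT_RE and find_comments and the sample text are made up for illustration and are not from the question.

```python
import re

# Hypothetical names for illustration; not part of the original question or answer.
# /\*  matches the opening "/*"
# .*?  matches as little as possible (re.DOTALL lets "." cross newlines)
# \*/  matches the closing "*/"
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)


def find_comments(source):
    """Return a list of (start, end, text) tuples, one per block comment."""
    return [(m.start(), m.end(), m.group(0)) for m in COMMENT_RE.finditer(source)]


text = """foo bar baz
/*
 * Copyright blah, see http://example.org/license
 */
int x; /* second */"""

for start, end, comment in find_comments(text):
    print("%2d : %2d = %r" % (start, end, comment))
```

This still treats "/*" inside a string literal as the start of a comment, so it is probably only good enough for locating file headers, not for parsing Java in general.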
Answer 2:
For starters, you need to specify the DOTALL flag because by default, the . character does not match newlines.
Try:
pattern = re.compile("^/.+/$", re.DOTALL)
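To see the difference, here is a small check that runs the same pattern with and without the flag. The variable names are mine, and the sample text assumes the comment is the entire string, as in the question.

```python
import re

text = """/*
 * Copyright blah blah blha blah
 * blah blah blah blah
 * 2008 blah blah blah @ org
 */"""

# Same pattern twice; only the flag differs.
without_flag = re.compile("^/.+/$")
with_flag = re.compile("^/.+/$", re.DOTALL)

# Without DOTALL, "." cannot cross the newline after "/*", so nothing matches.
print(without_flag.search(text))

# With DOTALL, ".+" spans the newlines and the whole comment matches.
match = with_flag.search(text)
print(match is not None and match.group(0) == text)
```

Because of the '^' and '$' anchors, this only works when the comment starts at the beginning of the string and ends at its end; a header followed by code would need the anchors removed.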
Source: https://stackoverflow.com/questions/32916322/regular-expression-to-find-c-style-comments