Regular Expression To Find C Style Comments

Posted by 邮差的信 on 2021-01-29 03:53:48

Question


I am trying to write a regular expression to find C-style comment headers in Java source files. For now I am experimenting with it in Python.

Here is my source code:

import re

text = """/*
       * Copyright blah blah blha blah 
       * blah blah blah blah 
       * 2008 blah blah blah @ org
       */"""

print()
print("I guess the program printed the correct thing.")

# The pattern under test: a slash, one or more characters, then a slash.
pattern = re.compile("^/.+/$")

print("-----------")
print(pattern)

# Report every match as "start : end = matched text".
pos = 0
while True:
    match = pattern.search(text, pos)
    if not match:
        break
    s = match.start()
    e = match.end()
    print('   %2d : %2d = "%s"' % (s, e - 1, text[s:e]))
    pos = e

I am trying to write a simple expression that just looks for anything between a forward slash and another forward slash. I can make the regular expression more complicated later.

Does anyone know where I am going wrong? I am using a forward slash, the dot meta-character, the plus symbol for one or more characters, another forward slash, and the dollar symbol for the end.


Answer 1:


First, I don't think you should anchor the match with '^' and '$', because anchors only let the pattern match when the comment spans the whole string.

Secondly, I think the regex should be r"/[^/]*/", which matches a portion of the string that starts with a slash, is followed by zero or more non-slash characters, and ends with a slash.

To wit:

>>> import re
>>> text = """foo bar baz
... /*
...        * Copyright blah blah blha blah 
...        * blah blah blah blah 
...        * 2008 blah blah blah @ org
...        */"""
>>> rx = re.compile(r"/[^/]*/", re.DOTALL)
>>> mo = rx.search(text)
>>> text[mo.start(): mo.end()]
'/*\n       * Copyright blah blah blha blah \n       * blah blah blah blah \n       * 2008 blah blah blah @ org\n       */'

Note that the comment does not start at the start of the string, but the regex still finds it.
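One caveat with r"/[^/]*/": it stops at the first slash after the opening one, so a comment that contains a URL or a file path is cut short, and it also matches text that is not a comment, such as "/ b /" in "a / b / c". Below is a minimal sketch of a stricter alternative, assuming the goal is to find /* ... */ block comments: it matches the literal delimiters and uses a non-greedy ".*?" together with re.DOTALL. The helper name find_block_comments and the sample source are made up for illustration, and the sketch does not try to handle comment delimiters that appear inside string literals.

```python
import re

# Literal "/*", then as few characters as possible (newlines included), then "*/".
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def find_block_comments(source):
    """Return (start, end, text) for each /* ... */ comment in source."""
    return [(m.start(), m.end(), m.group()) for m in BLOCK_COMMENT.finditer(source)]


# Hypothetical Java-like input: a header containing a URL, plus an inline comment.
sample = """/*
 * Copyright blah, see http://example.org/license
 */
public class Foo { /* inline */ int x = 4 / 2; }
"""

for start, end, comment in find_block_comments(sample):
    print("%3d : %3d = %r" % (start, end, comment))
```

With this input the header is returned whole despite the slashes in the URL, and the division "4 / 2" is not reported as a comment.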




Answer 2:


For starters, you need to specify the DOTALL flag because by default, the . character does not match newlines.

Try:

pattern = re.compile("^/.+/$", re.DOTALL)
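To see what the flag changes, here is a small sketch that runs the question's pattern against a shortened version of its text, once without and once with re.DOTALL. The variable names are illustrative. Note that the anchored pattern succeeds here only because the sample text begins and ends with a slash.

```python
import re

text = """/*
 * Copyright blah blah
 */"""

# Without DOTALL, "." stops at the first newline, so the pattern cannot
# span the three lines between "^/" and "/$".
without_flag = re.search(r"^/.+/$", text)

# With DOTALL, "." also matches "\n", so ".+" can cover the whole comment.
with_flag = re.search(r"^/.+/$", text, re.DOTALL)

print(without_flag)                # None
print(with_flag.group() == text)   # True
```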


Source: https://stackoverflow.com/questions/32916322/regular-expression-to-find-c-style-comments
