How can I strip multiline C comments from a file using Perl?

情话喂你 2020-12-03 04:02

Can anyone help me with a regular expression to strip multiline comments and single-line comments from a file?

e.g.:

    \" WHOLE

6 Answers
  •  醉梦人生
    2020-12-03 04:37

    From perlfaq6 "How do I use a regular expression to strip C style comments from a file?":


    While this actually can be done, it's much harder than you'd think. For example, this one-liner

    perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c
    

    will work in many but not all cases. You see, it's too simple-minded for certain kinds of C programs, in particular, those with what appear to be comments in quoted strings. For that, you'd need something like this, created by Jeffrey Friedl and later modified by Fred Curtis.

    $/ = undef;
    $_ = <>;
    s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
    print;
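
To see why the quoted-string case matters, here is a minimal sketch that runs both substitutions from above on the same input. The sample text in `$src` is made up for illustration and is not from perlfaq6.

```perl
use strict;
use warnings;

# Hypothetical sample: a string literal containing comment-like text,
# followed by a real comment.
my $src = qq{char *s = "/* not a comment */"; /* real comment */\nint x;\n};

# Naive one-liner pattern: it also removes the text inside the string literal.
(my $naive = $src) =~ s{/\*.*?\*/}{}gs;

# Friedl/Curtis pattern: string literals are matched as a unit and kept.
(my $robust = $src) =~ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;

print "naive:  $naive";
print "robust: $robust";
```

The naive version leaves `char *s = "";`, which changes the program, while the longer pattern keeps the string intact and drops only the real comment.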
    

    This could, of course, be more legibly written with the /x modifier, adding whitespace and comments. Here it is expanded, courtesy of Fred Curtis.

    s{
       /\*         ##  Start of /* ... */ comment
       [^*]*\*+    ##  Non-* followed by 1-or-more *'s
       (
         [^/*][^*]*\*+
       )*          ##  0-or-more things which don't start with /
                   ##    but do end with '*'
       /           ##  End of /* ... */ comment
    
     |         ##     OR  various things which aren't comments:
    
       (
         "           ##  Start of " ... " string
         (
           \\.           ##  Escaped char
         |               ##    OR
           [^"\\]        ##  Non "\
         )*
         "           ##  End of " ... " string
    
       |         ##     OR
    
         '           ##  Start of ' ... ' string
         (
           \\.           ##  Escaped char
         |               ##    OR
           [^'\\]        ##  Non '\
         )*
         '           ##  End of ' ... ' string
    
       |         ##     OR
    
         .           ##  Any other char
         [^/"'\\]*   ##  Chars which don't start a comment, string or escape
       )
     }{defined $2 ? $2 : ""}gxse;
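
One way to follow what the alternation does is to walk the input match by match and look at whether `$2` is defined. The sketch below is only an illustration: the sample input and the names `$re`, `@trace` and `$kept` are assumptions, and the pattern is the compact form of the one above.

```perl
use strict;
use warnings;

# Compact form of the pattern above; group 2 is set only when the
# "not a comment" branch matched.
my $re = qr#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#s;

# Hypothetical sample: a char literal holding a quote, a comment,
# a string holding comment-like text, and an empty comment.
my $src = qq{a = '"'; /* c1 */ b = "x/*y*/"; /**/\n};

my @trace;
my $kept = '';
while ($src =~ /$re/g) {
    my $action = defined $2 ? 'keep' : 'drop';
    push @trace, $action;
    $kept .= $2 if defined $2;          # same decision the s///e replacement makes
    printf "%-4s [%s]\n", $action, $&;  # show each piece the pattern consumed
}
print $kept;
```

Every match consumes at least one character, so the loop tiles the whole input: comments come back with `$2` undefined and are dropped, everything else is copied through unchanged.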
    

    A slight modification also removes C++ comments, possibly spanning multiple lines using a continuation character:

     s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;
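
A short sketch of the C++ variant in use, again on made-up input: a `//` inside a string, a `/* */` comment, and a `//` comment continued onto the next line with a trailing backslash.

```perl
use strict;
use warnings;

# Hypothetical sample input, built line by line.
my $src = join '',
    qq{char *u = "http://x"; // url\n},
    qq{int b; /* c */ // cont \\\n still comment\n},
    qq{int c;\n};

# C and C++ comment pattern from above; the kept text is now group 3.
(my $out = $src) =~ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;

print $out;
```

The `//` inside the string survives and the continued comment is removed as a whole. Note that the pattern as written also consumes the newline that ends each `//` comment, so the following line is joined onto the current one; if line structure matters, that newline has to be put back separately.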
    
