I'm looking for a way to search for a given term in a project's C/C++ code, while ignoring any occurrences in comments and strings.
As the code base is rather large…
The robust way to do this should be with cscope (http://cscope.sourceforge.net/) in line-oriented mode using the "Find this C symbol" option (e.g. cscope -R -L -0 float), but I haven't used that on a variety of C standards, so if that doesn't work for you, or if you can't get cscope, then do this:
find . -type f -print |
while IFS= read -r file
do
    sed 's/a/aA/g; s/__/aB/g; s/#/aC/g' "$file" |
    gcc -P -E - |
    sed 's/aC/#/g; s/aB/__/g; s/aA/a/g' |
    awk -v file="$file" -v OFS=': ' '/\<float\>/{print file, $0}'
done
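The first sed escapes every a as aA and then replaces every __ with aB and every # with aC. Because every a in the escaped text starts one of the markers aA, aB or aC, these substitutions can be undone exactly after preprocessing. Hiding # and __ means the preprocessor doesn't act on #include, #define, etc. or expand macros such as __FILE__, but we can restore them afterwards.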
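To see the escaping on its own, here is a small sketch that wraps the two sed commands from the loop in shell functions. The names escape_pp and unescape_pp and the sample input are made up for illustration; the sed scripts are the ones used above.

```shell
# escape_pp (hypothetical name): a -> aA first, so "aB"/"aC" can never already
# be in the text, then __ -> aB and # -> aC to hide directives and __MACROS__.
escape_pp()   { sed 's/a/aA/g; s/__/aB/g; s/#/aC/g'; }
# unescape_pp (hypothetical name): undo the substitutions in reverse order.
unescape_pp() { sed 's/aC/#/g; s/aB/__/g; s/aA/a/g'; }

printf '#include "a.h"\nint __x;\naCount = 1;\n' | escape_pp
# aCinclude "aA.h"
# int aBx;
# aACount = 1;

# Round trip: prints the three original lines unchanged.
printf '#include "a.h"\nint __x;\naCount = 1;\n' | escape_pp | unescape_pp
```

Note that the existing identifier aCount survives the round trip: it is escaped to aACount, which contains no aC marker.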
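The gcc -P -E - call runs only the preprocessor on the input, which strips out comments (-P also suppresses the linemarker lines it would otherwise emit).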
The second sed undoes the first one, turning aC back into #, aB back into __ and aA back into a, in the reverse of the order in which they were applied.
The awk actually searches for float within word boundaries and, if found, prints the file name plus the line it was found on. This uses GNU awk for the word-boundary operators \< and \>.
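If the awk on your system is not GNU awk (e.g. mawk or the BWK awk shipped on macOS), \< and \> may not be available. A sketch of a portable substitute, which is my swap-in rather than part of the original answer, treats any character outside [A-Za-z0-9_] (or the line edge) as the boundary. The function name find_word is hypothetical.

```shell
# find_word (hypothetical name): print lines containing $1 as a whole C identifier.
# Uses only POSIX ERE, so it should work in gawk, mawk and BWK awk alike.
# Assumes $1 is a plain identifier with no regex metacharacters.
find_word() {
    awk -v w="$1" '
        BEGIN { re = "(^|[^A-Za-z0-9_])" w "([^A-Za-z0-9_]|$)" }
        $0 ~ re
    '
}

printf 'float x;\nint floaty;\nmyfloat = 1;\n(float)y;\n' | find_word float
# float x;
# (float)y;
```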
The 2nd sed's job COULD be done as part of the awk command but I like the symmetry of the 2 seds.
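For reference, folding the second sed into awk could look roughly like this sketch: gsub() undoes the escapes in the same order the sed does, then the match runs. The function name unescape_match and the demo.c label are hypothetical, and it uses the portable word-boundary pattern instead of \< \> so it does not depend on GNU awk.

```shell
# unescape_match (hypothetical name): $1 is the file name to print in front of matches.
unescape_match() {
    awk -v file="$1" -v OFS=': ' '
        {
            gsub(/aC/, "#")     # undo the first sed, in reverse order
            gsub(/aB/, "__")
            gsub(/aA/, "a")
        }
        /(^|[^A-Za-z0-9_])float([^A-Za-z0-9_]|$)/ { print file, $0 }'
}

printf '#define __floatval 1\nfloat aCount;\n' |
sed 's/a/aA/g; s/__/aB/g; s/#/aC/g' |
unescape_match demo.c
# demo.c: float aCount;
```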
Unlike with cscope, this sed/gcc/sed/awk approach will NOT avoid false matches within strings, but hopefully there are very few of those and you can weed them out manually while post-processing anyway.
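If the manual weeding gets tedious, one crude option (an approximation I'm adding, not something the pipeline above does) is to blank out double-quoted literals before the word search. It honours \" escapes but assumes literals don't span lines and ignores cases like the '"' character constant or C++ raw strings. The function name blank_strings is hypothetical.

```shell
# blank_strings (hypothetical name): replace each "..." literal with "".
# ([^"\\]|\\.)* matches ordinary characters or backslash-escaped pairs such as \".
blank_strings() {
    sed -E 's/"([^"\\]|\\.)*"/""/g'
}

printf 'puts("a float here \\" too"); float y;\n' | blank_strings
# puts(""); float y;
```

It could be slotted into the loop just before the awk step, i.e. ... | blank_strings | awk ....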
It will not work for file names that contain newlines. If you have those, you can put the loop body in a script that loops over its arguments ("$@") and execute it as find .. -print0 | xargs -0 script.
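Modify the gcc command line by adding whatever C or C++ standard you are using, e.g. -ansi or -std=c11; for C++ sources you may also need -x c++, since gcc treats standard input as C unless told otherwise.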