Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I'm looking for a way to search for a given term in a project's C/C++ code, while ignoring any occurrences in comments and strings.

As the code base is rather large, I am looking for a way to automatically identify the lines of code that match my search term, as they need manual inspection.

If possible, I'd like to perform the search on my Linux system.

background

The code base in question is a real-time signal processing engine with a large number of third-party plugins. The plugins are implemented in a variety of languages (mostly C, but also C++ and others; currently I only care about those two), and no coding standards have been enforced.

Our code base currently uses the built-in type float for floating-point numbers, and we would like to replace it with a typedef that would allow us to switch to double. We therefore want to find all occurrences of float in the actual code (ignoring legitimate uses in comments and printouts).

What complicates things further is that there are some (albeit few) legitimate uses of float in the code payload, so we are really looking for a way to identify all places that require manual inspection, rather than running an automatic search-and-replace.

The code also contains C-style static casts to (float), so relying on compiler warnings to identify type mismatches is often not an option.

The code base consists of more than 3000 C and C++ files, totalling about 750000 lines of code.

The code is cross-platform (Linux, OSX and W32 being the main targets, but also FreeBSD and similar) and is compiled with the various native compilers (gcc/g++, clang/clang++, Visual Studio, ...).

so far...

So far I'm using something ugly like:

 grep "float" | sed -e 's|//.*||' -e 's|"[^"]*"||g' | grep "float"

But I'm thinking that there must be a better way to search only the payload code.


1 Answer

IMHO there is a good answer to a similar question on "Unix & Linux":

grep works on plain text and does not know anything about the underlying syntax of your C program. Therefore, in order not to search inside comments, you have several options:

  1. Strip the C comments before the search. You can do this using gcc -fpreprocessed -dD -E yourfile.c. For details, please see "Remove comments from C/C++ code".

  2. Write or use some hacky, half-working scripts like the one you have already found (e.g. ones that work by skipping lines starting with // or /*) to handle the details of all possible C/C++ comments (again, see the previous link for some scary test cases). You may still get false positives, but you do not have to preprocess anything.

  3. Use more advanced tools for doing "semantic search" in the code. I have found "coccigrep": http://home.regit.org/software/coccigrep/ This kind of tool allows searching for specific language statements (e.g. an update of a structure with a given name), and it certainly ignores the comments.

https://unix.stackexchange.com/a/33136/158220

Note, however, that it doesn't completely cover your "not in strings" requirement.
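
To cover string literals as well, one option is a small scanner that blanks out comments and string/character literals and then searches what is left. The following is only a minimal sketch of that idea, not a tested tool: the names blank, mask_comments_and_strings and find_term are made up for illustration, and it assumes plain C-style lexing, so it does not handle C++11 raw strings (R"(...)"), C++14 digit separators (1'000), trigraphs, or code hidden behind macros. Newlines are preserved, so the reported line numbers match the original file.

```python
import re
import sys


def blank(text):
    """Replace every character except newlines with a space."""
    return re.sub(r"[^\n]", " ", text)


def mask_comments_and_strings(source):
    """Blank out comments and string/char literals, keeping line structure."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if source.startswith("//", i):
            # Line comment: runs to end of line; backslash-newline continues it.
            j = i + 2
            while j < n and source[j] != "\n":
                j += 2 if source[j] == "\\" else 1
        elif source.startswith("/*", i):
            # Block comment: runs to the closing */ (or to end of file).
            end = source.find("*/", i + 2)
            j = n if end == -1 else end + 2
        elif ch in "\"'":
            # String or character literal: skip escaped characters and
            # stop at an unescaped newline so a stray quote cannot run away.
            j = i + 1
            while j < n and source[j] not in (ch, "\n"):
                j += 2 if source[j] == "\\" else 1
            if j < n and source[j] == ch:
                j += 1  # consume the closing quote
        else:
            out.append(ch)
            i += 1
            continue
        j = min(j, n)
        out.append(blank(source[i:j]))
        i = j
    return "".join(out)


def find_term(source, term="float"):
    """Return (line number, original line) pairs where term occurs in code."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b")
    original = source.split("\n")
    masked = mask_comments_and_strings(source).split("\n")
    return [(num, original[num - 1])
            for num, line in enumerate(masked, start=1)
            if pattern.search(line)]


if __name__ == "__main__":
    # Hypothetical usage: python find_float.py file1.c file2.cpp ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for num, line in find_term(fh.read()):
                print(f"{path}:{num}: {line.strip()}")
```

The output has the same file:line: text shape as grep -n, so it can be fed into an editor's quickfix list for the manual inspection. Matching on word boundaries means identifiers such as float_t or my_float are not reported; whether that is desirable depends on the code base.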

