Removing useless lines from a C++ file

Question

Often, while I am debugging or reusing some code, the file starts to acquire lines that don't do anything, though they may have done something at one point.

Things like vectors that get filled and then go unused, classes/structs that are defined but never used, and functions that are declared but never used.

I understand that in many cases some of these things are not superfluous, since they might be visible from other files, but in my case there are no other files, just extraneous code in my file.

While I understand that, technically speaking, invoking push_back does something, and therefore the vector is not unused per se, in my case its result goes unused.

So: is there a way to do this, either with a compiler (Clang, GCC, VS, etc.) or with an external tool?

Example:

```cpp
#include <vector>
using namespace std;

void test() {
    vector<int> a;   // filled below, but never read afterwards
    a.push_back(1);
}

int main() {
    test();          // the call accomplishes nothing observable
    return 0;
}
```

Should become: int main() { return 0; }

Answer 1:

Our DMS Software Reengineering Toolkit, with its C++11 front end, could be used to do this, although it presently does not do this off the shelf. DMS is designed to support custom tool construction for arbitrary source languages; it contains full parsers, name resolvers, and various flow analyzers to support analysis, as well as the ability to apply source-to-source transformations to the code based on analysis results.

In general, you want a static analysis that determines whether every computation result (a computation may have several; consider just "x++") is used or not. For each unused computation result, you in effect want to remove it and repeat the analysis. For efficiency, you want an analysis that determines all the (points of) usage of each result just once; this is essentially a data flow analysis. When the usage set of a computation result becomes empty, that result can be deleted (note that deleting the value result of "x++" may leave behind "x++" itself, because the increment is still needed!), and the usage sets of the computations it depends on can be adjusted to remove references from the deleted one, possibly causing more removals.
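The remove-and-cascade idea can be sketched on a toy, straight-line representation. This is a minimal sketch under stated assumptions, not how DMS does it: Instr and eliminate_dead are hypothetical names, each instruction produces one value from earlier instructions, and there are no pointers, calls, or branches. Use counts stand in for the usage sets; they are computed once, and each removal decrements the counts of its operands, which may trigger further removals without re-running the analysis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy IR: each instruction computes one value from earlier
// instructions (operands are indices into the program vector).
struct Instr {
    std::vector<std::size_t> operands;
    bool has_side_effect;  // e.g. a store, I/O, or the increment part of x++
};

// Returns live[i] == true if instruction i must be kept.
std::vector<bool> eliminate_dead(const std::vector<Instr>& prog) {
    const std::size_t n = prog.size();

    // One pass to count how many instructions use each result.
    std::vector<std::size_t> uses(n, 0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t op : prog[i].operands) {
            assert(op < i && "operands must refer to earlier instructions");
            ++uses[op];
        }

    // Seed the worklist with results nobody uses and that have no side effects.
    std::vector<bool> live(n, true);
    std::vector<std::size_t> worklist;
    for (std::size_t i = 0; i < n; ++i)
        if (uses[i] == 0 && !prog[i].has_side_effect) worklist.push_back(i);

    // Deleting an instruction drops its uses of its operands; any operand
    // whose count reaches zero becomes dead too (the cascade).
    while (!worklist.empty()) {
        std::size_t i = worklist.back();
        worklist.pop_back();
        live[i] = false;
        for (std::size_t op : prog[i].operands)
            if (--uses[op] == 0 && !prog[op].has_side_effect) worklist.push_back(op);
    }
    return live;
}
```

Marking the increment half of x++ as has_side_effect is how this toy model would keep the increment alive after its value result is dropped.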

To do this analysis for any language, you have to be able to trace results. For C (and C++) this can be pretty ugly: there are "obvious" uses, where a computation result is used in an expression or assigned to a local/global variable (which is used somewhere else), and there are indirect assignments through pointers, object field updates, arbitrary casts, etc. To know these effects, your dead code analysis tool has to be able to read the entire software system and compute dataflows across it.
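A small illustration of why pointers complicate result tracing. The function names are hypothetical: in read_through_alias, no statement after x = 7; mentions x by name, so a naive "later mention of the variable" rule would delete that store, yet the value is read back through p. naive_rewrite is the same function with the store deleted, and its return value differs.

```cpp
#include <cassert>

int read_through_alias() {
    int x = 0;
    int* p = &x;   // p now aliases x
    x = 7;         // no later statement names x, yet this store is live...
    return *p;     // ...because the value is read here through p
}

// The result of wrongly deleting "x = 7;" from the function above.
int naive_rewrite() {
    int x = 0;
    int* p = &x;
    return *p;
}
```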

To be safe, you want the analysis to be conservative, i.e., if the tool has no proof that a result is unused, it must assume the result is used; you often have to do this with pointers (or array indexes, which are just pointers in disguise), because in general you can't determine precisely where a pointer "points". One can obviously build a "safe" tool by assuming all results are used :-} You will also sometimes end up with very conservative but necessary assumptions about library routines for which you don't have the source. In this case, it helps to have a set of precomputed summaries of the library side effects (e.g., "strcmp" has none, "sprintf" overwrites a specific operand, "push_back" modifies its object...). Since libraries can be pretty big, this list can be pretty big too.
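The library-summary idea might look like this in miniature. The Effect categories, the table contents, and the key spelling std::vector::push_back are assumptions for illustration only; the essential point is the conservative default, where a callee without a summary is treated as having unknown effects and is therefore never deleted.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical summary kinds; a real tool would track far more detail.
enum class Effect { None, WritesArgs, WritesObject, Unknown };

// A tiny, illustrative table of precomputed library side-effect summaries.
const std::map<std::string, Effect>& summaries() {
    static const std::map<std::string, Effect> table = {
        {"strcmp", Effect::None},
        {"sprintf", Effect::WritesArgs},                   // writes its buffer operand
        {"std::vector::push_back", Effect::WritesObject},  // modifies the vector
    };
    return table;
}

// Conservative rule: a call whose result is unused may be deleted only if it
// is known to have no side effects. Missing entries count as Unknown.
bool removable_if_result_unused(const std::string& callee) {
    auto it = summaries().find(callee);
    Effect e = (it == summaries().end()) ? Effect::Unknown : it->second;
    return e == Effect::None;
}
```

A WritesObject call such as push_back can only go once the object itself is shown to be dead, which is what the usage analysis described earlier decides.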

In general, DMS can parse an entire source code base, build symbol tables (so it knows which identifiers are local/global and their precise types), do control and local dataflow analysis, build a local side-effects summary per function, build a call graph and global side effects, and do a global points-to analysis, providing this "computation used" information with appropriate conservatism.

DMS has been used to do this computation on C code systems of 26 million lines of code (and yes, that's a really big computation; it takes a 100 GB VM to run). We did not implement the dead code elimination part (the project had another purpose), but it is straightforward once you have this data. DMS has done dead code elimination on large Java codebases with a more conservative analysis (e.g., "no use mentions of an identifier", which means assignments to the identifier are dead), and that causes a surprising amount of code removal in many real codes.

DMS's C++ parser presently builds symbol tables and can do control flow analysis for C++98, with C++11 close at hand. We still need local data flow analysis, which takes some effort, but the global analyses already exist in DMS and are available for this purpose. (The "no uses of an identifier" information is easily available from the symbol table data, if you don't mind a more conservative analysis.)
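The cheaper "no use mentions of an identifier" check needs only symbol-table-style information. This is a sketch assuming a hypothetical Mention record that classifies each occurrence of an identifier as either a plain assignment target or anything else; taking a variable's address counts as "anything else", which keeps the pointer case conservative.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical classification of one occurrence of an identifier.
enum class MentionKind { PlainAssign, Other };  // Other: read, &x, call, ...

struct Mention {
    std::string name;
    MentionKind kind;
};

// An identifier whose only mentions are plain assignments (x = ...) is never
// read, so every assignment to it is dead. Any other mention counts as a use.
std::set<std::string> dead_assignment_targets(const std::vector<Mention>& mentions) {
    std::map<std::string, bool> used;
    for (const Mention& m : mentions) {
        assert(!m.name.empty());
        bool& u = used[m.name];  // inserts false on first sight
        if (m.kind == MentionKind::Other) u = true;
    }
    std::set<std::string> dead;
    for (const auto& entry : used)
        if (!entry.second) dead.insert(entry.first);
    return dead;
}
```

Only the stores are dead: in x = f(), the call to f may still have to stay for its side effects.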

In practice, you don't want the tool to just silently rip things out; some of them might be computations you wish to preserve anyway. What the Java tool does is produce two results: a list of dead computations, which you can inspect to decide whether you believe it, and a dead-code-removed version of the source code. If you believe the dead code report, you keep the dead-code-removed version; if you see a "dead" computation you think shouldn't be dead, you modify the code to make it not dead and run the tool again. With a big code base, inspecting the dead code report itself can be trying; how do "you" know whether some apparently dead code isn't valued by "somebody else" on your team? (Version control can be used to recover if you goof!)

A really tricky issue that we (and no tool I know of) do not handle is "dead code" in the presence of conditional compilation. (Java does not have this problem; C has it in spades, C++ systems much less.) This can be truly nasty. Imagine a conditional in which one arm has certain side effects and the other arm has different side effects, or another case in which one arm is interpreted by GCC's C++ compiler and the other arm by MS's, and the compilers disagree on what the constructs do (yes, C++ compilers do disagree in dark corners). At best we can be very conservative here.
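A much milder case than the ones above already defeats analysis of a single configuration. In this sketch, DEBUG_TRACE is a hypothetical configuration macro and compute a hypothetical function: with the macro undefined, scratch looks dead (GCC's -Wunused-variable would flag it), but deleting its declaration breaks the build in which DEBUG_TRACE is defined.

```cpp
#include <cassert>
#include <cstdio>

int compute() {
    int scratch = 42;  // only read when DEBUG_TRACE is defined
#ifdef DEBUG_TRACE
    std::printf("scratch = %d\n", scratch);
#endif
    return 1;
}
```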

Clang has some ability to do flow analysis and some ability to do source transformations, so it might be coerced into doing this. I don't know whether it can do any global flow/points-to analysis. It seems to have a bias towards single compilation units, since its principal use is compiling a single compilation unit.

Answer 2:

To catch unused variables, you can enable the -Wunused flag on the gcc compiler. This warns at compile time about unused variables, unused static functions, labels, and computed values; to also get warnings about unused parameters you need -Wextra as well (-Wall already implies -Wunused). I have found that using the -Wall, -Wextra and -Werror flags (the last turns warnings into errors) makes the compiler catch some of the issues you describe. More info can be found here: http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
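A small file for trying these flags, e.g. with g++ -Wall -Wextra -c. The names never_called and warnings_demo are hypothetical, and the comments describe what GCC is expected to report; exact diagnostics vary by compiler and version. Note the last case: the vector from the question is typically not reported, since calling push_back counts as a use of a. Also, -Wunused-function only covers functions with internal linkage, so declaring single-file helpers static lets the compiler flag the unused ones.

```cpp
#include <cassert>
#include <vector>

// Internal linkage and never called: -Wunused-function (part of -Wall).
// Without "static", no warning, since another file could call it.
static void never_called() {}

int warnings_demo() {
    int unused = 42;   // -Wunused-variable
    int set_only;
    set_only = 1;      // GCC -Wunused-but-set-variable: assigned, never read
    std::vector<int> a;
    a.push_back(1);    // typically no warning: push_back counts as a use
    return 0;
}
```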

As for finding unused classes, one option is to use an IDE such as Eclipse and its 'Find References' feature to search for places where a class/object may be used.

Answer 3:

The short answer is "no." By static analysis of the client code alone, it is not possible to tell that vector's push_back method doesn't have any important side effects. For all the analysis tool knows, it writes to a database somewhere and drives a stock trade.
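One concrete way push_back can have an observable side effect is through the vector's allocator. This is a sketch: CountingAllocator, g_allocations, and looks_dead are hypothetical names, and the global counter stands in for the database write. After looks_dead() runs, the counter has changed, so deleting the call would change state that other code can read, even though the vector itself is never read.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical "side effect" target, standing in for a database write.
static int g_allocations = 0;

// Minimal C++11-style allocator that records every allocation.
template <typename T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        ++g_allocations;  // observable effect triggered by push_back
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Same shape as the question's test(): the vector is never read again.
void looks_dead() {
    std::vector<int, CountingAllocator<int>> a;
    a.push_back(1);
}
```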

Answer 4:

I'd recommend using version control software (SVN, Git, Mercurial, Perforce, ...) so that after debugging you can use it to find and remove debugging leftovers. This makes it very easy to keep your code lean.

Also, this kind of test code typically has little test coverage, so if you do have unit tests, it should show up as uncovered code.

Then there are tools that explicitly look for this kind of thing: Lint, Coverity and so on. Most are commercial, though. Also try using -O3 with GCC; the compiler may detect more truly unused variables that way, since it inlines and eliminates code more aggressively.

Source: https://stackoverflow.com/questions/15825188/removing-useless-lines-from-c-file

Tags

c++

static-analysis

unused-variables