Automatic code deduplication of assembly language?

I've been going through some Assembly Programming Videos to get a better understanding of how to manually optimize the *.s files left after compiling with gcc/g++ -S ... One of the topics covered was Refactoring Redundant Code that demonstrates how to move redundant code to its own labeled block ending with a ret and replacing it with a call.

The example given in the video is 2 blocks containing:

mov eax,power
mul ebx
mov power,eax
inc count

which it replaces with call CalculateNextPower and CalculateNextPower looks like:

CalculateNextPower:
mov eax,power
mul ebx
mov power,eax
inc count
ret

Out of curiosity, trying to reduce compiled size, I compiled some C and C++ projects with -S and various optimizations including -Os,-O2,-O3,-pipe,-combine and -fwhole-program and analyzed the resulting *.s files for redundancy using a lightly patched (for .s files) version of duplo. Only -fwhole-program (now deprecated IIRC) had a significant effect toward eliminating duplicate code across files (I assume its replacement(s) -flto would behave similarly during link time - roughly equivalent to compiling with -ffunction-sections -fdata-sections and linking with --gc-sections) but still misses significantly large blocks of code.

Manual optimization using the duplo output resulted in ~10% size reduction in a random C project and and almost 30% in a random C++ project when only contiguous blocks of assembly with at least 5 contiguous duplicate instructions were deduplicated.

Am I missing a compiler option (or even a standalone tool) that eliminates redundant assembly automatically when compiling for size (including other compilers: clang, icc, etc..) or is this functionality absent (for a reason?)?

If it doesn't exist, it could be possible to modify duplo to ignore lines starting with a '.' or ';' (and others?) and replace duplicate code blocks with calls to functions with the duplicate code, but I am open to other suggestions that would work directly with the compiler's internal representations (preferably clang or gcc).

Edit: I patched duplo to identify blocks of duplicate assembly here, but it still requires manual refactoring at the moment. As long as the same compiler is used to generate the code, it may be possible (but probably slow) to identify the largest blocks of duplicate code, put them in their own "function" block and replace the code with a CALL to that block.

What you want is a clone detector tool.

These exist in a variety of implementations that vary depending on the granularity of elements of the document being processed, and how much structure is available.

Those that match raw line (won't work for you, you want to parameterize your subroutines by differing constants [both data and index offset] and or named locations or other named suboutines). The token based detectors might work, in that they will identify single-point places (e.g., constants or identifiers) that vary. But what you really want is a structural matcher, that can pick out variant addressing modes or even variants in code in the middle of the block (See AST based clone detectors, which I happen to build).

To detect with structure, you have to have structure. Fortunately, even assembly language code has structure in the form of a grammar, and blocks of code delimited by subroutine entries and exits (these latter are bit more problematic to detect in assembly, because there may be more than one of each).

When you detect using structures, you have at least the potential to use the structure to modify the code. But if you have the program source represented as a tree, you have structure (subtrees and sequences of subtrees) over which to detect clones, and one can abstract clone matches by modifying the trees at the match points. (Early versions of my clone detector for COBOL abstracted clones into COPY libs. We stopped doing that mostly because you don't want to abstract every clone that way).

What you are proposing is called procedural abstraction and has been implemented by more than one group as research projects. Here is one. Here's another. And another.

Clone detection is normally used in the context of source code, though its function is similar. Since procedural abstraction occurs at a lower level, it can accomplish more. For example, suppose there are two calls to different functions, but with exactly the same complicated argument computations. A procedural abstractor can pull the argument calculation into a procedure, but a clone detector would have a hard time doing so.

I don't believe either gcc or llvm currently has a supported implementation of PA. I searched both sets of documents and didn't find one. In at least two cases above, the optimizer is running on assembly code produced by gcc rather than as a gcc internal optimization. This probably explains why these techniques were not built into the compiler. You might try the authors to see where their implementations are.

来源：https://stackoverflow.com/questions/19663795/automatic-code-deduplication-of-assembly-language

标签

assembly

size

compiler-optimization

code-duplication