问题
I have some old software (in a language that's not dead but is dead to me ;-)) that implements a basic pattern-matching and -rewriting system for source code. I am considering resurrecting this code, translating it into a modern language, and open-sourcing the project as a refactoring power-tool. Before I go much further, I want to know if anything like this exists already (my google-fu is fanning air on this tonight).
Here's how it works:
- the pattern-matching part matches source-code patterns spanning multiple lines of code using a template with binding variables,
- the pattern-rewriting part uses a template to rewrite the matched code, inserting the contents of the bound variables from the matching template
- matching and rewriting templates are associated (1:1) by a simple (unconditional) rewrite rule
the software operates on the abstract syntax tree (AST) of the input application, and outputs a modified AST which can then be regenerated into new source code
for example, suppose we find a bunch of while-loops that really should be for-loops. The following template will match the while-loop pattern:
Template oldLoopPtrn
int @cnt@ = 0;
while (@cnt@ < @max@)
{
… @body@
++@cnt@;
}
End_Template
while the following template will specify the output rewrite pattern:
Template newLoopPtrn
for(int @cnt@ = 0; @cnt@ < @max@; @cnt@++)
{
@body@
}
End_Template
and a simple rule to associate them
Rule oldLoopPtrn --> newLoopPtrn
so code that looks like this
int i=0;
while(i<arrlen)
{
printf("element %d: %f\n",i,arr[i]);
++i;
}
gets automatically rewritten to look like this
for(int i = 0; i < arrlen; i++)
{
printf("element %d: %f\n",i,arr[i]);
}
The closest thing I've seen like this is some of the code-refactoring tools, but they seem to be geared towards interactive rewriting of selected snippets, not wholesale automated changes.
I believe that this kind of tool could supercharge refactoring, and would work on multiple languages (even HTML/CSS). I also believe that converting and polishing the code base would be a huge project that I simply cannot do alone in any reasonable amount of time.
So, anything like this out there already? If not, any obvious features (besides rewrite-rule conditions) to consider?
EDIT: The one feature of this system that I like very much is that the template patterns are fairly obvious and easy to read because they're written in the same language as the target source code, not in some esoteric mutated regex/BNF format.
回答1:
I am considering resurrecting this code, translating it into a modern language, and open-sourcing the project as a refactoring power-tool.
I think it would be simply great to have such a tool freely available.
But there is a commercial product available: DMS Software Reengineering Toolkit.
The DMS Software Reengineering Toolkit is a set of tools for automating customized source program analysis, modification or translation or generation of software systems, containing arbitrary mixtures of languages ("domains"). The term "software" for DMS is very broad and covers any formal notation, including programming languages, markup languages, hardware description languages, design notations, data descriptions, etc. This toolkit is the first step towards the implementation of The Design Maintenance System™, an ambitious vision of a 21st Century software engineering environment that supports the incremental construction and maintenance of large application systems, driven by semantics and captured designs.
Also, for C source code there is the coccinelle tool:
Coccinelle is a program matching and transformation engine which provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code. Coccinelle was initially targeted towards performing collateral evolutions in Linux. Such evolutions comprise the changes that are needed in client code in response to evolutions in library APIs, and may include modifications such as renaming a function, adding a function argument whose value is somehow context-dependent, and reorganizing a data structure. Beyond collateral evolutions, Coccinelle is successfully used (by us and others) for finding and fixing bugs in systems code.
回答2:
TXL is rule based rather than pattern based, so it has more capabilities but perhaps a steeper learning curve.
回答3:
A Coccinelle implementation of the above rule would be:
@@
identifier cnt;
expression max,E;
@@
cnt = 0
... when != cnt=E
-while (cnt < max)
+for (cnt=0; cnt < max; cnt++)
{
...
- cnt++;
}
For the C code:
int main () {
int i=0;
printf ("hello\n");
while(i < arrlen) {
printf("element %d: %f\n",i,arr[i]);
++i;
}
}
it gives
int main () { int i=0; printf ("hello\n"); for (i = 0; i < arrlen; i++) { printf("element %d: %f\n",i,arr[i]); } }
The ... when != cnt=E allows arbitrary code between the initialization of cnt and the while loop, but checks that cnt is not redefined. A more elaborate rule could also get rid of the initialization of cnt, if it is not used between the initialization and the while loop.
回答4:
Our DMS has already been mentioned. It has transformation rules that are, as the OP put it, "easy to read because they're written in the same language as the target source code".
Here's a link that shows a complete, detailed example of pattern-matching/transformation using DMS.
回答5:
In TXL, it looks like this:
include "c.grm"
rule main
replace [declaration_or_statement*]
int cnt [id] = 0;
while (cnt < max [shift_expression] )
{
body [declaration_or_statement*]
}
deconstruct * body
++cnt;
by
for (int cnt = 0; cnt < max; cnt++)
{
body
}
end rule
For this input:
int main () {
int i=0;
while(i < arrlen) {
printf("element %d: %f\n",i,arr[i]);
++i;
}
}
It yields this:
int main () {
for (int i = 0; i < arrlen; i++) {
printf ("element %d: %f\n", i, arr [i]);
++i;
}
}
For the split case shown in the Coccinelle example above, you add a similar guard to the rule.
来源:https://stackoverflow.com/questions/415455/any-software-for-pattern-matching-and-rewriting-source-code