问题
I want to have a converter between this
#include <ololo.h>
#ifdef HAVE_QQQ
#include <qqq.h>
#endif
char* ololize(char* s) {
#ifdef HAVE_QQQ
return qqq(s);
#else
return ololo(s);
#endif
}
and something like this
(include_angular "ololo.h")
(p_ifdef "HAVE_QQQ"
(include_angular "qqq.h"))
(define_function "ololize" [(ptr char) "s"] (ptr char)
(p_ifdef "HAVE_QQQ"
(return (qqq s))
:else
(return (ololo s)))))
I.e. representation of a source code as a easily manageable tree, not from compiler's point of view, but from programmer's point of view.
I don't expect 100% correct work, but it should work for most existing source files. Bonus points if I can "round-trip" the code to tree and back.
Are there any existing tools or libraries for that?
回答1:
Our DMS Software Reengineering Toolkit and its C++ front end can do this. DMS provides language-precise parsing (including handling GCC and MS dialects as well as C++11), and builds ASTs. Depending on how it is configured, it can also build full symbol tables, and presently can do control flow analysis for C++ (but not quite yet for C++11).
From the internal AST, DMS can regenerate legal source that will produce the same compiled result, either prettyprinted or preserving space ("fidelity mode") almost exactly. We can also ask the AST be exported as XML.
For OP's small program, here is the AST rendered as XML directly from our GCC4 dialect parser (there's a "PrintASTasXML" function in the DMS libraries). Note the AST contains the INCLUDE and preprocessor conditionals.
<?xml version="1.1" encoding="UTF-8"?>
<!-- Using DMS PrintASTasXML (v.1.00) -->
<!-- XML generated on 2013/04/13 15:24:44 -->
<DMSForest>
<tree node="translation_unit" type="2" domain="1" id="1iity" parents="0" line="1" column="1" file="1">
<tree node="declaration_seq" type="994" domain="1" id="1iitt" line="1" column="1" file="1">
<tree node="declaration_seq" type="994" domain="1" id="1iepb" line="1" column="1" file="1">
<tree node="control_line" type="2133" domain="1" id="1ieos" line="1" column="1" file="1">
<tree node="'#'" type="2908" domain="1" id="1ieoi" literal="0" line="1" column="1" file="1"/>
<tree node="'include'" type="2759" domain="1" id="1ieok" literal="0" line="1" column="2" file="1"/>
<tree node="ANGLED_HEADER_NAME" type="2951" domain="1" id="1ieom" line="1" column="10" file="1">
<literal>ololo.h</literal>
</tree>
<tree node="new_line" type="2946" domain="1" id="1ieoo" literal="0" line="1" column="19" file="1"/>
</tree>
<tree node="pp_declaration_seq" type="997" domain="1" id="1ieph" line="2" column="1" file="1">
<tree node="if_directive" type="2113" domain="1" id="1iep4" line="2" column="1" file="1">
<tree node="'#'" type="2908" domain="1" id="1iep1" literal="0" line="2" column="1" file="1"/>
<tree node="'ifdef'" type="2756" domain="1" id="1ieov" literal="0" line="2" column="2" file="1"/>
<tree node="IDENTIFIER" type="2646" domain="1" id="1ieol" line="2" column="8" file="1">
<literal>HAVE_QQQ</literal>
</tree>
<tree node="new_line" type="2946" domain="1" id="1ieoz" literal="0" line="2" column="16" file="1"/>
</tree>
<tree node="control_line" type="2133" domain="1" id="1iepi" line="3" column="3" file="1">
<tree node="'#'" type="2908" domain="1" id="1iep6" literal="0" line="3" column="3" file="1"/>
<tree node="'include'" type="2759" domain="1" id="1iep8" literal="0" line="3" column="4" file="1"/>
<tree node="ANGLED_HEADER_NAME" type="2951" domain="1" id="1iepa" line="3" column="12" file="1">
<literal>qqq.h</literal>
</tree>
<tree node="new_line" type="2946" domain="1" id="1iepe" literal="0" line="3" column="19" file="1"/>
</tree>
<tree node="endif_directive" type="2117" domain="1" id="1ieoy" line="4" column="1" file="1">
<tree node="'#'" type="2908" domain="1" id="1iepl" literal="0" line="4" column="1" file="1"/>
<tree node="'endif'" type="2743" domain="1" id="1iepk" literal="0" line="4" column="2" file="1"/>
<tree node="new_line" type="2946" domain="1" id="1iepn" literal="0" line="4" column="7" file="1"/>
</tree>
</tree>
</tree>
<tree node="function_definition" type="1616" domain="1" id="1iito" line="6" column="1" file="1">
<tree node="function_head" type="1628" domain="1" id="1iiow" line="6" column="1" file="1">
<tree node="simple_type_specifier" type="1104" domain="1" id="1iep9" line="6" column="1" file="1">
<tree node="'char'" type="2723" domain="1" id="1iepd" literal="0" line="6" column="1" file="1"/>
</tree>
<tree node="ptr_declarator" type="1398" domain="1" id="1iio3" line="6" column="5" file="1">
<tree node="ptr_operator" type="1436" domain="1" id="1iepq" line="6" column="5" file="1">
<tree node="'*'" type="2903" domain="1" id="1iep7" literal="0" line="6" column="5" file="1"/>
</tree>
<tree node="noptr_declarator" type="1402" domain="1" id="1iioc" line="6" column="7" file="1">
<tree node="IDENTIFIER" type="2646" domain="1" id="1iepm" line="6" column="7" file="1">
<literal>ololize</literal>
</tree>
<tree node="'('" type="2887" domain="1" id="1iepr" literal="0" line="6" column="14" file="1"/>
<tree node="parameter_declaration" type="1591" domain="1" id="1iion" line="6" column="15" file="1">
<tree node="simple_type_specifier" type="1104" domain="1" id="1iioe" line="6" column="15" file="1">
<tree node="'char'" type="2723" domain="1" id="1iio0" literal="0" line="6" column="15" file="1"/>
</tree>
<tree node="ptr_declarator" type="1398" domain="1" id="1iiom" line="6" column="19" file="1">
<tree node="ptr_operator" type="1436" domain="1" id="1iiof" line="6" column="19" file="1">
<tree node="'*'" type="2903" domain="1" id="1iio1" literal="0" line="6" column="19" file="1"/>
</tree>
<tree node="IDENTIFIER" type="2646" domain="1" id="1iio8" line="6" column="21" file="1">
<literal>s</literal>
</tree>
</tree>
</tree>
<tree node="')'" type="2888" domain="1" id="1iiol" literal="0" line="6" column="22" file="1"/>
<tree node="function_qualifiers" type="1418" domain="1" id="1iio2" line="6" column="24" file="1"/>
</tree>
</tree>
</tree>
<tree node="compound_statement" type="873" domain="1" id="1iitn" line="6" column="24" file="1">
<tree node="'{'" type="2940" domain="1" id="1iiov" literal="0" line="6" column="24" file="1"/>
<tree node="statement" type="853" domain="1" id="1iitw" line="7" column="4" file="1">
<tree node="if_directive" type="2113" domain="1" id="1iipc" line="7" column="4" file="1">
<tree node="'#'" type="2908" domain="1" id="1iip4" literal="0" line="7" column="4" file="1"/>
<tree node="'ifdef'" type="2756" domain="1" id="1iip2" literal="0" line="7" column="5" file="1"/>
<tree node="IDENTIFIER" type="2646" domain="1" id="1iip5" line="7" column="11" file="1">
<literal>HAVE_QQQ</literal>
</tree>
<tree node="new_line" type="2946" domain="1" id="1iip0" literal="0" line="7" column="19" file="1"/>
</tree>
<tree node="jump_statement" type="984" domain="1" id="1iisg" line="8" column="7" file="1">
<tree node="'return'" type="2780" domain="1" id="1iiox" literal="0" line="8" column="7" file="1"/>
<tree node="$NONTERMINALAMBIGUITY" type="2999" nonterminalname="postfix_expression" nonterminaltype="402" domain="1" id="1iiou" children="2" line="8" column="14" file="1">
<tree node="postfix_expression" type="380" domain="1" id="1iipi" line="8" column="14" file="1">
<tree node="IDENTIFIER" type="2646" domain="1" id="1iip8" parents="2" line="8" column="14" file="1">
<literal>qqq</literal>
</tree>
<tree node="'('" type="2887" domain="1" id="1iip6" parents="2" literal="0" line="8" column="17" file="1"/>
<tree node="IDENTIFIER" type="2646" domain="1" id="1iipb" parents="2" line="8" column="18" file="1">
<literal>s</literal>
</tree>
<tree node="')'" type="2888" domain="1" id="1iip1" parents="2" literal="0" line="8" column="19" file="1"/>
</tree>
<tree node="postfix_expression" type="368" domain="1" id="1iipk" line="8" column="14" file="1">
<tree node="IDENTIFIER" type="2646" domain="1" id="1iip8" parents="2" alreadyprinted="true"/>
<tree node="'('" type="2887" domain="1" id="1iip6" parents="2" alreadyprinted="true"/>
<tree node="IDENTIFIER" type="2646" domain="1" id="1iipb" parents="2" alreadyprinted="true"/>
<tree node="')'" type="2888" domain="1" id="1iip1" parents="2" alreadyprinted="true"/>
</tree>
</tree>
<tree node="';'" type="2939" domain="1" id="1iisb" literal="0" line="8" column="20" file="1"/>
</tree>
<tree node="else_directive" type="2116" domain="1" id="1iisi" line="9" column="4" file="1">
<tree node="'#'" type="2908" domain="1" id="1iism" literal="0" line="9" column="4" file="1"/>
<tree node="'else'" type="2742" domain="1" id="1iisp" literal="0" line="9" column="5" file="1"/>
<tree node="new_line" type="2946" domain="1" id="1iiso" literal="0" line="9" column="9" file="1"/>
</tree>
<tree node="jump_statement" type="984" domain="1" id="1iit5" line="10" column="7" file="1">
<tree node="'return'" type="2780" domain="1" id="1iish" literal="0" line="10" column="7" file="1"/>
<tree node="$NONTERMINALAMBIGUITY" type="2999" nonterminalname="postfix_expression" nonterminaltype="402" domain="1" id="1iio5" children="2" line="10" column="14" file="1">
<tree node="postfix_expression" type="380" domain="1" id="1iit6" line="10" column="14" file="1">
<tree node="IDENTIFIER" type="2646" domain="1" id="1iisk" parents="2" line="10" column="14" file="1">
<literal>ololo</literal>
</tree>
<tree node="'('" type="2887" domain="1" id="1iisu" parents="2" literal="0" line="10" column="19" file="1"/>
<tree node="IDENTIFIER" type="2646" domain="1" id="1iisv" parents="2" line="10" column="20" file="1">
<literal>s</literal>
</tree>
<tree node="')'" type="2888" domain="1" id="1iit2" parents="2" literal="0" line="10" column="21" file="1"/>
</tree>
<tree node="postfix_expression" type="368" domain="1" id="1iiti" line="10" column="14" file="1">
<tree node="IDENTIFIER" type="2646" domain="1" id="1iisk" parents="2" alreadyprinted="true"/>
<tree node="'('" type="2887" domain="1" id="1iisu" parents="2" alreadyprinted="true"/>
<tree node="IDENTIFIER" type="2646" domain="1" id="1iisv" parents="2" alreadyprinted="true"/>
<tree node="')'" type="2888" domain="1" id="1iit2" parents="2" alreadyprinted="true"/>
</tree>
</tree>
<tree node="';'" type="2939" domain="1" id="1iit4" literal="0" line="10" column="22" file="1"/>
</tree>
<tree node="endif_directive" type="2117" domain="1" id="1iitp" line="11" column="4" file="1">
<tree node="'#'" type="2908" domain="1" id="1iits" literal="0" line="11" column="4" file="1"/>
<tree node="'endif'" type="2743" domain="1" id="1iitr" literal="0" line="11" column="5" file="1"/>
<tree node="new_line" type="2946" domain="1" id="1iitq" literal="0" line="11" column="10" file="1"/>
</tree>
</tree>
<tree node="'}'" type="2941" domain="1" id="1iitm" literal="0" line="12" column="1" file="1"/>
</tree>
</tree>
</tree>
</tree>
<FileIndex>
<File index="1">C:/temp/small.cpp</File>
</FileIndex>
<DomainIndex>
<Domain index="1">Cpp~GCC4</Domain>
</DomainIndex>
</DMSForest>
It won't quite round-trip from the XML; there's no XML reader precofigured to build an AST. However, DMS is highly customizable and has an XML parser as option; it would be straightforward to read an XML tree, regenerate the C++ AST, and then invoke the prettyprinter.
I'm not quite sure what you mean "manageable from a programmer's point of view". This is a precise tree. If it contains too much detail, you are welcome to apply XSLT transforms as you see fit to simplify it, but you will likely lose semantic accuracy doing so. And you will likely lose the ability to round-trip, too.
We don't see much need for such XML exports; the DMS ecosystem by design provides a huge amount of infrastructure for analyzing/transforming programs (including C++ programs); we've done massive C++ source code parsing/transformation with DMS. So the need to do the XML export to do something useful isn't very high. We offer it anyway, because people always ask for it. To our surprise, we have some clients that actually use it.
来源:https://stackoverflow.com/questions/15970103/how-to-convert-unpreprocessed-c-c-source-to-syntax-tree-and-back