Formal methods for semantic analysis in a compiler

I know there is a formalism called attribute grammars, and a less formal method called syntax-directed translation, but the former is inefficient and the latter is difficult to automate.

Are there other, more recent formalisms for semantic analysis?

OP suggests "attribute grammars" are inefficient, and syntax-directed translation is difficult to automate. I offer a proof-point showing otherwise, name a few other semantic systems, and suggest how they might be integrated, below.

Our DMS Software Reengineering Toolkit supports both of these activities and more.

It provides parsers for full context-free grammars, and the ability to define, compile, and execute in parallel attribute grammars with arbitrary data and operations, and arbitrary flow across the syntax nodes. One can compute metrics, build symbol tables, or compute a semantic analysis with such attribute grammars.

Given a (DMS) grammar rule:

  LHS = RHS1 ... RHSN ;

one writes a DMS attribute grammar rule for a named attribute grammar computation Pass1 (for practical reasons, there can be many different passes, some even building on one another's results) in the form:

  <<Pass1>>:  {  LHS.propertyI=fn1(RHSx.propertyY,...);
                 ...
                 RHSa.propertyB=fn2(RHSp.propertyQ,...);
                 ...
              }

for a set of (arbitrarily typed) properties associated with each grammar element, either on the left or right hand side of the grammar rule, using arbitrary functions fnI defined over the types involved, implemented in DMS's underlying (parallel) language, PARLANSE. DMS computes the dataflows across the set of rules, determines a partial order (parallel) computation that achieves the result, and compiles this into PARLANSE code for execution. The result of an attribute computation is a tree decorated with the computed properties.
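To make the "partial order" idea concrete, here is a minimal Python sketch of evaluating attributes in dependency order. It is not DMS's algorithm and it runs serially; the attribute names, the `rules` table, and `evaluate_attributes` are all hypothetical names invented for illustration. Attributes that do not depend on each other could in principle be computed in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def evaluate_attributes(rules):
    """Evaluate every attribute once, after all attributes it depends on.

    rules maps an attribute name to (function, [names of attributes it reads]).
    This is an illustrative stand-in, not how DMS represents its rules.
    """
    # TopologicalSorter expects: node -> set of nodes that must come first.
    graph = {attr: set(deps) for attr, (_, deps) in rules.items()}
    values = {}
    for attr in TopologicalSorter(graph).static_order():
        fn, deps = rules[attr]
        values[attr] = fn(*(values[d] for d in deps))
    return values


# Hypothetical decorations for a tiny tree representing "1 + 2".
rules = {
    "sum.is_even": (lambda v: v % 2 == 0, ["sum.value"]),
    "sum.value": (lambda a, b: a + b, ["lit1.value", "lit2.value"]),
    "lit1.value": (lambda: 1, []),
    "lit2.value": (lambda: 2, []),
}

values = evaluate_attributes(rules)
```

The order in which the rules are written does not matter; only the declared dependencies determine the evaluation order, and a cyclic dependency raises `graphlib.CycleError`.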

With care, one should be able to define a denotational semantics (DS) of a language computed by an attribute grammar. One of the key notions in DS is that of an "environment", which maps identifiers to types and possibly symbolic values. (The former is traditionally called a symbol table.) At AST nodes that introduce new scopes, one would write an attribute function that creates a new environment by combining the parent environment with the newly introduced identifiers, and passes that down from the AST node to its children, e.g., for the rule

exp = 'let' ID '=' exp1 'in' exp2;

one might code an attribute grammar rule:

<<Denotation>>: {
     exp2.env = augment_environment(exp.env,
                                    new_variable_and_value_pair(name(ID.),
                                                                exp1.value));
     exp.value=exp2.value;
               }
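The same environment-passing idea can be sketched outside DMS as a small Python evaluator. This is only an illustration under stated assumptions: the node classes `Num`, `Var`, `Add`, `Let` and the function `evaluate` are hypothetical, the environment is a plain dictionary, and the sketch assumes that exp1 is evaluated in the parent environment (the rule above does not say so explicitly).

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Let:
    name: str
    bound: object  # plays the role of exp1
    body: object   # plays the role of exp2


def evaluate(node, env):
    """Return the value of node; env is inherited from the parent node."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Add):
        return evaluate(node.left, env) + evaluate(node.right, env)
    if isinstance(node, Let):
        # Assumption: exp1 sees the parent environment unchanged.
        bound_value = evaluate(node.bound, env)
        # exp2.env = augment_environment(exp.env, (name, exp1.value))
        body_env = {**env, node.name: bound_value}
        # exp.value = exp2.value
        return evaluate(node.body, body_env)
    raise TypeError(f"unknown node: {node!r}")


# let x = 2 in x + 3
program = Let("x", Num(2), Add(Var("x"), Num(3)))
result = evaluate(program, {})
```

Building a new dictionary rather than mutating `env` keeps the inner binding from leaking out of its scope, which mirrors passing an augmented environment down only to exp2.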

I'm not sure what the OP means by (attribute grammars are) "inefficient". We've used DMS attribute grammars to compute semantic properties (name and type resolution) for all of C++14. While such a definition is huge by most academic paper standards, it is that way because C++14 itself is huge and an astonishing mess ("camel by committee"). In spite of this, our attribute grammar seems to run well enough. More importantly, it is powerful enough for a very small team to build it (in contrast to the scale of "team" supporting Clang).

DMS also provides the ability to encode source-to-source transformations ("rewrites") using the surface syntax of the source and target (if different than source) languages, of the form, "if you see this, replace it by that". These rewrites are applied to the parse trees to provide revised trees; a prettyprinter ("anti-parser") provided by DMS can then regenerate source code for the target language. If one limits oneself to rewrites that exactly tile the original AST one gets "syntax-directed translation". OP might claim this (syntax directed translation) is difficult to automate; I'd agree but the work is done and available. OP does have to decide what rules she wants to define and execute.

DMS rewrite rules take the form:

 rule rule_name(parameter1:syntax_category1, ... parameterN...)
   :  source_syntax_category -> target_syntax_category
   = "  <text in source language>  "
  ->
   "  <text in target language> "
  if  condition_of_matched_source_pattern;

where the parameters are placeholders for syntax-typed subtrees, the rule maps a tree of type source_syntax_category -> target_syntax_category (often the same one), and the "..." are meta-quotes wrapped around surface syntax with "\"-labelled embedded escapes for the parameters where needed. The meta-quoted code fragments are interpreted as specifications for trees (using the same parsing engine that reads the source code); this is not a string-match. An example:

  rule simplify_if_then_else(c:condition,t:then_clause,e:else_clause)
     : statement->statement
  =  " if \c then \t else \e "
  -> " \t "
  if c == "true";

A generalization of the above (purely syntactic) check that is more "semantic" would be

  ...
  if can_determine_is_true(c);

which assumes a custom predicate that consults other DMS-derivable results to decide whether the instantiated condition is always true at the point where it is found (the matched tree c carries its source position with it, so the context is implied). One might build control and data flow for the desired language, and use the resulting dataflow to determine the values that arrive at the condition c, which may then always turn out to be "true" in a nontrivial way.

I have assumed a DMS-defined support predicate "can_determine_is_true". This is just a bit of custom PARLANSE code.
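As a rough illustration of the difference between the syntactic and the "semantic" condition, here is a Python sketch (not PARLANSE, and not DMS's API). Trees are plain tuples, and the dictionary `known` is a hypothetical stand-in for whatever a dataflow analysis has established about variable values at that point; every function name here is invented for the example.

```python
def fold(expr, known):
    """Evaluate expr to a Python value, or None when the value is unknown."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return known.get(expr[1])  # None if the analysis knows nothing
    if kind in ("<", "=="):
        left, right = fold(expr[1], known), fold(expr[2], known)
        if left is None or right is None:
            return None
        return left < right if kind == "<" else left == right
    return None  # unsupported operator: be conservative


def is_literally_true(c, known):
    """Purely syntactic check, like: if c == "true"."""
    return c == ("const", True)


def can_determine_is_true(c, known):
    """Semantic check: c provably evaluates to true given known values."""
    return fold(c, known) is True


def simplify_if_then_else(tree, known, condition):
    """Return the then-branch if the rule applies, otherwise None."""
    if isinstance(tree, tuple) and len(tree) == 4 and tree[0] == "if":
        _, c, t, _e = tree
        if condition(c, known):
            return t
    return None


# if x < 10 then f() else g()
stmt = ("if", ("<", ("var", "x"), ("const", 10)), ("call", "f"), ("call", "g"))
```

With `known = {"x": 3}` the syntactic check does not fire, while the semantic one does. When the value of `x` is unknown, the predicate answers "no", so the rewrite is never applied unsoundly.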

However, since the rewrites transform one tree into another tree, one can apply an arbitrarily long/complex set of transformation rules repeatedly to the entire tree. This gives DMS rewrites the power of a Post (string [generalized to tree]) rewriting system, and thus makes them Turing capable. You can technically produce any arbitrary transformation of the original tree with sufficient transforms. Usually one uses other features of DMS to make writing the transforms a bit easier; for instance, a rewrite rule may consult the result of a particular attribute grammar computation in order to "easily" use information from "far away in the tree" (individual rewrite rules always have a fixed, maximum "radius").
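Applying a set of rules repeatedly over a whole tree can be sketched in Python as follows. This is a simplified bottom-up strategy chosen for illustration, not a description of how DMS schedules rewrites; trees are tuples, and the two rules and `rewrite_to_fixpoint` are hypothetical names.

```python
def simplify_if_true(tree):
    """("if", ("true",), t, e)  ->  t"""
    if (isinstance(tree, tuple) and len(tree) == 4
            and tree[0] == "if" and tree[1] == ("true",)):
        return tree[2]
    return None


def fold_not_false(tree):
    """("not", ("false",))  ->  ("true",)"""
    if tree == ("not", ("false",)):
        return ("true",)
    return None


def rewrite_to_fixpoint(tree, rules):
    """Rewrite children first, then this node, until no rule applies."""
    if isinstance(tree, tuple):
        tree = (tree[0],) + tuple(
            rewrite_to_fixpoint(child, rules) for child in tree[1:]
        )
    for rule in rules:
        replacement = rule(tree)
        if replacement is not None:
            # The replacement may itself be rewritable, so process it again.
            return rewrite_to_fixpoint(replacement, rules)
    return tree


# if not false then f() else g()
program = ("if", ("not", ("false",)), ("call", "f"), ("call", "g"))
rules = [simplify_if_true, fold_not_false]
simplified = rewrite_to_fixpoint(program, rules)
```

Neither rule alone simplifies `program` to the call of `f`; the result comes from one rule enabling the other. Nothing in this sketch guarantees termination, so a rule set that can undo its own rewrites would loop forever.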

DMS provides a lot of additional support machinery to help one construct control flow graphs and/or compute dataflow with efficient parallel solvers. DMS also has a wide variety of front ends available for various languages such as C, C++14, Java 1.8, IBM Enterprise COBOL, ... so that a tool engineer can concentrate on building the tool she wants, rather than fighting to build a parser from scratch (only to discover that one must live Life After Parsing).

If OP is interested in a recent overview of another style of (structured operational) semantics, he might consult course notes for Semantics of Programming Languages. We claim the techniques in such papers can be implemented on top of DMS if one likes.

One can make a long list of various academic tools that implement some of these ideas. Most of them are research tools and not mature. One such research system, JastAdd, is an attribute grammar evaluation system, and I hear that it stands out in capability and performance, but I have no specific experience with it.

Source: https://stackoverflow.com/questions/27958278/formal-method-to-do-semantic-analysis-in-compiler

Tags

compiler-construction

semantics

context-sensitive-grammar