In the C++ standard, does well-formed mean that the code compiles?

Question

The C++ standard defines a well-formed program as

C++ program constructed according to the syntax rules, diagnosable semantic rules, and the one-definition rule

I am wondering whether all well-formed programs compile (and if not, what types of error make the difference between a well-formed program and a compilable program). For example, would a program containing ambiguity errors be considered well-formed?

Answer 1:

A well-formed program can have undefined behaviour.
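To illustrate the distinction, here is a minimal sketch (the function names `increment`, `fine` and `broken` are hypothetical, chosen only for this example). Every function satisfies the syntax rules and the diagnosable semantic rules, so the translation unit is well-formed, yet one of the functions has undefined behaviour if it is ever called:

```cpp
#include <climits>

// Well-formed: nothing here violates a syntax rule or a diagnosable
// semantic rule, so a conforming compiler must accept it.
int increment(int x) {
    // Signed overflow is undefined behaviour, but only in an execution
    // where x == INT_MAX. The function itself is still well-formed.
    return x + 1;
}

// Defined behaviour: 41 + 1 does not overflow.
int fine() { return increment(41); }

// Still well-formed, but calling this function is undefined behaviour.
int broken() { return increment(INT_MAX); }
```

A compiler is not required to diagnose `broken`, and the program stays well-formed whether or not that function is ever called.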

It's in a note, and thus not technically authoritative, but it seems to be the intention that terminating compilation (or "translation", as the standard calls it) is within the scope of possible UB:

[intro.defs]

undefined behavior

behavior for which this document imposes no requirements
[ Note: Undefined behavior may be expected when this document omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data.

Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed.

Evaluation of a constant expression never exhibits behavior explicitly specified as undefined in [intro] through [cpp] of this document ([expr.const]). — end note ]
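The last sentence of that note can be seen in a small sketch (the names `successor`, `ok` and `at_runtime` are made up for illustration). The same overflow that is merely undefined at run time makes the program ill-formed when it would have to happen inside a constant expression:

```cpp
#include <climits>

constexpr int successor(int x) { return x + 1; }

// Fine: evaluated during translation, no overflow involved.
constexpr int ok = successor(41);
static_assert(ok == 42, "evaluated at compile time");

// Ill-formed if uncommented: the overflow would be undefined behaviour,
// so successor(INT_MAX) is not a constant expression and cannot
// initialize a constexpr variable. A diagnostic is required.
// constexpr int bad = successor(INT_MAX);

// Well-formed: outside a constant expression the same call is accepted,
// and passing INT_MAX here would be undefined behaviour at run time.
int at_runtime(int x) { return successor(x); }
```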

There are also practical implementation limits:

[implimits]

Because computers are finite, C++ implementations are inevitably limited in the size of the programs they can successfully process.
Every implementation shall document those limitations where known. This documentation may cite fixed limits where they exist, say how to compute variable limits as a function of available resources, or say that fixed limits do not exist or are unknown.
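One limit that is easy to run into is the depth of recursive template instantiation. The following is only a sketch (the `Sum` template is a hypothetical example, not something from the standard): it is well-formed for any depth, but whether a given depth actually compiles depends on the implementation. GCC and Clang let you adjust this limit with the `-ftemplate-depth` option.

```cpp
// Sum<N>::value computes 0 + 1 + ... + N through N nested instantiations.
template <unsigned N>
struct Sum {
    static constexpr unsigned long long value = N + Sum<N - 1>::value;
};

// Base case that ends the recursion.
template <>
struct Sum<0> {
    static constexpr unsigned long long value = 0;
};

// A shallow recursion like this should be within any realistic limit.
static_assert(Sum<100>::value == 5050, "shallow recursion");

// Equally well-formed as far as the language rules go, but likely to
// exceed an implementation's instantiation depth limit if uncommented:
// constexpr unsigned long long deep = Sum<1000000>::value;
```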

Furthermore, compilers can and do have bugs. Well-formed simply means that a standard-conforming compiler should compile it (within the limitations mentioned above). A buggy compiler does not necessarily conform to the standard.

Lastly, the standard document itself is not perfect. If there is disagreement about what the rules mean, then it is possible for a program to be well-formed under one interpretation, and ill-formed under another interpretation.

If a compiler disagrees with the programmer or another compiler, then it might fail to compile a program that is believed to be well-formed by the other party.

Answer 2:

I am wondering if all well-formed programs compile or not

Of course not, in practice.

A typical example is when you ask for optimizations on a huge translation unit containing long C++ functions.

(but in theory, yes)

See, of course, the n3337 draft of the C++11 standard, or the C++17 standard.

This happened to me in the (old) GCC MELT project. I was generating C++ code compiled by GCC, basically using transpiler (or source-to-source compilation) techniques on a Lispy DSL of my own invention to generate the C++ code of GCC plugins. See also this and that.

In practice, if you generate a single C++ function of a hundred thousand statements, the compiler has trouble optimizing it.

Large generated C++ functions are possible with GUI code generators (e.g. FLUID), with some parser generators such as ANTLR (when the underlying input grammar is badly designed), with interface generators such as SWIG, or when using preprocessors such as GPP or GNU m4 (as GNU autoconf does). C++ template expansion may also produce arbitrarily large functions (e.g. when you combine several C++ container templates and ask the GCC compiler to optimize at link time with g++ -flto -O2).

I benchmarked this in the previous decade and observed experimentally that compiling a C++ function of n statements may take O(n²) time (and, IIRC, O(n log n) space) with g++ -O3. Notice that a good optimizing C++ compiler has to do register allocation, loop unrolling, and inline expansion, and that some ABIs (including the one on Linux/x86-64) mandate passing or returning small structs (or instances of small classes) through registers. All these optimizations require trade-offs and hit a combinatorial explosion wall: in practice, compiler optimization is at least an intractable problem, and probably an undecidable one. See also the related Rice's theorem and read the Dragon Book.
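If you want to try this kind of measurement yourself, a generator along the following lines could be a starting point. This is only a sketch and not the code used in GCC MELT; the names `generate_big_function` and `big` are hypothetical. It returns the source text of one function whose body has as many statements as you ask for, each depending on the previous one.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Returns the source text of a single C++ function named `big`
// containing `statements` assignments followed by a return statement.
std::string generate_big_function(std::size_t statements) {
    std::ostringstream out;
    out << "unsigned big(unsigned x) {\n";
    for (std::size_t i = 0; i < statements; ++i) {
        // Each statement uses the result of the previous one, so the
        // optimizer has to look at the whole chain.
        out << "    x = x * 31u + " << i << "u;\n";
    }
    out << "    return x;\n}\n";
    return out.str();
}
```

Write the returned string to a file for increasing statement counts, then time something like `g++ -O3 -c` on each file to see how compilation time grows on your own machine.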

You could adapt my manydl.c program (which generates more or less random C code, compiles it as several plugins, then dlopen-s them on Linux) to emit C++. You'll then be able to run some GCC compiler benchmarks, since that manydl program is able to generate hundreds of thousands of plugins containing lots of more or less random C functions. See Drepper's paper How To Write Shared Libraries and be aware of libgccjit.

See also this draft report (explaining more about g++ compilation) and the RefPerSys project (generating C++ code). Read the blog of the late Jacques Pitrat (1934 - Oct. 2019) for an example of a C program generating the half a million lines of its own C code, whose design is explained in this paper and that book.

Read Thriving in a crowded and changing world: C++ 2006--2020.

Source: https://stackoverflow.com/questions/62409721/in-the-c-standard-does-well-formed-means-that-the-code-compiles

Tags

c++

language-lawyer

standards

well-formed