How many GCC optimization levels are there?

限于喜欢 submitted on 2019-11-27 02:57:01

To be pedantic, there are 8 different valid -O options you can give to gcc, though there are some that mean the same thing.

The original version of this answer stated there were 7 options. GCC has since added -Og, bringing the total to 8.

From the man page:

  • -O (Same as -O1)
  • -O0 (do no optimization, the default if no optimization level is specified)
  • -O1 (optimize minimally)
  • -O2 (optimize more)
  • -O3 (optimize even more)
  • -Ofast (optimize very aggressively to the point of breaking standard compliance)
  • -Og (Optimize debugging experience. -Og enables optimizations that do not interfere with debugging. It should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience.)
  • -Os (Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size. -Os disables the following optimization flags: -falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -freorder-blocks-and-partition -fprefetch-loop-arrays -ftree-vect-loop-version)
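One way to check from inside a program which of these families was in effect is to look at GCC's predefined macros. The sketch below assumes a GCC-compatible compiler; the helper name describe_build and the HAS_* macros are made up for illustration. __OPTIMIZE__ is defined for any optimizing compilation, __OPTIMIZE_SIZE__ is additionally defined for -Os, and __FAST_MATH__ is defined when -ffast-math is active (which -Ofast turns on). These macros do not distinguish -O1, -O2 and -O3 from each other.

```c
#include <stdio.h>

/* Map GCC's predefined macros to 0/1 so they can be printed.
   The HAS_* names are hypothetical, chosen for this sketch only. */
#ifdef __OPTIMIZE__
#define HAS_OPTIMIZE 1
#else
#define HAS_OPTIMIZE 0
#endif

#ifdef __OPTIMIZE_SIZE__
#define HAS_OPTIMIZE_SIZE 1
#else
#define HAS_OPTIMIZE_SIZE 0
#endif

#ifdef __FAST_MATH__
#define HAS_FAST_MATH 1
#else
#define HAS_FAST_MATH 0
#endif

/* Write a one-line summary of the optimization-related macros into buf.
   Returns the value of snprintf (number of characters that were needed). */
int describe_build(char *buf, size_t size)
{
    return snprintf(buf, size, "optimizing=%d size=%d fast_math=%d",
                    HAS_OPTIMIZE, HAS_OPTIMIZE_SIZE, HAS_FAST_MATH);
}
```

Calling describe_build from a small main and compiling the same file with -O0, -O2, -Os and -Ofast should show the three fields change.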

There may also be platform-specific optimization levels: as @pauldoo notes, OS X has -Oz.

Let's interpret the source code of GCC 5.1 to see what happens on -O100 since it is not clear on the man page.

We shall conclude that:

  • anything above -O3 up to INT_MAX is the same as -O3, but that could easily change in the future, so don't rely on it.
  • GCC 5.1 invokes undefined behavior if you enter integers larger than INT_MAX.
  • the argument can only have digits, or it fails gracefully. In particular, this excludes negative integers like -O-1

Focus on subprograms

First remember that GCC is just a front-end for cpp, as, cc1, collect2. A quick ./XXX --help says that only collect2 and cc1 take -O, so let's focus on them.

And:

gcc -v -O100 main.c |& grep 100

gives:

COLLECT_GCC_OPTIONS='-O100' '-v' '-mtune=generic' '-march=x86-64'
/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/5.1.0/cc1 [[noise]] hello_world.c -O100 -o /tmp/ccetECB5.

so -O was forwarded to both cc1 and collect2.

O in common.opt

common.opt is written in a GCC-specific CLI option description format, which is described in the internals documentation and translated to C by opth-gen.awk and optc-gen.awk.

It contains the following interesting lines:

O
Common JoinedOrMissing Optimization
-O<number>  Set optimization level to <number>

Os
Common Optimization
Optimize for space rather than speed

Ofast
Common Optimization
Optimize for speed disregarding exact standards compliance

Og
Common Optimization
Optimize for debugging experience rather than speed or size

which specify all the O options. Note how -O<n> is in a separate family from the other Os, Ofast and Og.

When we build, this generates an options.h file that contains:

OPT_O = 139,                               /* -O */
OPT_Ofast = 140,                           /* -Ofast */
OPT_Og = 141,                              /* -Og */
OPT_Os = 142,                              /* -Os */

As a bonus, while we are grepping for \bO\n inside common.opt we notice the lines:

-optimize
Common Alias(O)

which teaches us that --optimize is an undocumented alias for -O, and can be used as --optimize=3! (It takes a double dash because the entry in the .opt file, -optimize, already starts with a dash.)

Where OPT_O is used

Now we grep:

git grep -E '\bOPT_O\b'

which points us to two files:

  • opts.c
  • lto-wrapper.c

Let's first track down opts.c.

opts.c:default_options_optimization

All opts.c usages happen inside: default_options_optimization.

We grep to backtrack and see who calls this function, and we find that the only code path is:

  • main.c:main
  • toplev.c:toplev::main
  • opts-global.c:decode_opts
  • opts.c:default_options_optimization

and main.c is the entry point of cc1. Good!

The first part of this function:

  • calls integral_argument, which calls atoi on the string corresponding to OPT_O to parse the input argument
  • stores the value inside opts->x_optimize where opts is a struct gcc_options.

struct gcc_options

After grepping in vain, we notice that this struct is also generated in options.h:

struct gcc_options {
    int x_optimize;
    [...]
}

where x_optimize comes from the lines:

Variable
int optimize

present in common.opt, and that options.c contains:

struct gcc_options global_options;

so we guess that this is what contains the entire configuration global state, and int x_optimize is the optimization value.

255 is an internal maximum

In opts.c:integral_argument, atoi is applied to the input argument, so INT_MAX is an upper bound. If you pass anything larger, it seems that GCC invokes C undefined behaviour. Ouch?

integral_argument also thinly wraps atoi and rejects the argument if any character is not a digit. So negative values fail gracefully.
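The behaviour described here can be sketched as follows. This is a paraphrase for illustration, not GCC's actual code, and the name parse_digits_only is made up. It assumes the argument is a non-NULL string.

```c
#include <ctype.h>
#include <stdlib.h>

/* Accept only strings made entirely of decimal digits.
   Returns the parsed value, or -1 if any other character is present. */
int parse_digits_only(const char *arg)
{
    const char *p = arg;

    /* Skip over leading digits. */
    while (*p != '\0' && isdigit((unsigned char) *p))
        p++;

    /* Anything left over (a sign, a letter, ...) means rejection. */
    if (*p != '\0')
        return -1;

    /* atoi has undefined behaviour if the value does not fit in an int,
       which is the INT_MAX problem noted above. */
    return atoi(arg);
}
```

With this shape, "100" parses to 100, while "-1" and "3x" are rejected because of the non-digit character.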

Back to opts.c:default_options_optimization, we see the line:

if ((unsigned int) opts->x_optimize > 255)
  opts->x_optimize = 255;

so that the optimization level is truncated to 255. While reading opth-gen.awk I had come across:

# All of the optimization switches gathered together so they can be saved and restored.
# This will allow attribute((cold)) to turn on space optimization.

and on the generated options.h:

struct GTY(()) cl_optimization
{
  unsigned char x_optimize;

which explains the truncation: the options must also be forwarded to cl_optimization, which uses a char to save space. So 255 is actually an internal maximum.
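A small sketch of why the check matters, assuming an 8-bit unsigned char. The function names are hypothetical; only the comparison against 255 is taken from the snippet quoted above.

```c
/* Mirror of the quoted check: anything outside 0..255 becomes 255.
   The cast to unsigned also catches negative values. */
int clamp_level(int level)
{
    if ((unsigned int) level > 255)
        level = 255;
    return level;
}

/* What storing the raw value in an unsigned char would do instead:
   the value wraps modulo 256 (assuming CHAR_BIT == 8). */
unsigned char store_without_clamp(int level)
{
    return (unsigned char) level;
}
```

Without the clamp, a level of 256 would be stored as 0 and a level of 259 as 3, so the saved level would no longer match the one used for the rest of the compilation.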

opts.c:maybe_default_options

Back to opts.c:default_options_optimization, we come across maybe_default_options which sounds interesting. We enter it, and then maybe_default_option where we reach a big switch:

switch (default_opt->levels)
  {

  [...]

  case OPT_LEVELS_1_PLUS:
    enabled = (level >= 1);
    break;

  [...]

  case OPT_LEVELS_3_PLUS:
    enabled = (level >= 3);
    break;

There are no >= 4 checks, which indicates that 3 is the largest level that makes any difference.

Then we search for the definition of OPT_LEVELS_3_PLUS in common-target.h:

enum opt_levels
{
  OPT_LEVELS_NONE, /* No levels (mark end of array).  */
  OPT_LEVELS_ALL, /* All levels (used by targets to disable options
                     enabled in target-independent code).  */
  OPT_LEVELS_0_ONLY, /* -O0 only.  */
  OPT_LEVELS_1_PLUS, /* -O1 and above, including -Os and -Og.  */
  OPT_LEVELS_1_PLUS_SPEED_ONLY, /* -O1 and above, but not -Os or -Og.  */
  OPT_LEVELS_1_PLUS_NOT_DEBUG, /* -O1 and above, but not -Og.  */
  OPT_LEVELS_2_PLUS, /* -O2 and above, including -Os.  */
  OPT_LEVELS_2_PLUS_SPEED_ONLY, /* -O2 and above, but not -Os or -Og.  */
  OPT_LEVELS_3_PLUS, /* -O3 and above.  */
  OPT_LEVELS_3_PLUS_AND_SIZE, /* -O3 and above and -Os.  */
  OPT_LEVELS_SIZE, /* -Os only.  */
  OPT_LEVELS_FAST /* -Ofast only.  */
};

Ha! This is a strong indicator that 3 is the highest numeric level.
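To make the argument concrete, here is a reduced sketch of that kind of switch, covering only three of the cases. The enum and function names are hypothetical simplifications, not GCC's definitions.

```c
#include <stdbool.h>

/* Simplified stand-ins for a few of the opt_levels values. */
enum level_kind { LEVELS_1_PLUS, LEVELS_2_PLUS, LEVELS_3_PLUS, LEVELS_COUNT };

/* Is an option of the given kind enabled at this numeric level? */
bool enabled_at(enum level_kind kind, int level)
{
    switch (kind) {
    case LEVELS_1_PLUS: return level >= 1;
    case LEVELS_2_PLUS: return level >= 2;
    case LEVELS_3_PLUS: return level >= 3;
    default:            return false;
    }
}

/* True if every kind gives the same answer at `level` as at level 3. */
bool same_as_o3(int level)
{
    for (int kind = 0; kind < LEVELS_COUNT; kind++)
        if (enabled_at((enum level_kind) kind, level) !=
            enabled_at((enum level_kind) kind, 3))
            return false;
    return true;
}
```

Because no case compares against anything larger than 3, same_as_o3 holds for every level from 3 upward in this model.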

opts.c:default_options_table

opt_levels is so interesting that we grep for OPT_LEVELS_3_PLUS, and come across opts.c:default_options_table:

static const struct default_options default_options_table[] = {
    /* -O1 optimizations.  */
    { OPT_LEVELS_1_PLUS, OPT_fdefer_pop, NULL, 1 },
    [...]

    /* -O3 optimizations.  */
    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
    [...]
}

so this is where the -On to specific optimization mapping mentioned in the docs is encoded. Nice!
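The table-driven idea can be sketched in a few lines. This is a toy model with made-up names (default_flag, flag_table, count_enabled) and only the two entries quoted above; the real table is much longer and uses the opt_levels enum rather than a plain threshold.

```c
#include <stddef.h>

/* One row: the lowest -O<n> that turns the flag on, and the flag. */
struct default_flag {
    int min_level;
    const char *flag;
};

/* Only the two entries shown in the excerpt above. */
static const struct default_flag flag_table[] = {
    { 1, "-fdefer-pop" },
    { 3, "-ftree-loop-distribute-patterns" },
};

/* Count how many table entries are switched on at a given level. */
size_t count_enabled(int level)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof flag_table / sizeof flag_table[0]; i++)
        if (level >= flag_table[i].min_level)
            n++;
    return n;
}
```

In this model, levels 3 and 100 enable exactly the same rows, which is the behaviour the answer is arguing for.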

Assure that there are no more uses for x_optimize

The main usage of x_optimize was to set other specific optimization options like -fdefer-pop as documented on the man page. Are there any more?

We grep, and find a few more. The number is small, and upon manual inspection we see that every usage does at most an x_optimize >= 3 comparison, so our conclusion holds.

lto-wrapper.c

Now we go for the second occurrence of OPT_O, which was in lto-wrapper.c.

LTO means Link Time Optimization, which as the name suggests is going to need an -O option, and will be linked to collect2 (which is basically a linker).

In fact, the first line of lto-wrapper.c says:

/* Wrapper to call lto.  Used by collect2 and the linker plugin.

In this file, the OPT_O occurrences seem to only normalize the value of O to pass it forward, so we should be fine.

Demi

Seven distinct levels:

  • -O0 (default): No optimization.

  • -O or -O1 (same thing): Optimize, but do not spend too much time.

  • -O2: Optimize more aggressively

  • -O3: Optimize most aggressively

  • -Ofast: Essentially -O3 -ffast-math (the manual also lists a few other flags it turns on, such as the Fortran-specific -fno-protect-parens and -fstack-arrays). -ffast-math triggers non-standards-compliant floating point optimizations. This allows the compiler to pretend that floating point numbers are infinitely precise, and that algebra on them follows the standard rules of real number algebra. It also tells the compiler to tell the hardware to flush denormals to zero and treat denormals as zero, at least on some processors, including x86 and x86-64. Denormals trigger a slow path on many FPUs, and so treating them as zero (which does not trigger the slow path) can be a big performance win.

  • -Os: Optimize for code size. This can actually improve speed in some cases, due to better I-cache behavior.

  • -Og: Optimize, but do not interfere with debugging. This enables non-embarrassing performance for debug builds and is intended to replace -O0 for debug builds.
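To illustrate what "pretending that algebra follows the rules of real numbers" means for -Ofast, here is a minimal sketch showing that floating point addition is not associative. The function names are made up. Without -ffast-math the compiler has to keep the two groupings distinct; with it, the compiler may regroup the additions, so the two functions are no longer guaranteed to differ.

```c
/* Add left to right: (a + b) + c. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Add right to left: a + (b + c). */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e30, b = -1e30 and c = 1.0, standard-compliant evaluation gives 1.0 for sum_left (the large terms cancel first) and 0.0 for sum_right (the 1.0 is absorbed by -1e30 before the cancellation).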

There are also other options that are not enabled by any of these, and must be enabled separately. It is also possible to use an optimization option, but disable specific flags enabled by this optimization.
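On the command line, disabling one flag looks like -O3 -fno-tree-vectorize. As a rough in-source counterpart, GCC also accepts a per-function optimize attribute. The sketch below assumes GCC itself (the macro is empty on other compilers), and the names NO_VECTORIZE and sum_ints are made up. The GCC manual describes this attribute as intended for debugging rather than production use, so treat it as an illustration only.

```c
#include <stddef.h>

/* Apply the attribute only on real GCC; other compilers may not know it. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_VECTORIZE
#endif

/* Sum an array with loop vectorization disabled for this one function,
   whatever -O level the rest of the file is compiled with. */
NO_VECTORIZE
long sum_ints(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

The result of the function is unchanged; only the code GCC generates for it differs.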

For more information, see the GCC website.

Four (0-3): See the GCC 4.4.2 manual. Anything higher is just -O3, but at some point you will overflow the variable size limit.
