GCC compilation very slow (large file)

不羁的心 提交于 2019-12-03 12:38:27

Upon testing, I found that the Clang compiler seems to have less problems compiling large files. Although Clang consumed almost a gigabyte of memory during compilation, it successfully turned OP's source code form into a 70 kB object file. This works for all optimization levels I tested.

gcc was also able to compile this file quickly and without consuming too much memory if optimization is turned on. This bug in gcc comes from the large expression in OPs code which places a huge burden on the register allocator. With optimizations turned on, the compiler performs an optimization called common subexpression elimination which is able to remove a lot of redundancy from OPs code, reducing both compilation time and object file size to manageable values.

Here are some tests with the testcase from the aforementioned bug report:

$ time gcc5 -O3 -c -o testcase.gcc5-O3.o testcase.c
real    0m39,30s
user    0m37,85s
sys     0m1,42s
$ time gcc5 -O0 -c -o testcase.gcc5-O0.o testcase.c
real    23m33,34s
user    23m27,07s
sys     0m5,92s
$ time tcc -c -o testcase.tcc.o testcase.c
real    0m2,60s
user    0m2,42s
sys     0m0,17s
$ time clang -O3 -c -o testcase.clang-O3.o testcase.c
real    0m13,71s
user    0m12,55s
sys     0m1,16s
$ time clang -O0 -c -o testcase.clang-O0.o testcase.c
real    0m17,63s
user    0m16,14s
sys     0m1,49s
$ time clang -Os -c -o testcase.clang-Os.o testcase.c
real    0m14,88s
user    0m13,73s
sys 0m1,11s
$ time clang -Oz -c -o testcase.clang-Oz.o testcase.c
real    0m13,56s
user    0m12,45s
sys     0m1,09

This is the resulting object file size:

    text       data     bss      dec        hex filename
39101286          0       0 39101286    254a366 testcase.clang-O0.o
   72161          0       0    72161      119e1 testcase.clang-O3.o
   72087          0       0    72087      11997 testcase.clang-Os.o
   72087          0       0    72087      11997 testcase.clang-Oz.o
38683240          0       0 38683240    24e4268 testcase.gcc5-O0.o
   87500          0       0    87500      155cc testcase.gcc5-O3.o
   78239          0       0    78239      1319f testcase.gcc5-Os.o
69210504    3170616       0 72381120    45072c0 testcase.tcc.o

Nearly the entire file is one expression, the assignment to double f[24] = .... That's going to generate a gigantic abstract syntax tree tree. I'd be surprised if anything but a specialized compiler could handle that efficiently.

The 20 megabyte file itself may be fine, but the one giant expression may be what is causing the issue. Try as a preliminary step, splitting the line into double f[24] = {0} and then 24 assignments of f[0] = ...; f[1] = ... and see what happens. Worst case, you can split the 24 assignments into 24 functions, each in its own .c file, and compile them separately. This won't reduce the size of the AST, it will just reorganize it, but GCC is probably more optimized at handling many statements which together add up to a lot of code, vs. one huge expression.

The ultimate approach would be to generate the code in a more optimized manner. If I search for s4*s5*s6, for example, I get 77,783 hits. These s[4-6] variables don't change. You should generate a temporary variable, double _tmp1 = s4*s5*s6; and then use that instead of repeating the expression. You've just eliminated 311,132 nodes from your abstract syntax tree (assuming s4*s5*s6 is 5 nodes and _tmp1 is one node). That's that much less processing GCC has to do. This should also generate faster code (you won't repeat the same multiplication 77,783 times).

If you do this in a smart way in a recursive manner (e.g. s4*s5*s6 --> _tmp1, (c4*c6+s4*s5*s6) --> (c4*c6+_tmp1) --> _tmp2, c5*s6*(c4*c6+s4*s5*s6) --> c5*s6*_tmp2 -> _tmp3, etc...), you can probably eliminate most of the size of the generated code.

chqrlie

Try Fabrice Bellard's tiny C compiler tcc from http://tinycc.org:

chqrlie$ time tcc -c test4.c

real    0m1.336s
user    0m1.248s
sys     0m0.084s

chqrlie$ size test4.o
   text    data     bss     dec     hex filename
38953877        3170632       0 42124509        282c4dd test4.o

Yes, that's 1.336 seconds on a pretty basic PC!

Of course I cannot test the resulting executable, but the object file should be linkable with the rest of your program and libraries.

For this test, I used a dummy version of file mex.h:

typedef struct mxArray mxArray;
double *mxGetPr(const mxArray*);
enum { mxREAL = 0 };
mxArray *mxCreateDoubleMatrix(int nx, int ny, int type);

gcc still has not finished compiling...

EDIT: gcc managed to hog my Linux box so badly I cannot connect to it anymore:(

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!