CMakeLists file to generate an LLVM bitcode file from a C source file

Question

I am trying to generate an LLVM bitcode file from a C source file (hello.c) using CMake. Below is my CMakeLists.txt file.

###### CMakelists.txt ############
cmake_minimum_required(VERSION 2.8.9)
set(CMAKE_C_COMPILER "clang")
set(CMAKE_C_FLAGS "-emit-llvm")

project (hello)
add_executable(hello hello.c)

I am new to CMake and not sure if this is the right way. I could not find any rules to make *.bc files in the generated Makefile. Please correct me here. I also tried "-save-temps".
This is for a single .c file. It would be really helpful if you could also give me some hints on generating the same for a complete C project.

Answer 1:

I think what you ultimately want is to be able to build a C-program project with CMake and clang in which source files are compiled to LLVM bitcode and the executable is linked from the bitcode files.

With CMake, asking clang to link bitcode files means asking it to link in LTO mode, with the -flto linkage option.

And you can get clang to compile to LLVM bitcode with the -flto compilation option, or with the -emit-llvm option.

For illustration here is a Hello World project comprising two source files and one header:

$ ls -R
.:
CMakeLists.txt  hello.c  hello.h  main.c

Here is the:

CMakeLists.txt

cmake_minimum_required(VERSION 3.0.2)
project (hello)
set(CMAKE_C_COMPILER clang)
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -flto")
add_executable(hello main.c hello.c)
target_compile_options(hello PUBLIC ${CMAKE_C_FLAGS} -flto)
#target_compile_options(hello PUBLIC ${CMAKE_C_FLAGS} -emit-llvm)

It will work equally well with:

#target_compile_options(hello PUBLIC ${CMAKE_C_FLAGS} -flto)
target_compile_options(hello PUBLIC ${CMAKE_C_FLAGS} -emit-llvm)
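
On CMake 3.13 or newer, the same idea can be kept entirely on the target instead of in global variables. The following is a minimal sketch, not the configuration used for the output shown in this answer. It assumes the same main.c and hello.c, and that clang is chosen at configure time (for example with cmake -DCMAKE_C_COMPILER=clang ..) rather than set inside the file:

```cmake
cmake_minimum_required(VERSION 3.13)  # target_link_options needs 3.13+
project(hello C)

add_executable(hello main.c hello.c)

# Compile each source to LLVM bitcode (-emit-llvm would also do here).
target_compile_options(hello PRIVATE -flto)

# Ask clang to link the bitcode files in LTO mode.
target_link_options(hello PRIVATE -flto)
```

PRIVATE is used because nothing links against an executable, so the flags do not need to propagate to other targets.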

Make a build directory for CMake and go there:

$ mkdir build
$ cd build

Generate the build system:

$ cmake ..

Build:

$ make
Scanning dependencies of target hello
[ 33%] Building C object CMakeFiles/hello.dir/main.c.o
[ 66%] Building C object CMakeFiles/hello.dir/hello.c.o
[100%] Linking C executable hello
[100%] Built target hello

You will not find any *.bc targets in the Makefiles, nor any *.bc files generated:

$ egrep -r '.*\.bc'; echo Done
Done
$ find -name '*.bc'; echo Done
Done

because the compilation option -flto or -emit-llvm results in an output file:

CMakeFiles/hello.dir/main.c.o
CMakeFiles/hello.dir/hello.c.o

that adheres to the usual CMake naming convention but is in fact not an object file but an LLVM bitcode file, as you see:

$ file $(find -name '*.o')
./CMakeFiles/hello.dir/hello.c.o: LLVM IR bitcode
./CMakeFiles/hello.dir/main.c.o:  LLVM IR bitcode

The program does the usual thing:

$ ./hello 
Hello World!
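
As a hedged aside, CMake 3.9 and later can also switch on LTO without spelling out -flto by hand, through the INTERPROCEDURAL_OPTIMIZATION target property and the CheckIPOSupported module. The sketch below assumes the same two-source project and that clang is selected at configure time (cmake -DCMAKE_C_COMPILER=clang ..). The exact LTO flag that CMake passes to clang depends on the CMake and clang versions, so treat it as an illustration rather than a drop-in replacement:

```cmake
cmake_minimum_required(VERSION 3.9)
project(hello C)

# Ask CMake whether the selected compiler supports IPO/LTO.
include(CheckIPOSupported)
check_ipo_supported(RESULT ipo_ok OUTPUT ipo_msg)

add_executable(hello main.c hello.c)

if(ipo_ok)
  # CMake adds the compiler's LTO flags to both compile and link steps.
  set_property(TARGET hello PROPERTY INTERPROCEDURAL_OPTIMIZATION TRUE)
else()
  message(WARNING "IPO/LTO is not supported: ${ipo_msg}")
endif()
```

Running file on the resulting .o files, as above, is a quick way to check whether they contain bitcode in a given setup.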

Later

When I try "make hello.o" it should generate the object file, right? The command executes successfully, but I could not find the generated object file. Am I doing it right?

You are doing it in one way that is right, though not the only way that is right, but your expectations are wrong. Look again at:

$ file $(find -name '*.o')
./CMakeFiles/hello.dir/hello.c.o: LLVM IR bitcode
./CMakeFiles/hello.dir/main.c.o:  LLVM IR bitcode

You can see there that the .o files that are made from hello.c and main.c by the CMake-generated makefile are not called hello.o and main.o but hello.c.o and main.c.o. CMake prefers the compiled filename to preserve the extension of the source file and append .o to it. That is a fairly common practice. So if you wanted to use the makefile to compile hello.c, the most obviously right way would be make hello.c.o.

Let's see what actually happens. In my CMake build directory:

$ make VERBOSE=1 hello.c.o
make -f CMakeFiles/hello.dir/build.make CMakeFiles/hello.dir/hello.c.o
make[1]: Entering directory '/home/imk/develop/so/scrap/build'
make[1]: 'CMakeFiles/hello.dir/hello.c.o' is up to date.
make[1]: Leaving directory '/home/imk/develop/so/scrap/build'

There was nothing to be done, because my hello.c.o was up to date. So I'll delete it and repeat:

$ rm CMakeFiles/hello.dir/hello.c.o
$ make VERBOSE=1 hello.c.o
make -f CMakeFiles/hello.dir/build.make CMakeFiles/hello.dir/hello.c.o
make[1]: Entering directory '/home/imk/develop/so/scrap/build'
Building C object CMakeFiles/hello.dir/hello.c.o
clang   -flto -o CMakeFiles/hello.dir/hello.c.o   -c /home/imk/develop/so/scrap/hello.c
make[1]: Leaving directory '/home/imk/develop/so/scrap/build'

Now it has been recompiled.

However, because many people - like you - would expect hello.o to be compiled from hello.c, CMake helpfully defines hello.o as a .PHONY target that depends on hello.c.o:

$ egrep  -A3 'hello.o.*:.*hello.c.o' Makefile 
hello.o: hello.c.o

.PHONY : hello.o

So in fact I can do:

$ rm CMakeFiles/hello.dir/hello.c.o
$ make VERBOSE=1 hello.o
make -f CMakeFiles/hello.dir/build.make CMakeFiles/hello.dir/hello.c.o
make[1]: Entering directory '/home/imk/develop/so/scrap/build'
Building C object CMakeFiles/hello.dir/hello.c.o
clang   -flto -o CMakeFiles/hello.dir/hello.c.o   -c /home/imk/develop/so/scrap/hello.c
make[1]: Leaving directory '/home/imk/develop/so/scrap/build'

So make hello.o is just another way of making hello.c.o.
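
If the hello.c.o naming itself is unwanted, CMake has a variable that replaces the source extension instead of appending to it. This is a sketch under a loud assumption: CMAKE_C_OUTPUT_EXTENSION_REPLACE is, as far as I know, not described in the official variable documentation, so its behaviour may differ between CMake versions and it was not part of the build shown above:

```cmake
cmake_minimum_required(VERSION 3.13)
project(hello C)

# Assumption: undocumented variable; when ON, object files are expected
# to be named hello.o / main.o rather than hello.c.o / main.c.o.
set(CMAKE_C_OUTPUT_EXTENSION_REPLACE ON)

add_executable(hello main.c hello.c)
target_compile_options(hello PRIVATE -flto)
target_link_options(hello PRIVATE -flto)
```

Either way, the files still live under CMakeFiles/hello.dir/ in the build tree, not next to the sources.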

Answer 2:

The problem is that the -emit-llvm flag does not produce a final binary, so once it is applied to the configuration tests that CMake performs, it breaks them.

Apart from what's already been written about using the LTO infrastructure, you have 3 (or 2 and a half) other alternatives.

One is to use Whole-Program LLVM and the commands it provides to extract the relevant bitcode parts.

The other is to go the manual way of setting up custom targets (see add_custom_target and add_custom_command) on your CMake binary targets, that will get triggered on changes and will reproduce the desired outcome as if executed manually on the command line each time.
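
The manual route could look roughly like the sketch below. It assumes the two-source Hello World project from the first answer and that CMAKE_C_COMPILER resolves to clang. The target name hello_bc and the variable BC_FILES are hypothetical names chosen for illustration:

```cmake
cmake_minimum_required(VERSION 3.5)
project(hello C)

add_executable(hello main.c hello.c)

# Hypothetical list that collects one .bc output per source file.
set(BC_FILES "")
foreach(src main.c hello.c)
  get_filename_component(name "${src}" NAME_WE)
  set(bc "${CMAKE_CURRENT_BINARY_DIR}/${name}.bc")
  # Re-run clang whenever the source file changes.
  add_custom_command(
    OUTPUT "${bc}"
    COMMAND "${CMAKE_C_COMPILER}" -emit-llvm -c
            "${CMAKE_CURRENT_SOURCE_DIR}/${src}" -o "${bc}"
    DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/${src}"
    COMMENT "Generating LLVM bitcode ${name}.bc"
    VERBATIM)
  list(APPEND BC_FILES "${bc}")
endforeach()

# Hypothetical target: building it (or "all") produces the .bc files.
add_custom_target(hello_bc ALL DEPENDS ${BC_FILES})
```

Note that this sketch only tracks the .c files themselves; a change to hello.h would not trigger regeneration unless the header is added to DEPENDS, and any include paths or definitions set on the hello target would have to be repeated on the clang command line.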

Now, on this last point, I had a similar need, so I created a CMake project that provides that functionality (llvm-ir-cmake-utils) and lets you hook those custom targets onto existing ones as you see fit, without having to rewrite everything from scratch each time.

There are examples in the repo, but in short, it allows you to attach custom targets on already existing CMake targets, e.g.

[...]
add_executable(qux ${SOURCES})

[...]
# this will create a bitcode generating target 
# and allow it to depend on the initial target in order to detect source code changes
llvmir_attach_bc_target(qux_bc qux)
add_dependencies(qux_bc qux)
[...]

Answer 3:

If set(CMAKE_C_FLAGS "-emit-llvm") is written before project (hello), then after make the output is still a native object file:

$ file CMakeFiles/hello.dir/hello.c.o
CMakeFiles/hello.dir/hello.c.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

In order to obtain IR bitcode, I wrote:

###### CMakelists.txt ############
cmake_minimum_required(VERSION 2.8.9)
project (hello)
set(CMAKE_C_COMPILER "clang")
set(CMAKE_C_FLAGS "-flto")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -flto")
add_executable(hello hello.c)
target_compile_options(hello PUBLIC ${CMAKE_C_FLAGS} -flto)

I spent several hours trying to get a hand-written Makefile to compile from IR code to native code using lld; with CMake it was much faster. Then, by reading the CMake-generated Makefile, I was able to correct my Makefile: