Clang 3.0 is able to compile OpenCL to PTX and use Nvidia's tool to launch the PTX code on a GPU. How can I do this? Please be specific.
With the current version of LLVM (3.4), libclc, and the nvptx back-end, the compilation process has changed slightly.
You have to explicitly tell the nvptx back-end which driver interface to use. Your options are nvptx-nvidia-cuda or nvptx-nvidia-nvcl (for OpenCL), and their 64-bit equivalents nvptx64-nvidia-cuda or nvptx64-nvidia-nvcl.
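To make the naming pattern of those four triples explicit, here is a minimal shell sketch that maps an interface and a pointer width to the matching triple. The function name `nvptx_triple` and its argument convention are invented for this illustration; only the triple strings themselves come from the text above.

```shell
#!/bin/sh
# Hypothetical helper (not part of clang or libclc): print the NVPTX target
# triple for a driver interface ("cuda" or "nvcl") and a width ("32" or "64").
nvptx_triple() {
    interface=$1
    bits=$2

    # Only the two interfaces named in the text are accepted.
    case "$interface" in
        cuda|nvcl) ;;
        *) echo "unknown interface: $interface" >&2; return 1 ;;
    esac

    # The architecture part of the triple encodes the pointer width.
    case "$bits" in
        32) arch=nvptx ;;
        64) arch=nvptx64 ;;
        *) echo "unknown width: $bits" >&2; return 1 ;;
    esac

    echo "${arch}-nvidia-${interface}"
}
```

For example, `nvptx_triple nvcl 64` prints `nvptx64-nvidia-nvcl`, which is the triple used in the steps below.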
The generated .ptx code differs slightly depending on the chosen interface. In the assembly produced for the CUDA driver API, the .global and .ptr directives are dropped from entry functions, but OpenCL requires them. I've modified Mikael's compile steps slightly to produce code that can be run with an OpenCL host:
Compile to LLVM IR:
clang -Dcl_clang_storage_class_specifiers -isystem libclc/generic/include -include clc/clc.h -target nvptx64-nvidia-nvcl -xcl test.cl -emit-llvm -S -o test.ll
Link kernel:
llvm-link libclc/built_libs/nvptx64--nvidiacl.bc test.ll -o test.linked.bc
Compile to PTX:
clang -target nvptx64-nvidia-nvcl test.linked.bc -S -o test.nvptx.s
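The three steps above can be chained in a small script. This is only a sketch: the function names, the `DRY_RUN` switch, and the way output names are derived from the input file name are my own additions, while the commands and the libclc paths are the ones shown above, so adjust the paths to wherever your libclc checkout lives.

```shell
#!/bin/sh
# Sketch: run the three compile steps for one OpenCL kernel file.
# Assumptions (not from the steps above): the run/build_ptx names, the
# DRY_RUN switch, and deriving output names from the input file name.

run() {
    # In dry-run mode just print the command, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_ptx() {
    src=$1              # e.g. test.cl
    base=${src%.cl}     # e.g. test
    triple=nvptx64-nvidia-nvcl

    # 1. Compile OpenCL C to LLVM IR.
    run clang -Dcl_clang_storage_class_specifiers \
        -isystem libclc/generic/include -include clc/clc.h \
        -target "$triple" -xcl "$src" -emit-llvm -S -o "$base.ll" || return 1

    # 2. Link the kernel against the libclc bitcode library.
    run llvm-link libclc/built_libs/nvptx64--nvidiacl.bc "$base.ll" \
        -o "$base.linked.bc" || return 1

    # 3. Lower the linked bitcode to PTX assembly.
    run clang -target "$triple" "$base.linked.bc" -S -o "$base.nvptx.s"
}
```

Calling `build_ptx test.cl` runs the same commands as listed above and leaves the PTX in test.nvptx.s. Setting `DRY_RUN=1` first only prints the commands, which is handy for checking paths before invoking clang.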