OpenCL crashes on call to clGetPlatformIDs

Submitted by 亡梦爱人 on 2019-12-04 07:11:43

There are three ways for a program to use an external library:

  • Static linkage: the library is inserted directly into your executable. The external library, presented as a .lib file, contains nothing but packaged .obj files. Your program calls functions from the library as normal; the linker extracts the executable code it needs from the .lib, inserts it into your program, and resolves every reference to it at build time. It is as if the imported functions were part of your own source code.
  • Load-time dynamic linkage, aka 'implicit linking': the library is loaded when you launch the program. The external library comes as a .dll containing the executable code plus a .lib file describing the .dll's exports, and the linker links against it only tentatively. The linker uses the .lib to learn how to call the .dll at run time and puts deferred bindings into your program. When the OS launches your program, it performs 'load-time' linking: it looks up all of the deferred bindings, finds the required .dll files, finishes the binding in your program, and only then lets it run.
  • "Pure" run-time dynamic linkage, aka 'explicit linking': your program calls LoadLibrary itself. It has no build-time reference to any .lib, .dll, or anything else. Once it is running, it calls LoadLibrary with a string path to a .dll; LoadLibrary maps the .dll into your process's virtual address space, and your program then calls GetProcAddress to get a pointer to the function it wants. You then make calls through that function pointer.
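
To make the third option concrete for this question, here is a minimal explicit-linking sketch that looks up `clGetPlatformIDs` by name at run time, with no import library involved. It assumes the loader DLL is called `OpenCL.dll`; `sketch_cl_int`, `sketch_cl_uint`, `PFN_clGetPlatformIDs`, and `count_opencl_platforms` are hypothetical names standing in for what `CL/cl.h` would normally provide, and the `#else` branch uses the POSIX `dlopen`/`dlsym` equivalents only so the sketch also builds outside Windows.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#define SKETCH_CL_API_CALL __stdcall  // CL/cl.h uses __stdcall on Windows
#else
#include <dlfcn.h>
#define SKETCH_CL_API_CALL
#endif

// Hypothetical stand-ins so the sketch builds without CL/cl.h:
// cl_int / cl_uint are 32-bit integers, cl_platform_id is an opaque pointer.
using sketch_cl_int = std::int32_t;
using sketch_cl_uint = std::uint32_t;
using PFN_clGetPlatformIDs =
    sketch_cl_int(SKETCH_CL_API_CALL*)(sketch_cl_uint, void**, sketch_cl_uint*);

// Explicit ("pure run-time") linking.
// Returns the number of OpenCL platforms, -1 if the library or the
// clGetPlatformIDs symbol cannot be found, or -2 if the call reports an error.
long count_opencl_platforms(const char* library_name) {
    assert(library_name != nullptr);
#ifdef _WIN32
    HMODULE lib = LoadLibraryA(library_name);
    if (lib == nullptr) return -1;
    auto get_platform_ids = reinterpret_cast<PFN_clGetPlatformIDs>(
        GetProcAddress(lib, "clGetPlatformIDs"));
#else
    void* lib = dlopen(library_name, RTLD_NOW);
    if (lib == nullptr) return -1;
    auto get_platform_ids = reinterpret_cast<PFN_clGetPlatformIDs>(
        dlsym(lib, "clGetPlatformIDs"));
#endif
    long result = -1;
    if (get_platform_ids != nullptr) {
        sketch_cl_uint count = 0;
        // 0 entries and a null array asks only for the count; 0 is CL_SUCCESS.
        result = (get_platform_ids(0, nullptr, &count) == 0)
                     ? static_cast<long>(count)
                     : -2;
    }
#ifdef _WIN32
    FreeLibrary(lib);
#else
    dlclose(lib);
#endif
    return result;
}
```

On Windows, `count_opencl_platforms("OpenCL.dll")` loads whichever `OpenCL.dll` the DLL search order finds first; passing a full path instead makes LoadLibrary look only at that one file.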

You can't normally link against a .dll without its .lib. The linker wants to resolve those function call references to real addresses, but we don't want to hard-code real addresses, because a DLL can be loaded at an arbitrary memory address (DLLs are 'relocatable').

From my understanding, a .lib used as an import library contains stubs that the main program links directly against, so every call from the program into the DLL goes through a stub. The stubs in turn refer to an 'Import Address Table' (IAT). When the OS loads a DLL into memory for a process, it binds it by filling in the IAT with the addresses of the DLL's functions. Each stub then calls into the DLL by making an indirect jump through the right slot in the IAT.

So if a DLL MathLib has an exported function Factorial that my exe imports, then the import .lib file contains an actual function Factorial that my exe statically links against. That Factorial in the .lib looks something like the following pseudocode:

int Factorial( int value ) { 
   // Read MathLib's IAT, which we assume always lives at address 0x8ba100.
   // Factorial's real address gets stored in slot 2; slots are 4-byte pointers
   // (32-bit) counted from 0, so the address to read is 0x8ba100 + 2*4 = 0x8ba108.
   __asm jmp *0x8ba108; // nb: this is an indirect jump through the slot.
}
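
The same mechanism can be simulated portably in C++, with an ordinary array playing the IAT and a function playing the OS loader. This is only a model: `MathLib_Factorial`, `g_mathlib_iat`, `simulate_loader_bind`, and the slot index are hypothetical names, and a real stub is emitted by the linker as a bare indirect jump rather than written by hand.

```cpp
#include <cassert>
#include <cstddef>

// Stands in for the real Factorial exported by the hypothetical MathLib.dll.
int MathLib_Factorial(int value) {
    int result = 1;
    for (int i = 2; i <= value; ++i) result *= i;
    return result;
}

// Each IAT slot holds one function address; a generic function-pointer type
// lets different imports share the same table.
using IatSlot = void (*)();
using FactorialFn = int (*)(int);

// Simulated IAT for MathLib: every slot is empty until the "loader" runs.
IatSlot g_mathlib_iat[4] = {};
constexpr std::size_t kFactorialSlot = 2;

// Plays the role of the OS loader: write the real address into the slot.
void simulate_loader_bind() {
    g_mathlib_iat[kFactorialSlot] = reinterpret_cast<IatSlot>(&MathLib_Factorial);
}

bool factorial_slot_bound() { return g_mathlib_iat[kFactorialSlot] != nullptr; }

// The stub the import library contributes: forward through the IAT slot.
// A real stub has no check; with an empty slot it jumps into nothingness.
int Factorial(int value) {
    auto target = reinterpret_cast<FactorialFn>(g_mathlib_iat[kFactorialSlot]);
    assert(target != nullptr && "IAT slot was never filled in");
    return target(value);
}
```

Calling `Factorial` before `simulate_loader_bind()` trips the assert; in a real process there is no such check, which is exactly the 'jump into nothingness' case.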

And then we hope that when the OS loads that DLL, the IAT is filled in correctly; otherwise we jump into nothingness.

So I think what happened is that you built against one .lib but were 'load-time' linking against a different, mismatched opencl.dll. The IAT slot was never filled in, or was filled in at the wrong place, so the stub jumped into nothingness; that's why this instruction segfaulted:

0x0000000000402cc0 <+0>: jmpq *0x4b74e8(%rip) # 0x8ba1ae
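
The `# 0x8ba1ae` annotation is gdb reporting the memory slot this stub reads its jump target from, and the arithmetic can be checked by hand. The snippet below assumes the standard 6-byte encoding of this instruction (opcode `FF`, ModRM `25`, then a 32-bit displacement); `rip_relative_target` is a hypothetical helper, and the numbers are worked as compile-time checks.

```cpp
#include <cstdint>

// `jmpq *disp32(%rip)` is encoded as FF 25 followed by a 4-byte displacement.
constexpr std::uint64_t kJmpRipIndirectLength = 6;

// A RIP-relative operand is measured from the address of the *next*
// instruction: instruction address + instruction length + displacement.
constexpr std::uint64_t rip_relative_target(std::uint64_t insn_addr,
                                            std::int64_t displacement,
                                            std::uint64_t insn_len) {
    return insn_addr + insn_len + static_cast<std::uint64_t>(displacement);
}

// The faulting stub: 0x402cc0 + 6 = 0x402cc6, and 0x402cc6 + 0x4b74e8 = 0x8ba1ae.
static_assert(rip_relative_target(0x402cc0, 0x4b74e8, kJmpRipIndirectLength) == 0x8ba1ae,
              "hand-computed slot address matches gdb's annotation");
```

So the stub was fetching its destination from 0x8ba1ae, the counterpart of the IAT slot in the pseudocode; if the loader never wrote a valid address there, the jump lands somewhere meaningless.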

So let's figure out why the linking went wrong. There could be three sets of opencl.dll/opencl.lib files on your computer:

  • The opencl.lib/dll that comes from Khronos, which is really just a stub/loader library: it figures out which real providers are on your computer and dispatches function calls to the right one.
  • The opencl.lib/dll that comes from Intel, via its SDK and drivers.
  • The opencl.lib/dll that comes from Nvidia, via its drivers.
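
The Khronos loader's dispatching can be pictured with a small simulation. In the real ICD mechanism each OpenCL object begins with a pointer to its vendor's dispatch table, and the loader's exported functions forward through it; the sketch below models only that idea, with hypothetical names (`DispatchTable`, `loaderGetPlatformName`), made-up platform name strings, and a hard-coded vendor list where the real loader consults the system (the registry on Windows, `.icd` files under `/etc/OpenCL/vendors` on Linux).

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Platform;

// A tiny stand-in for a vendor's dispatch table: one entry point only.
struct DispatchTable {
    std::string (*getPlatformName)(const Platform*);
};

// As in the ICD scheme, the first member of every vendor object is a
// pointer to that vendor's dispatch table.
struct Platform {
    const DispatchTable* dispatch;
};

// Two hypothetical vendor drivers with made-up platform names.
std::string intelGetPlatformName(const Platform*) { return "Intel platform"; }
std::string nvidiaGetPlatformName(const Platform*) { return "Nvidia platform"; }

const DispatchTable kIntelDispatch{&intelGetPlatformName};
const DispatchTable kNvidiaDispatch{&nvidiaGetPlatformName};
Platform gIntelPlatform{&kIntelDispatch};
Platform gNvidiaPlatform{&kNvidiaDispatch};

// Loader side: vendor discovery is hard-coded in this sketch.
std::vector<Platform*> loaderGetPlatforms() {
    return {&gIntelPlatform, &gNvidiaPlatform};
}

// The loader's exported function does no work of its own: it forwards the
// call through the object's dispatch table to whichever vendor owns it.
std::string loaderGetPlatformName(const Platform* platform) {
    assert(platform != nullptr && platform->dispatch != nullptr);
    return platform->dispatch->getPlatformName(platform);
}
```

Because the application only ever calls the loader, it can be linked once against the Khronos opencl.lib and still reach whichever vendors are installed at run time.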

Which of these files did you actually have? My guess is:

  • The opencl.dll that came from Khronos got installed into c:\windows\system32.
  • There is no opencl.lib from Khronos.
  • There was probably no opencl.lib from Nvidia, since you didn't have their SDK installed.
  • You probably had an opencl.lib and opencl.dll from Intel, since you did have their SDK installed.

You were definitely linking against the Intel opencl.lib, but appear to have been loading the Khronos opencl.dll from c:\windows\system32. One fix would be to make the program load Intel's opencl.dll at run time by putting that DLL in your program's directory, which Windows searches before c:\windows\system32.
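
Why the System32 copy wins, and why copying a DLL next to the executable changes that, can be modeled as a first-match search. This sketch assumes the standard search order for desktop applications with SafeDllSearchMode enabled (application directory, System32, the 16-bit system directory, the Windows directory, the current directory, then PATH), assumes Windows lives in `C:\Windows`, ignores KnownDLLs and side-by-side redirection, and uses hypothetical directories such as `C:\myapp` and `C:\IntelSDK\bin`.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns the first directory, in search order, that contains the DLL,
// or an empty string if none does ("DLL not found").
std::string resolve_dll(const std::vector<std::string>& search_order,
                        const std::set<std::string>& dirs_containing_dll) {
    assert(!search_order.empty());
    for (const auto& dir : search_order) {
        if (dirs_containing_dll.count(dir) != 0) return dir;
    }
    return "";
}

// Simplified search order (assumption: SafeDllSearchMode on, Windows in C:\Windows).
std::vector<std::string> standard_search_order(const std::string& app_dir,
                                               const std::string& cwd,
                                               const std::vector<std::string>& path_dirs) {
    std::vector<std::string> order = {app_dir, "C:\\Windows\\System32",
                                      "C:\\Windows\\System", "C:\\Windows", cwd};
    order.insert(order.end(), path_dirs.begin(), path_dirs.end());
    return order;
}
```

With the Khronos opencl.dll in System32 and Intel's only on PATH, `resolve_dll` returns System32; add `C:\myapp` to the set (the DLL copied next to the exe) and the application directory wins instead.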

However, you state that you were able to make things work using this compilation line:

g++ -I. s.cpp -L. -lOpenCL

There's something neat about gcc on Windows: to link against a library, you don't need the .lib. The MinGW linker can link directly against a .dll, working out the imports by inspecting the DLL's export table; other people have figured out how to do the same when someone hands them a dll but no lib. With most other toolchains, especially Visual Studio, you need both a .lib and a .dll to link against something. That's why the Windows SDK installs hundreds of .lib files (kernel32.lib, for example). It turns out the linker could infer the import information if it wanted to; import libs persist largely as a legacy mechanism.
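
If a toolchain does insist on a .lib, one can be produced from little more than the DLL's name and its export list: MSVC's `lib /def:` and MinGW's `dlltool -d` both accept a module-definition (.def) file for this. The sketch below only builds the .def text; `make_def_file` is a hypothetical helper, and the export names passed to it are assumed to come from something like `dumpbin /exports OpenCL.dll`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds the text of a module-definition (.def) file naming the DLL and
// listing the functions it exports.
std::string make_def_file(const std::string& dll_name,
                          const std::vector<std::string>& exports) {
    assert(!dll_name.empty());
    std::string def = "LIBRARY " + dll_name + "\nEXPORTS\n";
    for (const auto& name : exports) {
        def += "    " + name + "\n";  // one exported function per line
    }
    return def;
}
```

Writing the result to `OpenCL.def` and running `lib /def:OpenCL.def /out:OpenCL.lib /machine:x64` should then yield an x64 import library for that DLL.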

Anyway, when you ran that gcc command, the linker found a suitable opencl.dll on its search path, effectively generated its own import library for it, and linked against it. When you launched your program, the OS loader searched for an opencl.dll and found the same one you had linked against, so your program ran. Whew.

I still have some suggestions:

  • Find an opencl.lib and opencl.dll pair that come from the Khronos Installable Client Driver (ICD) loader. That loader will then figure out how to bind to a particular provider (Nvidia, Intel, etc.) at run time.
  • Distribute the Khronos opencl.dll with your application, so that you never accidentally link at run time against the wrong file.
  • Uninstall the Intel SDK, assuming it provides opencl.lib/opencl.dll files that are specific to Intel.

