Question
I'm trying to fill an OpenCL cl_int2 buffer with default values ({-1, -2}), but the OpenCL function clEnqueueFillBuffer() fills my buffer with different values each time I run it; the buffer ends up with the expected values only at random. The function returns error code 0.
Examples of the first line of the snippet's output across multiple runs:
0 : -268435456
0 : -2147483648
0 : -536870912
0 : 268435456
0 : 0
0 : -1342177280
-1: -2
I'm running OS X 10.11.6 with Radeon HD 6750M and OpenCL version 1.2.
clbParticle_hashmap_lookup_table = clCreateBuffer(context,
    CL_MEM_READ_WRITE,
    sizeof(cl_int2)*this->CUBE_CELLS,
    nullptr,
    &err_code);
// ...
// fill every cl_int2 element of the buffer with {-1, -2}:
cl_int2 default_hashmap_pattern = { .s = {-1, -2} };
clEnqueueFillBuffer(queue,
    clbParticle_hashmap_lookup_table,
    &default_hashmap_pattern,          // pattern
    sizeof(cl_int2),                   // pattern size in bytes
    0,                                 // offset in bytes
    sizeof(cl_int2)*this->CUBE_CELLS,  // number of bytes to fill
    0,
    nullptr, nullptr);
clFinish(queue);
// copy and print the data:
size_t hashmap_lookup_table_size = sizeof(cl_int2)*this->CUBE_CELLS;
cl_int2* hashmap_lookup_table_bytes = (cl_int2*) malloc(hashmap_lookup_table_size);
clEnqueueReadBuffer(queue,
    clbParticle_hashmap_lookup_table,
    CL_TRUE,
    0,
    hashmap_lookup_table_size,
    hashmap_lookup_table_bytes,
    0,
    nullptr, nullptr);
clFinish(queue);
cout << endl << "Lookup table: " << endl;
for (int i=0; i<this->CUBE_CELLS; i++)
    cout << setw(10) << hashmap_lookup_table_bytes[i].s[0] << " : "
         << setw(10) << hashmap_lookup_table_bytes[i].s[1] << endl;
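Since the fill call reports error code 0 even when the contents are wrong, inspecting the read-back data is the only way to notice the failure. A minimal sketch of an automated check follows. Int2 is a hypothetical stand-in for cl_int2 (so the snippet compiles without the OpenCL headers), and count_mismatches is an illustrative helper name, not part of OpenCL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for cl_int2 so this compiles without OpenCL headers.
struct Int2 {
    std::int32_t s[2];
};

// Illustrative helper (not an OpenCL API): counts the elements of a
// read-back buffer that differ from the expected fill pattern.
std::size_t count_mismatches(const std::vector<Int2>& data, const Int2& expected) {
    std::size_t mismatches = 0;
    for (const Int2& v : data) {
        if (v.s[0] != expected.s[0] || v.s[1] != expected.s[1]) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A result of zero means the whole buffer holds the pattern; any other value gives the number of corrupted elements, which is easier to compare across runs than a printed table.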
Answer 1:
The problem is that your fill pattern is too large for your GPU. I ran into the same problem trying to fill a buffer with a cl_double pattern, which is 64 bits like your cl_int2. I think clEnqueueFillBuffer is invoking a built-in kernel which doesn't allow patterns
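If the pattern size is indeed the trigger, one possible workaround is to skip clEnqueueFillBuffer, replicate the pattern on the host, and upload it with clEnqueueWriteBuffer. This is an assumption and has not been verified on the affected hardware. The sketch below only builds the host-side data. Int2 is a hypothetical stand-in for cl_int2, and make_fill_data is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for cl_int2 so this compiles without OpenCL headers.
struct Int2 {
    std::int32_t s[2];
};

// Illustrative helper: replicates one pattern element `count` times on the
// host, producing data that can be copied to the device in a single write.
std::vector<Int2> make_fill_data(const Int2& pattern, std::size_t count) {
    return std::vector<Int2>(count, pattern);
}

// With the OpenCL headers and a real cl_int2, the upload would look like:
//   std::vector<cl_int2> host = ...;  // filled as above
//   clEnqueueWriteBuffer(queue, buffer, CL_TRUE, 0,
//                        host.size() * sizeof(cl_int2), host.data(),
//                        0, nullptr, nullptr);
```

The trade-off is a full host-to-device transfer of the buffer instead of a device-side fill, which may matter for large buffers.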
Answer 2:
I can reproduce this. On a MacBook running macOS Sierra with a Radeon Pro 450, the following script:
int N = 100000;
float *a = new float[N];
cl_mem a_gpu = clCreateBuffer(context, CL_MEM_READ_WRITE, N * sizeof(float), 0, &err);
checkError(err);
for(int it = 0; it < 100; it++) {
    float value = 123.0f + it;
    err = clEnqueueFillBuffer(queue, a_gpu, &value, sizeof(value), 0, N * sizeof(float), 0, 0, 0);
    checkError(err);
    clFinish(queue);
    err = clEnqueueReadBuffer(queue, a_gpu, CL_TRUE, 0,
                              sizeof(cl_float) * N, a, 0, NULL, NULL);
    checkError(err);
    clFinish(queue);
    cout << it << " a[N - 1]=" << a[N - 1] << endl;
}
delete[] a;
gives results like:
Using Apple , OpenCL platform: Apple
Using OpenCL device: AMD Radeon Pro 450 Compute Engine
0 a[N - 1]=-1.39445e-31
1 a[N - 1]=0
2 a[N - 1]=0
3 a[N - 1]=0
4 a[N - 1]=0
5 a[N - 1]=0
6 a[N - 1]=129
7 a[N - 1]=0
8 a[N - 1]=131
9 a[N - 1]=132
10 a[N - 1]=133
11 a[N - 1]=134
12 a[N - 1]=135
13 a[N - 1]=0
14 a[N - 1]=0
15 a[N - 1]=0
16 a[N - 1]=0
17 a[N - 1]=0
18 a[N - 1]=0
19 a[N - 1]=0
20 a[N - 1]=0
21 a[N - 1]=0
22 a[N - 1]=0
23 a[N - 1]=0
24 a[N - 1]=0
25 a[N - 1]=0
26 a[N - 1]=0
27 a[N - 1]=0
28 a[N - 1]=0
29 a[N - 1]=0
30 a[N - 1]=0
31 a[N - 1]=154
32 a[N - 1]=0
Answer 3:
I have experienced this bug ONLY on macOS, ever since I started learning OpenCL in March 2017 (I can't remember the macOS version at the time). The GPU is a GT 750M (which is probably irrelevant), and the pattern is a cl_double2. The same routine on a GTX 760 under Linux has no such problem. I suspect this is because OpenCL 1.2 support on macOS is incomplete, as clinfo (compiled and executed on macOS) warns:
NOTE: your OpenCL library only supports OpenCL 1.0,
but some installed platforms support OpenCL 1.2.
Programs using 1.2 features may crash
or behave unexpectedly
The "corresponding" CUDA API, cudaMemset, accepts only a single int value (which it applies byte by byte). However, that restriction is stated in the CUDA documentation, while the OpenCL documentation explicitly uses a cl_float4 (the same size as a cl_double2) as an example. So this is clearly a bug, not an undocumented feature.
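For reference, the OpenCL 1.2 specification allows pattern sizes of 1, 2, 4, 8, 16, 32, 64, or 128 bytes and requires offset and size to be multiples of the pattern size, so the 8-byte cl_int2 and 16-byte cl_double2 patterns discussed here are valid arguments. A small sketch of that check follows; fill_args_are_valid is a hypothetical helper name, not an OpenCL function.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative helper (not an OpenCL API): checks fill arguments against the
// constraints the OpenCL 1.2 specification places on clEnqueueFillBuffer.
bool fill_args_are_valid(std::size_t pattern_size, std::size_t offset, std::size_t size) {
    // pattern_size must be a power of two between 1 and 128 bytes.
    const bool size_ok = pattern_size >= 1 && pattern_size <= 128 &&
                         (pattern_size & (pattern_size - 1)) == 0;
    if (!size_ok) {
        return false;
    }
    // offset and size must both be multiples of pattern_size.
    return offset % pattern_size == 0 && size % pattern_size == 0;
}
```

Running this check before the fill rules out invalid arguments as the cause, which leaves the driver as the remaining suspect when the data still comes back wrong.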
But I guess Apple has solved this problem in macOS 10.14, because THEY ARE DEPRECATING OPENCL!
Source: https://stackoverflow.com/questions/38556710/clenqueuefillbuffer-fills-a-buffer-correctly-only-at-random