I have an OpenCL 1.1 search algorithm that works well with a small amount of data:
1.) build the inputData array and pass it to the GPU
2.) c
I had a similar problem with variable problem sizes. One option is a simple divide-and-conquer approach: split your data into blocks on the host, then process the blocks one after the other on the device.
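To make the splitting idea concrete, here is a minimal host-side sketch in plain C. It is only an illustration under assumptions, not your actual code: `for_each_block`, `block_fn`, `search_state` and `count_matches` are hypothetical names, the input is assumed to be an `int` array, and the callback is a CPU stand-in for the place where you would upload one block, run the kernel and read its results back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: stands in for "upload this block to the
   device, run the kernel, read back the results". */
typedef void (*block_fn)(const int *block, size_t count, size_t offset,
                         void *user);

/* Walks over the input in blocks of at most block_size elements and hands
   each block to process, one after the other. The last block may be
   shorter. Returns the number of blocks processed. */
static size_t for_each_block(const int *input, size_t total,
                             size_t block_size, block_fn process, void *user)
{
    size_t offset = 0;
    size_t blocks = 0;

    assert(block_size > 0);
    while (offset < total) {
        size_t remaining = total - offset;
        size_t count = remaining < block_size ? remaining : block_size;

        /* offset is passed along so per-block results can be mapped back
           to positions in the full input. */
        process(input + offset, count, offset, user);

        offset += count;
        ++blocks;
    }
    return blocks;
}

/* Example state and callback (assumptions for illustration): a CPU
   stand-in for the search kernel that counts matches of one value. */
struct search_state {
    int needle;        /* value to search for */
    size_t hits;       /* matches found so far, across all blocks */
    size_t last_index; /* global index of the most recent match */
};

static void count_matches(const int *block, size_t count, size_t offset,
                          void *user)
{
    struct search_state *s = user;
    size_t i;

    for (i = 0; i < count; ++i) {
        if (block[i] == s->needle) {
            s->hits++;
            s->last_index = offset + i;
        }
    }
}
```

You would call it as `for_each_block(data, n, blockSize, count_matches, &state)`. Because each block is at most `blockSize` elements, the buffers on the device can stay at a fixed size no matter how large the whole input gets.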
BTW: are you sure about this comparison?
while (lastPosition **>** RESULT_BUFFER_SIZE)