Question
I am using TI's Keystone II, which has an ARM host and 8 accelerator DSP cores. These DSP cores don't talk to each other, as they have no memory shared between them.
I have a strange problem: I cannot overwrite values in the 'cum' array, in which I am computing the cumulative frequency. I can only read back whatever I wrote to it the first time; later writes are not registered. Any solutions to this issue?
The device has a unified memory architecture. Both 'cum' and 'frequency' are created with 'CL_MEM_READ_WRITE'.
This code snippet runs on the DSP cores:
...
//upscan
for(i=0; i < 32; i++)
{
    if(pid<4)
    {
        localvar1 = frequency[(i*8)+(2*pid)];
        localvar2 = frequency[(i*8)+(2*pid)+1];
        cum[(i*8)+(2*pid)+1] = localvar1 + localvar2;
    }
}
for(i=0; i < 32; i++)
{
    if(pid<2)
    {
        localvar1 = cum[(i*8)+(4*pid)+3];
        localvar2 = cum[(i*8)+(4*pid)+1];
        cum[(i*8)+(4*pid)+3] = localvar1 + localvar2;
    }
}
for(i=0; i < 32; i++)
{
    if(pid<1)
    {
        localvar1 = cum[(i*8)+(pid)+7];
        localvar2 = cum[(i*8)+(pid)+3];
        cum[(i*8)+(pid)+7] = localvar1 + localvar2;
    }
}
...
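Answer 1:
Use a barrier or mem_fence between your for-loops. Each loop reads 'cum' entries that other work-items wrote in the previous loop, so those writes have to be complete and visible before the next loop starts. The exact choice of flags depends on the type of memory you're using (global, local) and on device-specific details, but a barrier should solve your problem.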
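To illustrate where the synchronization points go, here is a minimal host-side sketch in plain C with POSIX threads. It is not the TI OpenCL runtime and not the asker's kernel: four threads stand in for the work-items with pid 0..3, and pthread_barrier_wait stands in for the barrier call between the loops. In the actual kernel the corresponding call would be something like barrier(CLK_GLOBAL_MEM_FENCE), assuming 'cum' is a global buffer; note that an OpenCL barrier only synchronizes work-items within the same work-group. The names run_upscan, upscan_worker, NUM_WORKERS, NUM_BLOCKS and the g_ variables are hypothetical and chosen only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NUM_WORKERS 4   /* assumption: one thread per work-item, pid 0..3 */
#define NUM_BLOCKS  32  /* outer loop bound taken from the question */

/* Hypothetical shared state, standing in for the kernel's global buffers. */
static const int *g_frequency;
static int *g_cum;
static pthread_barrier_t g_phase_barrier;

static void *upscan_worker(void *arg)
{
    int pid = *(const int *)arg;
    int i;

    /* Phase 1: pairwise sums of frequency go into the odd slots of cum. */
    for (i = 0; i < NUM_BLOCKS; i++) {
        if (pid < 4) {
            g_cum[i * 8 + 2 * pid + 1] =
                g_frequency[i * 8 + 2 * pid] + g_frequency[i * 8 + 2 * pid + 1];
        }
    }
    /* Every worker waits here, so all phase 1 writes are done and visible. */
    pthread_barrier_wait(&g_phase_barrier);

    /* Phase 2: reads slots that other workers wrote in phase 1. */
    for (i = 0; i < NUM_BLOCKS; i++) {
        if (pid < 2) {
            g_cum[i * 8 + 4 * pid + 3] += g_cum[i * 8 + 4 * pid + 1];
        }
    }
    pthread_barrier_wait(&g_phase_barrier);

    /* Phase 3: worker 0 reads slot 7, which worker 1 wrote in phase 2. */
    for (i = 0; i < NUM_BLOCKS; i++) {
        if (pid < 1) {
            g_cum[i * 8 + 7] += g_cum[i * 8 + 3];
        }
    }
    return NULL;
}

/* Runs the three up-scan phases; both arrays need NUM_BLOCKS * 8 entries.
   Returns 0 on success, -1 if the barrier could not be created. */
int run_upscan(const int *frequency, int *cum)
{
    pthread_t threads[NUM_WORKERS];
    int pids[NUM_WORKERS];
    int p;

    assert(frequency != NULL && cum != NULL);
    g_frequency = frequency;
    g_cum = cum;

    if (pthread_barrier_init(&g_phase_barrier, NULL, NUM_WORKERS) != 0) {
        return -1;
    }
    for (p = 0; p < NUM_WORKERS; p++) {
        pids[p] = p;
        /* Sketch-level error handling: a missing worker would leave the
           others blocked at the barrier, so give up immediately. */
        if (pthread_create(&threads[p], NULL, upscan_worker, &pids[p]) != 0) {
            abort();
        }
    }
    for (p = 0; p < NUM_WORKERS; p++) {
        pthread_join(threads[p], NULL);
    }
    pthread_barrier_destroy(&g_phase_barrier);
    return 0;
}
```

Without the two waits, a worker could start phase 2 before its neighbour has finished phase 1 and read stale values from 'cum', which is the same kind of hazard the barrier is meant to remove in the kernel.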
Source: https://stackoverflow.com/questions/29887169/opencl-device-memory-read-write-issue