I have an array of unsigned integers stored on the GPU with CUDA (typically 1,000,000 elements). I would like to count the occurrences of every number in the array.
As others have said, you can use the Thrust sort & reduce_by_key approach to count frequencies. In my case, I needed the mode of an array (the element with the maximum frequency), so here is my solution:
1 - First, we create two new arrays: one holding a copy of the input data (thrust::sort works in place, so this leaves the original untouched) and one filled with ones, which reduce_by_key will later sum per key:
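#include <thrust/device_vector.h>
#include <thrust/fill.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/extrema.h>
// Input: [1 3 3 3 2 2 3]
// *(Temp) dev_keys: [1 3 3 3 2 2 3]
// *(Temp) dev_ones: [1 1 1 1 1 1 1]
// Copy input data (myptr: raw device pointer to `size` unsigned ints; wrap it so Thrust treats it as device memory)
thrust::device_ptr<unsigned int> in_ptr = thrust::device_pointer_cast(myptr);
thrust::device_vector<unsigned int> dev_keys(in_ptr, in_ptr + size);
// Fill an array with ones
thrust::device_vector<int> dev_ones(size);
thrust::fill(dev_ones.begin(), dev_ones.end(), 1);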
2 - Then, we sort the keys, because reduce_by_key only combines consecutive equal keys; sorting makes all copies of each value adjacent.
// Sort keys (see below why)
thrust::sort(dev_keys.begin(), dev_keys.end());
3 - Next, we create two output vectors, one for the unique keys and one for their frequencies. In the worst case every element is distinct, so both are sized to the input length:
thrust::device_vector<unsigned int> output_keys(size);
thrust::device_vector<int> output_freqs(size);
4 - Finally, we perform the reduction by key:
// Reduce contiguous keys: [1 3 3 3 2 2 3] => keys [1 3 2 3], counts [1 3 2 1]  Vs. sorted [1 2 2 3 3 3 3] => keys [1 2 3], counts [1 2 4]
thrust::pair<thrust::device_vector<unsigned int>::iterator, thrust::device_vector<int>::iterator> new_end;
new_end = thrust::reduce_by_key(dev_keys.begin(), dev_keys.end(), dev_ones.begin(), output_keys.begin(), output_freqs.begin());
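A small host-side sketch in standard C++ (not Thrust, and not the GPU code above) can make the reduce_by_key semantics concrete. It mimics reducing the keys against a vector of ones: one (key, count) pair is emitted per run of consecutive equal keys. The helper name run_length_counts is hypothetical.

```cpp
#include <utility>
#include <vector>

// Hypothetical host-side helper: imitates reduce_by_key(keys, ones) by
// emitting one (key, count) pair per run of consecutive equal keys.
// It does NOT group equal keys that are separated by other values.
std::pair<std::vector<unsigned int>, std::vector<int>>
run_length_counts(const std::vector<unsigned int>& keys) {
    std::vector<unsigned int> out_keys;
    std::vector<int> out_counts;
    for (unsigned int k : keys) {
        if (!out_keys.empty() && out_keys.back() == k) {
            ++out_counts.back();      // same run: add this element's 1
        } else {
            out_keys.push_back(k);    // a new run starts here
            out_counts.push_back(1);
        }
    }
    return {out_keys, out_counts};
}
```

For the unsorted input [1 3 3 3 2 2 3] this yields keys [1 3 2 3] with counts [1 3 2 1], so the value 3 is split across two runs. For the sorted input [1 2 2 3 3 3 3] it yields keys [1 2 3] with counts [1 2 4], which are the true frequencies.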
5 - Optionally, we can then find the most frequent element (the mode):
// Number of unique keys actually written by reduce_by_key
int num_keys = new_end.first - output_keys.begin();
// Get index of the maximum frequency (max_element returns the first maximum, i.e. the smallest key on ties)
thrust::device_vector<int>::iterator iter = thrust::max_element(output_freqs.begin(), output_freqs.begin() + num_keys);
unsigned int index = iter - output_freqs.begin();
// Get most frequent element (the mode) and its frequency
unsigned int most_frequent_key = output_keys[index];
int most_frequent_val = output_freqs[index];
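To check the GPU result, a plain host-side reference in standard C++ (no Thrust) can compute the same mode. This is only a sketch: the name mode_reference is hypothetical, and it assumes ties should go to the smallest value. That matches the steps above, because the unique keys come out of the sorted reduction in ascending order and thrust::max_element returns the first maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU reference for the sort + reduce_by_key + max_element
// pipeline: returns {mode, frequency}, breaking ties toward the smallest key.
std::pair<unsigned int, int> mode_reference(std::vector<unsigned int> data) {
    assert(!data.empty());                  // the mode of an empty array is undefined
    std::sort(data.begin(), data.end());    // make equal values adjacent

    unsigned int best_key = data[0];
    int best_count = 0;
    std::size_t i = 0;
    while (i < data.size()) {
        // [i, j) is one run of equal values
        std::size_t j = i;
        while (j < data.size() && data[j] == data[i]) ++j;
        int count = static_cast<int>(j - i);
        if (count > best_count) {           // strict '>' keeps the first (smallest) key on ties
            best_key = data[i];
            best_count = count;
        }
        i = j;
    }
    return {best_key, best_count};
}
```

For example, mode_reference({1, 3, 3, 3, 2, 2, 3}) returns {3, 4}, and mode_reference({2, 2, 1, 1}) returns {1, 2}. Because the vector is taken by value, the caller's copy is not reordered.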