问题
Consider the following code:
#include <time.h> // --- time
#include <stdlib.h> // --- srand, rand
#include<fstream>
#include <thrust\host_vector.h>
#include <thrust\device_vector.h>
#include <thrust\sort.h>
#include <thrust\iterator\zip_iterator.h>
#include "TimingGPU.cuh"
/********/
/* MAIN */
/********/
int main() {
const int N = 16384;
std::ifstream h_indices_File, h_x_File;
h_indices_File.open("h_indices.txt");
h_x_File.open("h_x.txt");
std::ofstream h_x_result_File;
h_x_result_File.open("h_x_result.txt");
thrust::host_vector<int> h_indices(N);
thrust::host_vector<double> h_x(N);
thrust::host_vector<double> h_sorted(N);
for (int k = 0; k < N; k++) {
h_indices_File >> h_indices[k];
h_x_File >> h_x[k];
}
thrust::device_vector<int> d_indices(h_indices);
thrust::device_vector<double> d_x(h_x);
thrust::gather(d_indices.begin(), d_indices.end(), d_x.begin(), d_x.begin());
h_x = d_x;
for (int k = 0; k < N; k++) h_x_result_File << h_x[k] << "\n";
//thrust::device_vector<double> d_x_sorted(N);
//thrust::gather(d_indices.begin(), d_indices.end(), d_x.begin(), d_x_sorted.begin());
//h_x = d_x_sorted;
//for (int k = 0; k < N; k++) h_x_result_File << h_x[k] << "\n";
}
The code loads from file an array of indices h_indices.txt
and a double
array h_x.txt
. Then, it transfers those arrays to the GPU to d_indices
and d_x
and uses thrust::gather
to achieve Matlab's equivalent
d_x(d_indices)
The two txt files can be downloaded from h_indices.txt and h_x.txt. The code creates an output result file h_x_result.txt
.
If I use the "in-place" version of thrust::gather
(the last uncommented three lines of the code), then I obtain that the result is different from d_x(d_indices)
, while if I use the not "in-place" version (the last commented three lines of the code), then the result is correct.
In Matlab, I'm using
load h_indices.txt; load h_x.txt; load h_x_result.txt
plot(h_x(h_indices + 1)); hold on; plot(h_x_result, 'r'); hold off
The "in-place" case returns the following comparison
On the other side, the "in-place" case returns
I'm using Windows 10, CUDA 8.0, Visual Studio 2013, compiling in Release Mode and running on an NVIDIA GTX 960 cc. 5.2.
回答1:
Thrust gather
can't be used in place.
But I would go as far as to suggest that no "naïve" gather operation can be safely performed in-place, and that the Matlab snippet you presented as in-place (presumably d_x = d_x(d_indices)
) isn't an in-place operation at all.
来源:https://stackoverflow.com/questions/42560237/can-thrustgather-be-used-in-place