memory-access

Memory coalescing and nvprof results on NVIDIA Pascal

北城余情 submitted on 2021-02-08 10:16:31
Question: I am running a memory coalescing experiment on Pascal and getting unexpected nvprof results. I have one kernel that copies 4 GB of floats from one array to another. nvprof reports confusing numbers for gld_transactions_per_request and gst_transactions_per_request. I ran the experiment on a TITAN Xp and a GeForce GTX 1080 Ti, with the same results.
#include <stdio.h>
#include <cstdint>
#include <assert.h>
#define N 1ULL*1024*1024*1024
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__);
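To make the transactions-per-request idea concrete, here is a simplified host-side model in C++. It is not a CUDA kernel and not how nvprof computes its metrics. It assumes a warp of 32 threads, each loading one 4-byte float, served in aligned 32-byte sectors, and counts how many distinct sectors one warp-wide request touches for a given stride. The function name sectors_per_request is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Simplified model (assumption): 32 lanes, one 4-byte float per lane,
// memory served in aligned 32-byte sectors.
constexpr int kWarpSize = 32;
constexpr std::uint64_t kSectorBytes = 32;
constexpr std::uint64_t kElemBytes = sizeof(float);

// Count the distinct 32-byte sectors touched when lane i reads the element
// at index base_index + i * stride (stride measured in floats, not bytes).
int sectors_per_request(std::uint64_t base_index, std::uint64_t stride) {
    std::set<std::uint64_t> sectors;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        std::uint64_t byte_addr = (base_index + lane * stride) * kElemBytes;
        sectors.insert(byte_addr / kSectorBytes);  // sector that holds this byte
    }
    return static_cast<int>(sectors.size());
}
```

Under this model a fully coalesced float copy (stride 1, aligned start) touches 4 sectors per request, a stride of 2 touches 8, a stride of 32 touches one sector per lane (32), and starting one element past a sector boundary adds one extra sector (5). Real profiler counts also depend on caching and on how each metric defines a transaction.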

From non-coalesced access to coalesced memory access in CUDA

霸气de小男生 submitted on 2021-01-28 18:53:36
Question: I was wondering if there is a simple way to transform a non-coalesced memory access into a coalesced one. Take the example of this array: dW[[w0,w1,w2][w3,w4,w5][w6,w7][w8,w9]]. I know that if thread 0 in block 0 accesses dW[0] and thread 1 in block 0 accesses dW[1], that's a coalesced access to global memory. The problem is that I have two operations. The first one is coalesced as described above, but the second one isn't, because thread 1 in block 0 needs to do an operation
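The second operation is cut off above, so the following assumes it has each thread walk through its own row of dW. A common way to make that pattern coalesced is to change the layout rather than the access: store the data interleaved (column-major), so element k of row r sits at k * rows + r. The C++ helpers below (to_interleaved, interleaved_index) are a hypothetical host-side sketch of that transformation; shorter rows are padded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: thread r processes row r element by element.
// Row-major storage puts element k of row r at r * cols + k, so neighbouring
// threads are `cols` elements apart (strided). The interleaved layout puts it
// at k * rows + r, so at each step k neighbouring threads read neighbouring
// addresses. Ragged rows (dW has lengths 3, 3, 2, 2) are padded with `pad`.
std::vector<float> to_interleaved(const std::vector<std::vector<float>>& rows_in,
                                  std::size_t cols, float pad = 0.0f) {
    const std::size_t rows = rows_in.size();
    std::vector<float> out(rows * cols, pad);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t k = 0; k < rows_in[r].size() && k < cols; ++k)
            out[k * rows + r] = rows_in[r][k];
    return out;
}

// Index that thread r would use for its k-th element in the interleaved layout.
inline std::size_t interleaved_index(std::size_t r, std::size_t k, std::size_t rows) {
    return k * rows + r;
}
```

With dW's rows {w0,w1,w2}, {w3,w4,w5}, {w6,w7}, {w8,w9} and cols = 3, the interleaved array begins w0, w3, w6, w8, w1, w4, ..., so at each step consecutive threads read consecutive elements. The padding costs some memory; staging tiles through shared memory is the other usual option when the layout cannot change.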

Provide AoS access to SoA

风流意气都作罢 submitted on 2019-12-25 08:31:24
Question: I have data laid out in memory in Structure of Arrays (SoA) or Structure of Pointers (SoP) form, and have a way to access that data as though it were laid out in Array of Structures (AoS) form (code given below). However, I am not too happy about the use of struct AoS_4_SoP: although this struct appears to use templates, it is not really generic since, for example, foo and bar are hard-coded inside it. Two questions/requests: 1) For read-write performance, is AoS access provided as good as
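One way to avoid hard-coding foo and bar is to make the field list a template parameter pack. The sketch below is a hypothetical C++14 container, not the question's AoS_4_SoP: it keeps one std::vector per field and returns a std::tuple of references for element i, so fields are addressed by position rather than by name. Whether the tuple is optimized away entirely depends on the compiler, so it is worth measuring before relying on it for performance.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Generic Structure-of-Arrays: one std::vector per field type.
// operator[] gives AoS-style access to element i as a tuple of references.
// Note: bool fields won't work as-is, because std::vector<bool> does not hand out bool&.
template <typename... Fields>
class SoA {
public:
    explicit SoA(std::size_t n) : columns_(std::vector<Fields>(n)...) {}

    std::size_t size() const { return std::get<0>(columns_).size(); }

    // "AoS view" of element i: references into every column at index i.
    std::tuple<Fields&...> operator[](std::size_t i) {
        return at(i, std::index_sequence_for<Fields...>{});
    }

    // Whole-column access, e.g. for loops that touch a single field.
    template <std::size_t I>
    auto& column() { return std::get<I>(columns_); }

private:
    template <std::size_t... Is>
    std::tuple<Fields&...> at(std::size_t i, std::index_sequence<Is...>) {
        return std::tuple<Fields&...>(std::get<Is>(columns_)[i]...);
    }

    std::tuple<std::vector<Fields>...> columns_;
};
```

For example, with SoA<float, int> s(n), writing std::get<0>(s[i]) = 1.5f updates s.column<0>()[i] directly. Named access could be layered on top with small accessor functions or an enum for the indices.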

Unaligned memory access: is it defined behavior or not? [duplicate]

十年热恋 submitted on 2019-12-23 14:23:41
Question: This question already has an answer here: What does the standard say about unaligned memory access? Consider the following code:
#include <iostream>
int main() {
    char* c = new char('a');
    char ac[4] = {'a', 'b', 'c', 'd'};
    unsigned long long int* u = reinterpret_cast<unsigned long long int*>(c);
    unsigned long long int* uc = reinterpret_cast<unsigned long long int*>(&ac[3]);
    *u = 42;
    *uc = 42;
    std::cout<<*u<<" "<<*uc<<std::endl;
}
Is this considered as a valid
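Independent of alignment, both writes in that code store 8 bytes into objects smaller than 8 bytes (a single char, and the last element of a 4-char array), which is out of bounds and therefore undefined; accessing char storage through an unsigned long long* also breaks the strict-aliasing rule. A well-defined way to read or write a wider value at an arbitrary byte offset is std::memcpy into a buffer that is large enough, sketched below; load_u64 and store_u64 are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// std::memcpy has no alignment requirement, so these are well-defined for any
// offset as long as [offset, offset + 8) lies inside the buffer.
// GCC and Clang usually turn such fixed-size copies into a single load/store
// on targets that allow unaligned access.
unsigned long long load_u64(const unsigned char* buf, std::size_t offset) {
    unsigned long long v;
    std::memcpy(&v, buf + offset, sizeof v);
    return v;
}

void store_u64(unsigned char* buf, std::size_t offset, unsigned long long v) {
    std::memcpy(buf + offset, &v, sizeof v);
}
```

For instance, storing 42 at offset 3 of a 16-byte buffer writes bytes 3 through 10 and leaves the rest untouched, which is exactly the access the question attempts, minus the overflow.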

Why am I getting this memory access error 'double free or corruption'?

感情迁移 submitted on 2019-12-20 10:46:09
Question: I am getting the following type of error. I know it has something to do with me improperly accessing memory, but I don't know exactly how. Please help me see where I have gone wrong. Note: I have simplified my function, so it is not obvious what the variables are doing; I just need to know how I am implementing the function incorrectly or where I am misusing memory access.
int my_function(char const *file_name, size_t max) {
    myStruct.pStore = fopen(file_name,"w+"); //pStore is a FILE*
    myStruct.max
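The function is truncated, so the actual bug cannot be pinned down from it. A frequent cause of glibc's "double free or corruption" abort in code like this is closing or freeing the same FILE* (or another heap buffer) twice. The sketch below shows a defensive ownership pattern, assuming a struct shaped like myStruct; Store, open_store and close_store are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for the question's myStruct; only pStore (a FILE*)
// and max are visible in the truncated code.
struct Store {
    std::FILE* pStore = nullptr;
    std::size_t max = 0;
};

// Open the backing file, refusing to overwrite (and leak) a handle that is still open.
bool open_store(Store& s, const char* file_name, std::size_t max) {
    if (s.pStore != nullptr) return false;  // already open: caller must close first
    s.pStore = std::fopen(file_name, "w+");
    if (s.pStore == nullptr) return false;  // fopen failed, nothing to clean up
    s.max = max;
    return true;
}

// Close at most once. Calling fclose twice on the same FILE* is undefined
// behavior and, with glibc, can abort with "double free or corruption".
void close_store(Store& s) {
    if (s.pStore != nullptr) {
        std::fclose(s.pStore);
        s.pStore = nullptr;  // a second close_store() is now a harmless no-op
    }
}
```

Resetting the pointer right after fclose is what makes repeated cleanup paths safe; running the original program under Valgrind or AddressSanitizer is usually the fastest way to find where the second free actually happens.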

Memory permission error before entering main thread [closed]

不想你离开。 submitted on 2019-12-12 06:46:51
Question: (Closed as off-topic.) I'm getting a memory permission error before entering the main thread. After debugging, the error is either in this class here:
private:
    motor LFWheel = motor(PORT10,gearSetting::ratio18_1,false);
    motor LBWheel = motor(PORT2,gearSetting::ratio18_1,false);
    motor RFWheel = motor(PORT9,gearSetting::ratio18_1,true);
    motor
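The class is truncated, but default member initializers such as motor LFWheel = motor(...) run whenever an object of the enclosing class is constructed; if that object is a global, this happens during static initialization, before main. Whether that is the cause here is not clear from the excerpt. A general C++ technique for deferring such construction is construct-on-first-use, sketched below with a hypothetical Motor stand-in (not the real motor API) and a hypothetical accessor left_front_wheel.

```cpp
#include <cassert>

// Hypothetical stand-in for a device class like the question's motor;
// it counts how many instances have been constructed.
struct Motor {
    static int constructed;
    int port;
    bool reversed;
    Motor(int p, bool r) : port(p), reversed(r) { ++constructed; }
};
int Motor::constructed = 0;

// Construct-on-first-use: the Motor is created the first time this function
// runs (for example from inside main), not during static initialization.
// Initialization of function-local statics is thread-safe since C++11.
Motor& left_front_wheel() {
    static Motor m(10, false);
    return m;
}
```

Every call returns the same object, so call sites change from LFWheel to left_front_wheel() while construction moves to a point where the runtime is fully set up.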

Dynamic array with Frama-C and Eva

情到浓时终转凉″ submitted on 2019-12-11 06:59:07
Question: In https://stackoverflow.com/a/57116260/946226 I learned how to verify that a function foo that operates on a buffer (given by a begin and end pointer) really only reads from it, by creating a representative main function that calls it:
#include <stddef.h>
#define N 100
char test[N];
extern char *foo(char *, char *);
int main() {
    char* beg, *end;
    beg = &test[0];
    end = &test[0] + N;
    foo(beg, end);
}
but this does not catch bugs that only appear when the buffer is very short. I tried the
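A fixed array of N = 100 only ever exercises one buffer length. Outside of Frama-C, a simple dynamic complement is to call foo on exactly sized heap buffers of every short length and run the program under a memory checker such as AddressSanitizer or Valgrind. The sketch below does that in C++; it is ordinary testing rather than Eva analysis, and its foo is a hypothetical stand-in for the question's extern function (as is exercise_all_lengths). Within Eva itself, the analogous move is to make the length nondeterministic, for example with the Frama_C_interval builtin from __fc_builtin.h.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for foo(begin, end): it must only read inside
// [beg, end). Here it returns a pointer to the first '\0', or end if none.
char* foo(char* beg, char* end) {
    for (char* p = beg; p != end; ++p)
        if (*p == '\0') return p;
    return end;
}

// Call foo on heap buffers of every length 0..max_len, each allocation sized
// exactly to the buffer, so a read past `end` falls outside the allocation
// where AddressSanitizer or Valgrind can report it. Returns the call count.
std::size_t exercise_all_lengths(std::size_t max_len) {
    std::size_t calls = 0;
    for (std::size_t n = 0; n <= max_len; ++n) {
        std::unique_ptr<char[]> buf(new char[n]);
        for (std::size_t i = 0; i < n; ++i) buf[i] = 'x';
        char* beg = buf.get();
        char* r = foo(beg, beg + n);
        assert(r >= beg && r <= beg + n);  // result stays within [beg, end]
        ++calls;
    }
    return calls;
}
```

Building with -fsanitize=address (GCC or Clang) and running exercise_all_lengths(16) covers the very short buffers, including the empty one, that the fixed-size harness never reaches.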

Object and struct member access and address offset calculation

假如想象 submitted on 2019-12-11 02:24:40
Question: I am writing a simple VM and I have a question about implementing object and structure member access. Since the start address of a program is arbitrary on each run, the address of each of its objects is arbitrary too. Thus the only way I can think of to access an object or one of its member objects is through an offset from a "base" pointer, which means an arithmetic operation is needed to access anything in a program structure. My question is whether this is the
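Accessing members as a base address plus a constant offset is how compiled languages generally do it, and it is cheap: most instruction sets have a base-plus-displacement addressing mode, so native code usually folds the constant offset into the load or store itself. For a VM, one way to stay independent of where the host maps memory on each run is to make VM pointers offsets into a single heap array, as in the hypothetical sketch below (Heap, kPointX and kPointY are illustrative names).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// A VM heap as one flat, zero-initialized byte array. A VM "pointer" is just
// an offset (the base) into it, so a member lives at base + member_offset no
// matter where the host process places the array.
class Heap {
public:
    explicit Heap(std::size_t bytes) : mem_(bytes) {}

    // Load/store a 32-bit field at base + offset (memcpy avoids alignment issues).
    std::int32_t load_i32(std::size_t base, std::size_t offset) const {
        assert(base + offset + sizeof(std::int32_t) <= mem_.size());
        std::int32_t v;
        std::memcpy(&v, mem_.data() + base + offset, sizeof v);
        return v;
    }
    void store_i32(std::size_t base, std::size_t offset, std::int32_t v) {
        assert(base + offset + sizeof(std::int32_t) <= mem_.size());
        std::memcpy(mem_.data() + base + offset, &v, sizeof v);
    }

private:
    std::vector<unsigned char> mem_;
};

// Hypothetical layout of a VM struct { int32 x; int32 y; }: the VM's compiler
// resolves field names to these constant offsets ahead of time.
constexpr std::size_t kPointX = 0;
constexpr std::size_t kPointY = 4;
```

The only runtime arithmetic is one addition per access, the same base-plus-offset computation a native compiler emits for p->y.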