Confusion about different running times of two algorithms in C [duplicate]

清酒与你 2020-12-14 06:59

6 answers

旧时难觅i (original poster)

2020-12-14 07:46
On a machine with a data cache (even a 68030 has one), reading or writing consecutive memory locations is much faster, because memory is fetched a block at a time (a cache line, whose size depends on the processor). After the first access, the remaining reads in that block are served from the cache, and writes are collected in the cache and flushed to memory together.

By "skipping" data (reading far from the previous read), you land outside the cached block, so the CPU has to go back to memory again.

That's why your first snippet is faster.
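The question's code is not reproduced here, so the following is only a minimal sketch of the two access patterns, assuming a row-major matrix stored in one flat array. The names `sum_row_major` and `sum_col_major` are made up for illustration. Both functions return the same sum; only the order in which memory is touched differs.

```c
#include <assert.h>
#include <stddef.h>

/* Inner loop walks along a row: consecutive addresses, so most
   accesses hit a cache line that has already been fetched. */
long sum_row_major(const long *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long sum = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += m[i * cols + j];
    return sum;
}

/* Inner loop walks down a column: each access is cols elements
   away from the previous one, so large matrices keep missing the cache. */
long sum_col_major(const long *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long sum = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            sum += m[i * cols + j];
    return sum;
}
```

Timing both on a matrix much larger than your cache should show the gap; the exact ratio depends on the processor and the matrix size.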

For more complex operations (a fast Fourier transform, for instance), where data is read more than once (unlike in your example), many libraries (FFTW, for instance) let you pass a stride to accommodate your data organization (by rows or by columns). Never use it: always transpose your data first and use a stride of 1. It will be faster than trying to do it without transposition.
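A rough sketch of "transpose first, then use a stride of 1" follows. This is not FFTW's API: `transpose` and `sum_strided` are hypothetical helpers, and a plain sum stands in for the real multi-pass operation. The matrix is assumed to be row-major in a flat array.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a rows x cols row-major matrix into dst as its
   cols x rows transpose (out of place). */
void transpose(const long *src, long *dst, size_t rows, size_t cols)
{
    assert(src != dst);  /* in-place transposition is not handled here */
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            dst[j * rows + i] = src[i * cols + j];
}

/* Sum n elements that are `stride` elements apart, starting at p. */
long sum_strided(const long *p, size_t n, size_t stride)
{
    long sum = 0;
    for (size_t k = 0; k < n; k++)
        sum += p[k * stride];
    return sum;
}
```

Column `c` of the original matrix is `sum_strided(src + c, rows, cols)`, a strided walk. After `transpose(src, dst, rows, cols)` the same column is `sum_strided(dst + c * rows, rows, 1)`, a contiguous walk. The transpose costs one extra pass, which pays off when the data is read several times afterwards.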

To make sure your data is accessed consecutively, never use 2D notation in the inner loop. First select the row and set a pointer to its start, then run an inner loop over that row.
```
for (i = 0; i < ROWS; i++) {
    const long *row = m[i];   /* pointer to the start of row i */
    for (j = 0; j < COLS; j++) {
        sum += row[j];        /* consecutive addresses within the row */
    }
}
```
If you cannot do this, your data is laid out in the wrong orientation for this access pattern.