How to implement Structures of Arrays instead of Arrays of Structures in Fortran?

Posted by 冷眼眸甩不掉的悲伤 on 2020-01-02 19:11:08

Question


I'm writing a CFD code in Fortran. Some friends from computer science told me that I could reduce the computation time by using Structures of Arrays (SoA) instead of Arrays of Structures (AoS) in my code.

I've seen many examples of how to implement this, but most of them are in C or C++ (e.g. https://software.intel.com/en-us/articles/how-to-manipulate-data-structure-to-optimize-memory-use-on-32-bit-intel-architecture).

Could anyone show me some basic ideas or examples of how to implement SoA instead of AoS in Fortran?


Answer 1:


There is really nothing difficult about this concept.

Instead of

type struct
  real x, y, z
end type

type(struct), allocatable :: array(:)

you use

type struct2
  real, dimension(:), allocatable :: x, y, z
end type

type(struct2) :: arrays

It is really just a line-by-line translation of a C or C++ example. More or less everything you can read about this topic is still applicable to Fortran, even if the examples use another language.
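As a minimal sketch of how the two declarations above might be used side by side, the following program fills both layouts and performs the same update on each. The type names point_aos and points_soa, the size n, and the update itself are hypothetical and chosen only for illustration.

```fortran
program soa_sketch
  implicit none

  ! Array of Structures: x, y, z of one point sit next to each other
  type :: point_aos
    real :: x, y, z
  end type point_aos

  ! Structure of Arrays: all x values are contiguous, likewise y and z
  type :: points_soa
    real, allocatable :: x(:), y(:), z(:)
  end type points_soa

  integer, parameter :: n = 1000        ! hypothetical problem size
  type(point_aos), allocatable :: aos(:)
  type(points_soa) :: soa
  integer :: i

  allocate(aos(n))
  allocate(soa%x(n), soa%y(n), soa%z(n))

  ! Fill the AoS layout element by element
  do i = 1, n
    aos(i)%x = real(i)
    aos(i)%y = 2.0 * real(i)
    aos(i)%z = 0.0
  end do

  ! Copy into the SoA layout; aos%x is a strided array section
  soa%x = aos%x
  soa%y = aos%y
  soa%z = 0.0

  ! AoS update: consecutive x values are typically three reals apart
  do i = 1, n
    aos(i)%z = aos(i)%x + aos(i)%y
  end do

  ! SoA update: unit-stride access, which is friendly to vectorisation
  do i = 1, n
    soa%z(i) = soa%x(i) + soa%y(i)
  end do

  if (any(aos%z /= soa%z)) error stop "AoS and SoA results differ"
  print *, "results match"
end program soa_sketch
```

Both loops compute the same values; only the memory layout differs. Whether the SoA version is actually faster depends on the compiler, the hardware, and the access pattern, so it is worth timing both on the real code.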

Actually, in the old days Fortran didn't have any structures, and the most natural way to do things was just to declare separate array variables:

real x(bigN)
real y(bigN)
real z(bigN)

and you get all the performance benefits of structures of arrays this way too. To a Fortran programmer it sounds almost strange that someone would know only arrays of structures.




Answer 2:


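The (100000,5) layout works just as well, because the 100000 elements along the first dimension are contiguous. I would probably spell the layout out explicitly, i.e.:

! Intel-specific directive: align A on a 64-byte boundary
!DIR$ ATTRIBUTES ALIGN:64 :: A
! Allocatable arrays are always contiguous; CONTIGUOUS is only valid on pointers and assumed-shape dummies
REAL, DIMENSION(:,:), ALLOCATABLE :: A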

But the structure also works; it depends on what you prefer. To me it seems more intuitive, when the 5 is a larger number, to parallelise over the j loop (the 5) and vectorise over the contiguous i loop (the 100000). Alternatively, use tasks or another OpenMP approach such as !$OMP WORKSHARE.
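A rough sketch of that split, assuming an A(100000,5) array as discussed: the outer j loop is shared among OpenMP threads, and the inner i loop runs over contiguous memory so the compiler can vectorise it. The program name and the loop body are hypothetical. The directives are ordinary comments unless OpenMP is enabled at compile time.

```fortran
program j_parallel_i_vector
  implicit none
  integer, parameter :: n = 100000, m = 5   ! sizes taken from the answer
  real, allocatable :: a(:,:)
  integer :: i, j

  allocate(a(n,m))

  ! Threads split the j loop; each thread sweeps whole columns of a
  !$omp parallel do private(i)
  do j = 1, m
    ! Inner loop walks the first (contiguous) dimension with unit stride
    do i = 1, n
      a(i,j) = real(j) * real(i)
    end do
  end do
  !$omp end parallel do

  ! Simple sanity check on the last element
  if (a(n,m) /= real(m) * real(n)) error stop "unexpected value"
  print *, "done"
end program j_parallel_i_vector
```

With only 5 columns there is little work to distribute, which matches the remark that this split pays off when the second dimension is larger.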

Really, you do not need a structure or a 2-D array at all if separate arrays A, B, C, D, E are used instead of the indices 1-5.

"The why" is that contiguous data can be vectorized, because consecutive elements of an array are adjacent in memory. With an array of structures you stride by 5 from one value to the next, and you lose time in gathers, or in not being able to vectorize at all, or both.

Then on your i loop (100000) you either rely on auto-vectorisation, use OpenMP, or use a Cilk Plus-like array statement A(:) = ... Sometimes the array notation is just as fast as OpenMP or auto-vectorised code, and sometimes it is slower than OpenMP. You need to try all three ways; OpenMP is usually solid, but it takes a few more lines and is less readable. It should run a lot faster as SoA.
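The three variants could look roughly like the sketch below. It assumes the A(100000,5) layout from above and a made-up update formula; "use OpenMP" is interpreted here as an !$omp simd directive, although !$omp parallel do would be the threaded counterpart. The result arrays r1, r2, and r3 are hypothetical names.

```fortran
program three_ways
  implicit none
  integer, parameter :: n = 100000, m = 5   ! sizes taken from the answer
  real, allocatable :: a(:,:), r1(:), r2(:), r3(:)
  integer :: i

  allocate(a(n,m), r1(n), r2(n), r3(n))
  call random_number(a)

  ! 1) Plain loop, left to the compiler's auto-vectoriser
  do i = 1, n
    r1(i) = a(i,1) + 2.0 * a(i,2)
  end do

  ! 2) OpenMP SIMD directive (a plain comment unless OpenMP is enabled)
  !$omp simd
  do i = 1, n
    r2(i) = a(i,1) + 2.0 * a(i,2)
  end do

  ! 3) Array (vector) notation over the contiguous first dimension
  r3(:) = a(:,1) + 2.0 * a(:,2)

  ! Scaling by 2.0 is exact, so all variants should agree bit for bit
  if (any(r1 /= r2) .or. any(r1 /= r3)) error stop "variants disagree"
  print *, "all three variants agree"
end program three_ways
```

Which variant is fastest is compiler- and machine-dependent, so timing each of them on the actual loop is the only reliable guide.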



Source: https://stackoverflow.com/questions/38461099/how-to-implement-structures-of-arrays-instead-of-arrays-of-structures-in-fortran
