Difference between local allocatable and automatic arrays

Asked 2020-12-17 03:05

I am interested in the difference between alloc_array and automatic_array in the following extract:

subroutine mysub(n)
integer, intent(in)  :: n
integer              :: automatic_array(n)
integer, allocatable :: alloc_array(:)

allocate(alloc_array(n))

2 Answers
  • 2020-12-17 03:33

    Because gfortran or ifort on Linux (x86_64) are among the most popular combinations used for HPC, I made a performance comparison between local allocatable and automatic arrays for these compilers. The CPU used is a Xeon E5-2650 v2 @ 2.60 GHz, and the compilers are gfortran 4.8.2 and ifort 14.0. The test program is as follows.

    In test.f90:
    
    !------------------------------------------------------------------------           
    subroutine use_automatic( n )
        integer :: n
    
        integer :: a( n )   !! local automatic array (with unknown size at compile-time)
        integer :: i
    
        do i = 1, n
            a( i ) = i
        enddo
    
        call sub( a )
    end
    
    !------------------------------------------------------------------------           
    subroutine use_alloc( n )
        integer :: n
    
        integer, allocatable :: a( : )  !! local allocatable array                      
        integer :: i
    
        allocate( a( n ) )
    
        do i = 1, n
            a( i ) = i
        enddo
    
        call sub( a )
    
        deallocate( a )  !! automatic deallocation would occur anyway in modern Fortran; kept for clarity
    end
    
    !------------------------------------------------------------------------           
    program main
        implicit none
        integer :: i, nsizemax, nsize, nloop, foo
        common /dummy/ foo
    
        nloop = 10**7
        nsizemax = 10
    
        do i = 1, nloop
            nsize = mod( i, nsizemax ) + 1
    
            call use_automatic( nsize )
            ! call use_alloc( nsize )                                                   
        enddo
    
        print *, "foo = ", foo   !! to check if sub() is really called
    end
    
    In sub.f90:
    
    !------------------------------------------------------------------------
    subroutine sub( a )
        integer a( * )
        integer foo
        common /dummy/ foo
    
        foo = a( 1 )
    end
    

    In the above program, I tried to keep the compiler from optimizing a(:) away entirely (i.e., reducing the loops to no-ops) by placing sub() in a separate file so that its interface is implicit. First, I compiled the program with gfortran as

    gfortran -O3 test.f90 sub.f90
    

    and tested different values of nsizemax while keeping nloop = 10^7. The results are in the following table (times in seconds, measured several times with the time command).

    nsizemax    use_automatic()    use_alloc()
    10          0.30               0.31               # average result
    50          0.48               0.47
    500         1.0                0.90
    5000        4.3                4.2
    100000      75.6               75.7
    

    So the overall timing is almost the same for the two routines when -O3 is used (but see the Edit below for different options). Next, I compiled with ifort as

    [O3]  ifort -O3 test.f90 sub.f90
    or
    [O3h] ifort -O3 -heap-arrays test.f90 sub.f90
    

    In the former case the automatic array is stored on the stack, while with -heap-arrays it is stored on the heap. The obtained result is

             use_automatic()    use_alloc()
             [O3]    [O3h]      [O3]    [O3h]
    10       0.064   0.39       0.48    0.48
    50       0.094   0.56       0.65    0.66
    500      0.45    1.03       1.12    1.12
    5000     3.8     4.4        4.4     4.4
    100000   74.5    75.3       76.5    75.5
    

    So for ifort, the use of automatic arrays seems beneficial when relatively small arrays are mainly used. On the other hand, gfortran -O3 shows no difference because both arrays are treated the same way (see Edit for more details).

    Additional comparison:

    Below is the result for the Oracle Fortran compiler 12.4 for Linux (used with f90 -O3). The overall trend is similar; automatic arrays are faster for small n, indicating that the stack is used internally.

    nsizemax    use_automatic()    use_alloc()
    10          0.16               0.45
    50          0.17               0.62
    500         0.37               0.97
    5000        2.04               2.67
    100000      65.6               65.7
    

    Edit

    Thanks to Vladimir's comment, it has turned out that gfortran -O3 puts automatic arrays (with unknown size at compile time) on the heap. This explains why use_automatic() and use_alloc() did not show any difference above. So I made another comparison between the options below:

    [O3]  gfortran -O3
    [O5]  gfortran -O5
    [O3s] gfortran -O3 -fstack-arrays
    [Of]  gfortran -Ofast                   # this includes -fstack-arrays
    

    Here, -fstack-arrays means that the compiler puts all local arrays with unknown size on the stack. Note that this flag is enabled by default with -Ofast. The obtained result is

    nsizemax    use_automatic()               use_alloc()
                [Of]   [O3s]  [O5]  [O3]     [Of]  [O3s]  [O5]  [O3]
    10          0.087  0.087  0.29  0.29     0.29  0.29   0.29  0.29
    50          0.15   0.15   0.43  0.43     0.45  0.44   0.44  0.45
    500         0.57   0.56   0.84  0.84     0.92  0.92   0.92  0.92
    5000        3.9    3.9    4.1   4.1      4.2   4.2    4.2   4.2
    100000      75.1   75.0   75.6  75.6     75.6  75.3   75.7  76.0
    

    where the average of ten measurements is shown. This table demonstrates that if -fstack-arrays is included, the execution time for small n becomes shorter. This trend is consistent with the results obtained for ifort above.

    It should be mentioned, however, that the above comparison probably corresponds to a "best-case" scenario that highlights the difference between the two, so the timing difference can be much smaller in practice. For example, I compared the timing for the above options using another program (involving both small and large arrays), and the results were not much affected by the stack options. The results also depend on the machine architecture as well as the compiler, of course, so your mileage may vary.

  • 2020-12-17 03:37

    For the sake of clarity, I'll briefly mention terminology. Both arrays are local variables and arrays of rank 1.

    • alloc_array is an allocatable array;
    • automatic_array is an explicit-shape automatic object.

    Again as in the linked question, after the allocation statement both arrays are of size n. I'll answer here because these are still two very different things. Of course, the allocatable array can have its allocation status changed and its allocation moved; I'll leave both of those (mostly) out of the scope of this answer. An allocatable array, of course, needn't have these things changed once it's been allocated.

    Memory usage

    What was partly contentious about a previous revision of the question is how ill-defined the concept of memory usage is. The Fortran standard tells us that both arrays come to be the same size, have the same storage layout, and are both contiguous. Beyond that, much falls under terms you'll hear a lot: implementation specific and processor dependent.

    In a comment you expressed interest in ifort. So that I don't wander too far, I'll stick with that one compiler and merely suggest what to consider.

    Often, ifort will place automatic objects and array temporaries on the stack. There is a (default) compiler option -no-heap-arrays described as having the effect:

    The compiler puts automatic arrays and temporary arrays in the stack storage area.

    Using the alternative option -heap-arrays allows one to control that slightly:

    This option puts automatic arrays and arrays created for temporary computations on the heap instead of the stack.

    There is a possibility to control the size threshold at which heap or stack is chosen (when the size is known at compile time):

    If the compiler cannot determine the size at compile time, it always puts the automatic array on the heap.

    As n isn't a constant expression, then, one would expect your array automatic_array to be on the heap with this option, regardless of the size specified.
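
    For illustration, a possible command line (assumed here, not taken from the question; per Intel's documentation the optional threshold is given in kilobytes and only applies to sizes known at compile time):

    ifort -O3 -heap-arrays 10 test.f90 sub.f90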

    There's probably more to be said, but this could be far too long if I tried.

    Interface needs

    There is nothing special about the interface requirements of the subroutine mysub: local variables have no impact on that. Any program unit calling that would be happy with an implicit interface. What you are asking about is how the two local arrays can be used.

    This largely comes down to what uses the two arrays can be put to.

    If the dummy argument of a second procedure has the allocatable attribute then only the allocatable array here can be passed to that procedure. It will also need to have an explicit interface. This is true whether or not the procedure changes the allocation.
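
    As a minimal sketch of that (the module and procedure names are invented for illustration), the explicit interface is most easily provided by a module:

    module resize_mod
    contains
        ! The dummy has the allocatable attribute, so every caller needs an
        ! explicit interface; use association via the module provides one.
        subroutine maybe_resize(a, n)
            integer, allocatable, intent(inout) :: a(:)
            integer, intent(in)                 :: n
            if (.not. allocated(a)) allocate(a(n))   ! a code path that may change the allocation
        end subroutine maybe_resize
    end module resize_mod

    With use resize_mod in mysub, call maybe_resize(alloc_array, n) is fine, whereas passing automatic_array would be rejected because it lacks the allocatable attribute.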

    Of course, both arrays could be passed as arguments to a dummy argument without the allocatable attribute and then we don't have different interface requirements.

    Anyway, why would one want to pass an argument to an allocatable dummy when there will be no change in allocation status, etc.? There are good reasons:

    • there may be a code path in the procedure which does have an allocation change (controlled by a switch, say);
    • allocatable dummy arguments also pass bounds;
    • etc.

    The second point is more obvious if the subroutine had the specification

    subroutine mysub(n)
    integer, intent(in)  :: n
    integer              :: automatic_array(2:n+1)
    integer, allocatable :: alloc_array(:)
    
    allocate(alloc_array(2:n+1))
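
    To spell out the bounds point (a hedged sketch; the receiving procedures are invented here and would themselves need explicit interfaces):

    subroutine takes_alloc(a)
        integer, allocatable, intent(in) :: a(:)
        print *, lbound(a,1), ubound(a,1)   ! 2 and n+1: the bounds travel with the array
    end subroutine takes_alloc

    subroutine takes_shape(a)
        integer, intent(in) :: a(:)         ! assumed-shape dummy
        print *, lbound(a,1), ubound(a,1)   ! 1 and n: the lower bound defaults to 1
    end subroutine takes_shape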
    

    Finally, an automatic object has quite strict conditions on its size: the bounds must come from specification expressions that can be evaluated on entry to the procedure. n here is clearly allowed, but things don't have to get much more complicated before allocation is the only plausible way, depending on how much one wants to play with block constructs (a sketch follows).
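
    As a rough illustration of where a block construct can help (the size computation and subroutine name here are made up): a value computed in the executable part cannot size an automatic array in the procedure's own specification part, but it can size one declared inside a block:

    subroutine mysub2(n)
        integer, intent(in) :: n
        integer :: m

        m = 2*n + 1              ! computed at run time, after the specification part
        block
            integer :: work(m)   ! automatic array sized by m; legal inside a block (Fortran 2008)
            work = 0
            ! ... use work ...
        end block
    end subroutine mysub2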

    Taking also a comment from IanH into account: if we have a very large n, the automatic object is likely to lead to crash-and-burn. With the allocatable array, one could use the stat= option to come to some amicable agreement with the run-time.
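
    A hedged sketch of that graceful-failure path inside mysub:

    integer :: ierr

    allocate(alloc_array(n), stat=ierr)
    if (ierr /= 0) then
        ! allocation failed: report and bail out instead of crashing
        print *, "could not allocate an array of size", n
        return
    end if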
