Calculating PI with Fortran & CUDA

試著忘記壹切 提交于 2019-12-08 13:44:58

问题


I am trying to make a simple program in PGI's fortran compiler. This simple program will use the graphics card to calculate pi using the "dart board" algorithm. After battling with this program for quite some time now I have finally got it to behave for the most part. However, I am currently stuck on passing back the results properly. I must say, this is a rather tricky program to debug since I can no longer shove any print statements into the subroutine. This program currently returns all zeros. I am not really sure what is going on, but I have two ideas. Both of which I am not sure how to fix:

  1. The CUDA kernel is not running somehow?
  2. I am not converting the values properly? pi_parts = pi_parts_d

Well, this is the status of my current program. All variables with _d on the end stand for the CUDA prepared device memory where all the other variables (with the exception of the CUDA kernel) are typical Fortran CPU prepared variables. Now there are some print statements I have commented out that I have already tried out from CPU Fortran land. These commands were to check if I really was generating the random numbers properly. As for the CUDA method, I have currently commented out the calculations and replaced z to statically equal to 1 just to see something happen.

module calcPi
contains
    attributes(global) subroutine pi_darts(x, y, results, N)
        use cudafor
        implicit none
        integer :: id
        integer, value :: N
        real, dimension(N) :: x, y, results
        real :: z

        id = (blockIdx%x-1)*blockDim%x + threadIdx%x

        if (id .lt. N) then
            ! SQRT NOT NEEDED, SQRT(1) === 1
            ! Anything above and below 1 would stay the same even with the applied
            ! sqrt function. Therefore using the sqrt function wastes GPU time.
            z = 1.0
            !z = x(id)*x(id)+y(id)*y(id)
            !if (z .lt. 1.0) then
            !   z = 1.0
            !else
            !   z = 0.0
            !endif
            results(id) = z
        endif
    end subroutine pi_darts
end module calcPi

program final_project
    use calcPi
    use cudafor
    implicit none
    integer, parameter :: N = 400
    integer :: i
    real, dimension(N) :: x, y, pi_parts
    real, dimension(N), device :: x_d, y_d, pi_parts_d
    type(dim3) :: grid, tBlock

    ! Initialize the random number generaters seed
    call random_seed()

    ! Make sure we initialize the parts with 0
    pi_parts = 0

    ! Prepare the random numbers (These cannot be generated from inside the
    ! cuda kernel)
    call random_number(x)
    call random_number(y)

    !write(*,*) x, y

    ! Convert the random numbers into graphics card memory land!
    x_d = x
    y_d = y
    pi_parts_d = pi_parts

    ! For the cuda kernel
    tBlock = dim3(256,1,1)
    grid = dim3((N/tBlock%x)+1,1,1)

    ! Start the cuda kernel
    call pi_darts<<<grid, tblock>>>(x_d, y_d, pi_parts_d, N)

    ! Transform the results into CPU Memory
    pi_parts = pi_parts_d
    write(*,*) pi_parts

    write(*,*) 'PI: ', 4.0*sum(pi_parts)/N
end program final_project

EDIT TO CODE: Changed various lines to reflect the fixes mentioned by: Robert Crovella. Current status: error caught by cuda-memcheck revealing: Program hit error 8 on CUDA API call to cudaLaunch on my machine.

If there is any method I can use to test this program please let me know. I am throwing darts and seeing where they land for my current style of debugging with CUDA. Not the most ideal, but it will have to do until I find another way.

May the Fortran Gods have mercy on my soul at this dark hour.


回答1:


When I compile and run your program I get a segfault. This is due to the last parameter you are passing to the kernel (N_d):

call pi_darts<<<grid, tblock>>>(x_d, y_d, pi_parts_d, N_d)

Since N is a scalar quantity, the kernel is expecting to use it directly, rather than as a pointer. So when you pass a pointer to device data (N_d), the process of setting up the kernel generates a seg fault (in host code!) as it attempts to access the value N, which should be passed directly as:

call pi_darts<<<grid, tblock>>>(x_d, y_d, pi_parts_d, N)

When I make that change to the code you have posted, I then get actual printed output (instead of a seg fault), which is an array of ones and zeroes (256 ones, followed by 144 zeroes, for a total of N=400 values), followed by the calculated PI value (which happens to be 2.56 in this case (4*256/400), since you have made the kernel basically a dummy kernel).

This line of code is also probably not doing what you want:

grid = dim3(N/tBlock%x,1,1)

With N = 400 and tBlock%x = 256 (from previous code lines), the result of the calculation is 1 (ie. grid ends up at (1,1,1) which amounts to one threadblock). But you really want to launch 2 threadblocks, so as to cover the entire range of your data set (N = 400 elements). There's a number of ways to fix this, but for simplicity let's just always add 1 to the calculation:

grid = dim3((N/tBlock%x)+1,1,1)

Under these circumstances, when we launch grids that are larger (in terms of total threads) than our data set size (512 threads but only 400 data elements in this example) it's customary to put a thread check near the beginning of our kernel (in this case, after the initialization of id), to prevent out-of-bounds accesses, like so:

if (id .lt. N) then

(and a corresponding endif at the very end of the kernel code) This way, only the threads that correspond to actual valid data are allowed to do any work.

With the above changes, your code should be essentially functional, and you should be able to revert your kernel code to the proper statements and start to get an estimate of PI.

Note that you can check the CUDA API for error return codes, and you can also run your code with cuda-memcheck to get an idea of whether the kernel is making out-of-bounds accesses. Niether of these would have helped with this particular seg fault, however.



来源:https://stackoverflow.com/questions/23330979/calculating-pi-with-fortran-cuda

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!