Question
I'm trying to parallelize a little scientific code I wrote.  But when I use @parallel, essentially the same code running on just one processor suddenly takes 10 times as long to execute.  It should take roughly the same amount of time.  The first code makes one memory allocation, while the second makes 20.  But zeros(Float64, num_bins) should not be a bottleneck.  num_bins is 1800, so each call to zeros() should be allocating 8*1800 = 14,400 bytes.  20 calls of 14,400 bytes each, about 288,000 bytes in total, should not be taking this long.
I can't figure out what I'm doing wrong, and the Julia documentation is vague about how variables are accessed within @parallel.  Both versions of the code below compute the correct value for the rdf vector.  Can anyone tell, just by looking at it, what is making the parallel version allocate so much memory and take so long?
atoms = readAtoms(file)
rdf = zeros(Float64, num_bins)
@time for k = 1:20
    for i = 1:num_atoms
        for j = 1:num_atoms
            r = distance(k, atoms, i, atoms, j)
            bin_number = floor(Int, r / dr) + 1
            rdf[bin_number] += 1
        end
    end
end
elapsed time: 8.1 seconds (0 bytes allocated)
atoms = readAtoms(file)
@time rdf = @parallel (+) for k = 1:20
    rdf_part = zeros(Float64, num_bins)
    for i = 1:num_atoms
        for j = 1:num_atoms
            r = distance(k, atoms, i, atoms, j)
            bin_number = floor(Int, r / dr) + 1
            rdf_part[bin_number] += 1
        end
    end
    rdf_part
end
elapsed time: 81.2 seconds (33472513332 bytes allocated, 17.40% gc time)
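A likely culprit, sketched here as an untested assumption: @parallel turns the loop body into a closure, so num_bins, num_atoms, dr, and atoms are captured as non-const globals, and every access to them inside the hot inner loops is boxed and type-unstable, allocating on each iteration. A common fix is to move the body into a function and pass those variables as arguments; this sketch reuses the question's own names (distance, atoms, etc.), which must already be defined on all workers:

```julia
# Hypothetical refactor: the loop body becomes a function so that
# num_bins, num_atoms, and dr are arguments, not captured globals.
# Inside the function, the inner loops are type-stable and the only
# allocation per k is the rdf_part vector itself.
function rdf_slice(k, atoms, num_atoms, num_bins, dr)
    rdf_part = zeros(Float64, num_bins)
    for i = 1:num_atoms
        for j = 1:num_atoms
            r = distance(k, atoms, i, atoms, j)
            bin_number = floor(Int, r / dr) + 1
            rdf_part[bin_number] += 1
        end
    end
    return rdf_part
end

rdf = @parallel (+) for k = 1:20
    rdf_slice(k, atoms, num_atoms, num_bins, dr)
end
```

If the globals really are the cause, declaring them const (or interpolating them into the expression) should also shrink the allocation count dramatically; note that in current Julia, @parallel has been replaced by @distributed in the Distributed standard library.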
Source: https://stackoverflow.com/questions/27296176/julia-allocates-huge-amount-of-memory-for-unknown-reason