What I'm doing is rendering a number of bitmaps onto a single bitmap. There could be hundreds of source images, and the target bitmap could be larger than 1000x1000 pixels.
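You could have each thread write to its own byte array, then, once they have all finished, use a single thread to create a bitmap object from those byte arrays. Because no two threads share a buffer, they need no locking while they work. If all the other processing has been done beforehand, that final step only copies bytes, so it should be fast.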
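A minimal sketch of that idea, in Python for illustration only. It assumes the target is split into horizontal bands (one task per band) and that pixels are 8-bit RGBA, row-major, with no row padding. `shade`, `render_band` and `render_parallel` are hypothetical names; `shade` stands in for whatever per-pixel compositing of the source images you actually do.

```python
from concurrent.futures import ThreadPoolExecutor

BYTES_PER_PIXEL = 4  # assumption: 8-bit RGBA, row-major, no row padding


def shade(x, y):
    """Hypothetical stand-in for the real per-pixel work
    (e.g. blending the source images that overlap this pixel)."""
    return (x % 256, y % 256, (x + y) % 256, 255)


def render_band(width, y_start, y_end):
    """Runs on a worker thread: writes rows [y_start, y_end) into a
    private bytearray, so no two threads ever share a buffer."""
    band = bytearray((y_end - y_start) * width * BYTES_PER_PIXEL)
    i = 0
    for y in range(y_start, y_end):
        for x in range(width):
            band[i:i + BYTES_PER_PIXEL] = bytes(shade(x, y))
            i += BYTES_PER_PIXEL
    return band


def render_parallel(width, height, workers=4):
    """Split the target into horizontal bands, render each band on its
    own thread, then assemble the final buffer on the calling thread."""
    rows_per_band = max(1, -(-height // workers))  # ceiling division
    bounds = [(y, min(y + rows_per_band, height))
              for y in range(0, height, rows_per_band)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(render_band, width, y0, y1) for y0, y1 in bounds]
        bands = [f.result() for f in futures]  # keeps top-to-bottom order
    # Single-threaded final step: each band is a contiguous range of rows,
    # so concatenating them yields the full row-major pixel buffer.
    return b"".join(bands)


if __name__ == "__main__":
    pixels = render_parallel(64, 48, workers=4)
    print(len(pixels))  # 64 * 48 * 4 = 12288
```

Splitting by whole rows keeps the final assembly to a plain concatenation. Two caveats: in CPython the pure-Python loop above will not actually run in parallel because of the GIL, so for a real speedup the per-band work would need to run in processes or in native code that releases the GIL; and with Pillow, the finished buffer could be turned into an image with `Image.frombytes("RGBA", (width, height), pixels)`.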