What im doing is rendering a number of bitmaps to a single bitmap. There could be hundreds of images and the bitmap being rendered to could be over 1000x1000 pixels.
I've done something similar and in my case I had each thread lock x (depended on the size of the image and the number of threads) many rows of bits in the image, and do their writing to those bits such that no threads ever overlapped their writes.