I have a user-space application that generates big SCSI writes (details below). However, when I\'m looking at the SCSI commands that reach the SCSI target (i.e. the storage,
The blame is indeed on the block layer, the SCSI layer itself has little regard to the size. You should check though that the underlying layers are indeed able to pass your request, especially with regard to direct io since that may be split into many small pages and requires a scatter-gather list that is longer than what can be supported by the hardware or even just the drivers (libata is/was somewhat limited).
You should look and tune /sys/class/block/$DEV/queue there are assorted files there and the most likely to match what you need is max_sectors_kb but you can just try it out and see what works for you. You may also need to tune the partitions variables as well.