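Just copy "buffer by buffer": open both files in binary mode and read/write chunks of X bytes at a time. Standard C has no built-in file-copy function, so I think the fastest solution is to use a system call or OS API instead (for example `sendfile` on Linux or `CopyFile` on Windows).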
A larger buffer means fewer disk seeks and fewer read/write operations (faster copying), but more RAM usage.
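
Here is a minimal sketch of the buffer-by-buffer approach using only standard C I/O. The function name `copy_file` and its `buf_size` parameter are my own choices for illustration, not anything from a library, and the error handling is deliberately simple (it just returns -1 on any failure).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: copy src to dst in binary mode, buf_size bytes
   at a time. Returns 0 on success, -1 on any error. */
int copy_file(const char *src, const char *dst, size_t buf_size)
{
    if (buf_size == 0)
        return -1;

    FILE *in = fopen(src, "rb");          /* "b" = binary mode, no newline translation */
    if (in == NULL)
        return -1;

    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char *buf = malloc(buf_size);         /* bigger buffer = fewer I/O calls, more RAM */
    if (buf == NULL) {
        fclose(in);
        fclose(out);
        return -1;
    }

    int status = 0;
    size_t n;

    /* Read up to buf_size bytes, then write exactly what was read. */
    while ((n = fread(buf, 1, buf_size, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            status = -1;                  /* short write: disk full or I/O error */
            break;
        }
    }
    if (ferror(in))
        status = -1;                      /* loop ended on a read error, not EOF */

    free(buf);
    fclose(in);
    if (fclose(out) != 0)
        status = -1;                      /* flushing the last data can fail too */

    return status;
}
```

You would call it like `copy_file("in.bin", "out.bin", 64 * 1024)`. The 64 KiB value is only a starting point I picked; the best size depends on your disk and OS, so measure with a few sizes.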