I tried answering this using an external sort, but the interviewer replied that the complexity was too high: n * n log(n), i.e. n^2 log(n). Is there a better alternative?
The standard way of doing it is an External Sort.
In an external sort it is not only important to have O(n log n) complexity - it is also critical to minimize the disk reads/writes as much as possible, and to make most reads and writes sequential (and not random), since disk access is much more efficient when done sequentially.
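For concreteness, here is a minimal sketch of the first phase (producing sorted runs with sequential writes) in Python; the line-based records, the chunk size, and the helper name `create_sorted_runs` are illustrative assumptions, not part of the original suggestion:

    import tempfile

    def create_sorted_runs(input_path, max_lines_in_memory=1_000_000):
        """Phase 1: read memory-sized chunks, sort each chunk in RAM,
        and write each one out sequentially as a sorted 'run' file."""
        run_paths = []
        with open(input_path) as src:
            while True:
                # Pull at most max_lines_in_memory lines (assumes records are newline-terminated lines).
                chunk = [line for _, line in zip(range(max_lines_in_memory), src)]
                if not chunk:
                    break
                chunk.sort()
                run = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
                run.writelines(chunk)  # one sequential write per run
                run.close()
                run_paths.append(run.name)
        return run_paths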
The standard way of doing so is indeed a k-way merge sort, as suggested by @JanDvorak, but there are some faults in, and additions to, the suggestion that I am aiming to correct:
k - the "order" of the merge is M/(2b) (where M is the size of your memory, and b is the size of each "buffer" (usually disk block).b entries from each "run" generated in previous iteration - filling M/2 in the memory. The rest of the memory is for "prediction" (which allows continious work with minimal wait for IO) - requesting more elements from a a run, and for the output buffer - in order to guarantee sequential right in blocks.log_k(N/(2M)) where k is the number of runs (previously calculated), M is the size of the memory, and N is the size of the file. Each iteration requires 1 sequential read and 1 sequential write of the entire file.That said - the ratio of file_size/memory_size is usually MUCH more then 10. If you are interested only in a ratio of 10, a local optimizations might take place, but it is not for the more common case where file_size/memory_size >> 10