*external sorting*. It is an open-ended, interactive question.

Let's assume that -

- The list of numbers is on disk (as we cannot fit them all in RAM).

- There is a sort function which sorts a given list efficiently, i.e. we do not have to worry about the choice of sorting algorithm.

- There is plenty of disk space.
- As we are short on RAM, the time complexity is not a constraint. This is an important one.

*I would*:

- divide the list into manageable chunks (in this case, 250,000 numbers per chunk, which is about 1 MB assuming 4-byte integers).

- sort each chunk, save it to disk, and continue with the next chunk.

- then do an m-way merge on the sorted lists.
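The first two steps can be sketched as below. This is a minimal sketch, not the exact implementation: the function names are my own, numbers are written one per line to temp files, and Python's built-in `list.sort` stands in for the "given sort function" from the assumptions.

```python
import os
import tempfile

def split_and_sort(numbers, chunk_size):
    """Split a stream of numbers into sorted chunks, each saved to disk.

    Returns the list of temp-file paths. chunk_size plays the role of
    the ~250,000-number chunks described above.
    """
    paths = []
    chunk = []
    for n in numbers:
        chunk.append(n)
        if len(chunk) == chunk_size:
            paths.append(_write_sorted(chunk))
            chunk = []
    if chunk:  # flush the final, possibly short, chunk
        paths.append(_write_sorted(chunk))
    return paths

def _write_sorted(chunk):
    chunk.sort()  # the assumed sort function
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(map(str, chunk)))
    return path
```

In a real setting `numbers` would be a generator reading from the input file, so only one chunk ever sits in RAM.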

**m-way (or n-way) merge**: If you have m lists and each is **sorted**, repeatedly pick the smallest of the front elements across the lists, then fetch the next element from that *same* list. Continue until all elements of all m lists are exhausted, just like the 2-way merge in merge sort.

This particular m-way merge step is quite inefficient: for m lists of n/m elements each (n elements in total), each of the n selections scans m front elements, giving O(m*n) time. This is where I realized that time complexity is not important. However, we can maintain a min-heap of the

*front* elements of the m lists and select the smallest in O(log m): delete the smallest element from the min-heap, then insert the next element from that element's list; each heap operation takes O(log m). So the m-way (n-way) merge of m lists and n total elements takes O(n log m) time. Also, we can read a chunk of size (n/m)/m from each list into memory, so at any time we hold a total of n/m elements in memory.
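The heap-based version can be sketched with the standard-library `heapq` (the function name is my own; `heapq.merge` does essentially the same thing out of the box):

```python
import heapq

def heap_m_way_merge(sorted_streams):
    """Merge m sorted iterables using a min-heap of their front elements.

    delete-min and insert each cost O(log m), so merging n total
    elements costs O(n log m) -- versus O(m*n) for the naive scan.
    """
    its = [iter(s) for s in sorted_streams]
    heap = []
    for i, it in enumerate(its):
        for first in it:               # seed the heap with each front element
            # (value, stream index): the index says which stream to refill from
            heapq.heappush(heap, (first, i))
            break
    out = []
    while heap:
        value, i = heapq.heappop(heap)  # delete-min: overall smallest front
        out.append(value)
        for nxt in its[i]:              # insert the next element, if any,
            heapq.heappush(heap, (nxt, i))  # from that same stream
            break
    return out
```

In the external setting, each stream would be a buffered reader over one sorted chunk file, and `out.append` would be a buffered write to the final output file.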

[Hat tip to TN]
