Sorting -MergeSort

MergeSort

Mergesort is one of the beautiful examples of recursion.
It runs in O(NlogN) time.
The fundamental operation involved in this algorithm is merging 2 sorted lists.This can be done in one pass through the 2 lists if the output is a third list.

Conceptually, merge sort works as follows:

1. Divide the unsorted list into two sublists of about half the size
2. Divide each of the two sublists recursively until we have list sizes of length 1, in which case the list itself is returned
3. Merge the two sorted sublists back into one sorted list.

The crucial step determining the complexity of the algorithm is the merge .

In merge,we start of with 2 sorted sub arrays A,B and an output array C in to which the sorted union of A,B has to be copied.

Let Aptr,Bptr and Cptr be the 3 counters set to the beginning of the resprective arrays. we go on retrieving the smallest element of A[Aptr] , B[Bptr] and copy it to C[Cptr] and advance the counter Cptr as well as the counter of the array from which the element is copied to C.

This process of retrieval takes place as long as both the arrays A,B are unfinished.
When one of them is finished ,the remaining portion of the second array is copied into C.

Pseudo Code


Procedure MergeSort (Array(First..Last))
Begin

If Array contains only one element Then
Return Array
Else
Middle= ((Last + First)/2) rounded down to the nearest integer
LeftHalfArray = MergeSort(Array(First..Middle))
RightHalfArray = MergeSort(Array(Middle+1..Last))
ResultArray = Merge(LeftHalfArray, RightHalfArray)
Return ResultArray
EndIf

End MergeSort

Procedure Merge (LeftHalfArray(LHFirst..LHLast), RightHalfArray(RHFirst..RHLast))
Begin

Result: Array(ResultFirst..ResultLast) of size (LHLast-LHFirst+1+RHLast-RHFirst+1)
LeftPointer = LHFirst
RightPointer = RHFirst
ResultPointer = ResultFirst
Loop
If LeftHalfArray(LeftPointer) <= RightHalfArray(RightPointer)
Then

Result(ResultPointer) = LeftHalfArray(LeftPointer)
LeftPointer = LeftPointer + 1
ResultPointer = ResultPointer + 1
Else
Result(ResultPointer) = RightHalfArray(RightPointer)
RightPointer = RightPointer + 1
ResultPointer = ResultPointer + 1
Until all elements in either LeftHalfArray or RightHalfArray have been moved to
Result.If all elements in the LeftHalfArray have been moved to Result

Then

Move remaining elements in RightHalfArray to Result

Else

Move remaining elements in LeftHalfArray to Result

End If

Return Result End Merge


Analysis of MergeSort

The time to mergesort N numbers is the time to mergesort 2 subarrays of size N/2 and merge them.

As we knew ,Merge takes at the maximum N-1 comparisons hence linear.

Hence T(N)=2*T(N/2)+N .

Solving the above recursion ,we get the time complexity to be O(NlogN).

Having looked at this brief though detailed analysis on Mergesort, We shall look at some questions which throw further light on the behaviour of the mergesort.

Questions

1) Determine the running time of mergesort for
a. sorted input
b.reverse-ordered input
c.random input


Solution:One would be surprised to see that the complexity of this algorithm doesn't change according to that of input.One can clearly see that the merge is the only phase affected by the ordering of the input.In this case,logN iterations would anyway occur in dividing phase of divide and conquer strategy.
a.Sorted Input:In merge phase, only the left pointer sweeps down its array while the right remains unmoved.Hence the number of comparisons reduces to N/2 instead of N.So the algorithm is still of order O(N*logN).

b.reverse-ordered input:This is similar to the earlier one.In this case the right portion gets copied first.Hence the order doesn't change.

c.random input:This is the normal case we have analysed the complexity for :).So obviously true.




2)Prove that the minimum number of comparisons used in mergesort in the worst case is N*floor(LogN)-2^floor(logN) +1(don't ignore the constants)

Solution:In the worst case the no of comparisons used to merge 2 arrays making up for a union of size N is N-1.So the equations looks like this.
T(N)=2*T(N/2)+N-1 (considering only comparisons)
T(N)=4*T(N/4)+N-1 +N-2
.
.
.
T(N)=2^k T(N/2^k) +N-1 +N-2 +.......+N-2^(k-1)

Hence T(N)=2^floor(logN) T(1) + N*floor(logN)-(1+2+.....+2^(k-1))

T(1)=0 since no comparisons are involved for an array of size 1.

Hence T(N)=N*floor(logN)-2^floor(logN) +1





3)You are given with three sorted arrays ( in ascending order), you are required to find a triplet ( one element from each array) such that distance is minimum.
Distance is defined like this :
If a[i], b[j] and c[k] are three elements then

distance=max(abs(a[i]-b[j]),abs(a[i]-c[k]),abs(b[j]-c[k]))

Please give a solution in O(n) time complexity.


Solution: Point to the first elements of the three arrays, namely a[0],b[0],c[0].
Find the smallest and second smallest of the three.Let us say that a[0] is the smallest and b[0] is the second smallest. Increment the pointer of a until you find a[i]>b[0]. Calculate the difference between a[i-1] and c[0] and store it as current min. Now,again find the smallest and second smallest between a[i], b[0], and c[0] and repeat the above process. If the new difference is smaller than current min,update the value of current min.
Repeat the above process until one of the arrays are finished.


Post your answers in Comments Section .The answers for these questions shall be posted soon.

1 comment: