Sorting variable-length items
a. You are given an array of integers, where different integers may have different numbers of digits, but the total number of digits over all the integers in the array is n. Show how to sort the array in O(n) time.
b. You are given an array of strings, where different strings may have different numbers of characters, but the total number of characters over all the strings is n. Show how to sort the strings in O(n) time.
(Note that the desired order here is the standard alphabetical order; for example, a < ab < b.)
a. The radix sort can be used to sort this problem. It would take d passes to sort the numbers in the array which will be equal to the number with highest number of digits. Suppose there m integers $m \leq n$ is always the case the worst case occurs when one integer has n/2 digits and remaining integers have 1 digit each. we assume the range of the single digit is constant. Therefore we would have d = n/2 and m = n/2+1 so the runnning time is $\theta(mn)$ = $\theta(n^2)$
If we assume the numbers will not have leading zeros and all the numbers are positive (if we have negative numbers we have sort them separately from positive numbers)then the number with more digits will always be greater than number with less digits. we can sort the numbers by the no of digits (using counting sort) and then sort each group of numbers with same length using radix sort. Each integer has i = 1..n digits let $m_i$ be the number of integers with i digits since there n digits we have $\sum_{i=1}^n i m_i$ = n.
It takes $O(n)$ to find the no of digits for all the numbers(divide the number 10 until it > 0 and base case for 0) and it will take O(m+n) = O(m) to group numbers with no of digits. To sort the group with $m_i$ number of integers with radix sort it will take $O(i . m_i)$. The time to sort the all groups is therefore $\sum_{i = 1}^n \theta(i . m_i)$ = $ \theta( \sum_{i = 1}^n i . m_i)$ = $\theta(n)$
b. One way to solve this problem is by a radix sort from right to left. Since the strings might have unequal lengths. we may have to pad all the strings smaller than largest string on the right side with character less than any other character. But the issue is we have to change the strings. Also the time required to sort will not run in the required time bound. In the worst case we have a string with n/2 characters and remaining n/2 string have one character. we have d = n/2 and m = n/2+1 the time required to sort using radix sort is $\theta(dm)$ = $\theta(n/2 (n/2+1))$ = $\theta(n^2)$.
we can sort the string based on the first character if the first character of string is lexicographically greater than first character other string then the length doesnt matter.We take advantage of this property and sort strings based on the first character using counting sort. We take empty string as special case and put it first. we gather all the string with same first letter as group.Then we recurse within each group with first letter removed. Running time: Lets count no of times each string is sorted by counting sort. Let $s_i$ be a string to be sorted has a length $l_i$. Then the no of times this string is sorted is $l_i+1$. (the +1 is for the string to be sorted as an empty string). ex. if we have to sort ab,a we need two pass in second pass we need to compare b with empty string for a.because of unequal no of characters. A call to counting sort for t string takes $\theta(t)$ time. The total time for all calls for counting sort is: $O( \sum_{i=1}^m l_i + 1 )$ = $O( \sum_{i=1}^m l_i + m )$
= $O( n + m )$ $m \leq n$
No comments:
Post a Comment