8-1 Probabilistic lower bounds on comparison sorting In this problem, we prove a probabilistic \Omega(n lg n) lower bound on the running time of any deterministic or randomized comparison sort on n distinct input elements. We begin by examining a deterministic comparison sort A with decision tree TA. We assume that every permutation of A’s inputs is equally likely.
a. Suppose that each leaf of TA is labeled with the probability that it is reached given a random input. Prove that exactly n! leaves are labeled 1/n! and that the rest are labeled 0.
b. Let D(T) denote the external path length of a decision tree T ; that is, D(T) is the sum of the depths of all the leaves of T. Let T be a decision tree with k > 1 leaves, and let LT and RT be the left and right subtrees of T . Show that D(T) = D(LT) + D(RT) + k.
c. Let d(k) be the minimum value of D(T) over all decision trees T with k > 1 leaves. Show that $d(k) = min_{1 \leq i \leq k-1}{d(i)+ d(k - i) +k}.$ (Hint: Consider a decision tree T with k leaves that achieves the minimum. Let i0 be the number of leaves in LT and $k - i_0$ the number of leaves in RT.)
d. Prove that for a given value of k > 1 and i in the range 1 i k 1, the function i lg i + (k - i) lg(k - i) is minimized at i = k/2. Conclude that $d(k) = \Omega(k lg k)$.
e. Prove that $D(T_A) = \Omega(n! lg(n!))$, and conclude that the average-case time to sort n elements is $\Omega(n lg n)$.
a. since there n elements we have n! permutation of inputs being leaves of a decision tree. Each of leaf has a probability 1/n!. Since we have taken in to consideration all possible permutations there are no leaves hence the probability of other leaves is zero.
b. $ D(T) = \sum{l = 1^k}D(l)$
let x be the no of leaves on left sub tree
since the depth for each leaf from the left and right is 1 less than depth for the main tree
$$D(L(T)) = \sum_{l = 1}^ {x} (D(l) - 1 ) $$ $$D(R(T)) = \sum_{l = x+1}^{k} (D(l)- 1) $$ $$D(L(T))+D(R(T)) = \sum_{l = 1}^ {x} (D(l) - 1) + \sum_{l = x+1}^{k} (D(l)- 1)$$ $$D(L(T))+D(R(T)) = \sum_{l = 1}^k D(l) - \sum_{l = 1}^ {x} - \sum_{l = x+1}^{k}$$ $$D(L(T))+D(R(T)) = \sum_{l = 1}^k D(l) - x - (k -x)$$ $$D(L(T))+D(R(T)) = \sum_{l = 1}^k D(l) - x -k +x $$ $$D(L(T))+D(R(T)) = D(T) -k $$ $$D(T) = D(L(T))+D(R(T))+k$$ c. Needs more research. d. To find a where a minimum value we need to see the derivative of the function. $$f(i) = (i lg i + (k-i) lg(k-i))/ lg 2 $$ $$f'(i) = (lg i + 1 - log(k-i)-(k-i)/(k-i) )/ lg 2$$ $$f'(i) = (lg i + 1 - k log(k-i)-1)/ lg 2 $$ $$f'(i) = lg i - log(k-i) = 0 $$ the above equation is zero when i = k/2. To make sure above value is a minimum the second derivative of above should greater than 0. $$f''(i) = 1/i + 1 /k-i = 0 $$ $$f''(k/2) = 2/k + 2 / k = 0 $$ since k > 1 the above equation is greater than zero. Hence k/2 is minimum. we can use substitution method to solve the recursion. base case: $$f(k) = c k log k$$ $$f(1) = c 1 log 1 = 0$$ for base case when k = 1 $$f(1) \geq 0 $$ Similarly for inductive step $$f(i) = (i lg i + (k-i) lg(k-i)) $$ Since minimum for the equation occurs at k/2 we can use to find the lower bound for i = 1 to k -1 $$f(i) = min_{i=1 to k-1} (i lg i + (k-i) lg(k-i)+k) $$ $$f(k/2) = c (k/2 lg k/2 + (k-k/2) lg(k-k/2))+k$$ $$f(k/2) = c (k/2 lg k/2 + k/2 lg(k/2))+k $$ $$f(k/2) = c (k/2 lg k/2 + k/2 lg(k/2))+k$$ $$f(k/2) = c (k lg k/2 )+ k$$ $$f(k/2) = c (k lg k - log 2 k ) + k$$ $$f(k/2) = c (k lg k - k ) + k$$ $$f(k/2) = c (k lg k) - c k + k$$ $$f(k/2) = c (k lg k) - (c k - k)$$ c k - k > 0 if c > 1 then $f(k) =\Omega(k log k)$ e. since the tree has n! leaves substituting n! in eq below $$d(k) = \Omega(k log k)$$ $$d(n!) =D(T_A) = \Omega(n! log n!)$$ since n! permutations have an equal probability of 1/n!. The average case running time for n elements(one permutation. since there will be one permutation for an input). $$D(T_A) = \Omega(n! log n!)/n!$$ $$D(T_A) = \Omega( log n!)$$ $$D(T_A) = \Omega( n log n)$$ f. If we draw a randomized decision tree it will have all possible paths for different permutations of the input. since we are talking about average running time at each randomized node we need to pick smallest subtree(the tree with smallest average no of comparisons that will lead us to the leaf). We need to delete the other children of randomized node and consider the smallest subtree. This will work because this path worked for randomized sort already.The average number of comparisons for the modiÞed algorithm is no larger than the average number for the original randomized tree, since we discarded the higher-average subtrees in each case. In particular, each time we splice out a randomized node, we leave the overall average less than or equal to what it was, because
• the same set of input permutations reaches the modified subtree as before, but those inputs are handled in less than or equal to average time than before, and
• the rest of the tree is unmodified.
The randomized algorithm thus takes at least as much time on average as the corresponding deterministic one. (Weve shown that the expected running time for a deterministic comparison sort is $\Omega(n lg n)$, hence the expected time for a randomized comparison sort is also $\Omega(n lg n)$
No comments:
Post a Comment