script

Monday, March 4, 2013

9.3-8


Professor Olay is consulting for an oil company, which is planning a large pipeline running east to west through an oil field of n wells. The company wants to connect a spur pipeline from each well directly to the main pipeline along a shortest route (either north or south), as shown in Figure 9.2. Given the x- and y-coordinates of the wells, how should the professor pick the optimal location of the main pipeline, which would be the one that minimizes the total length of the spurs? Show how to determine the optimal location in linear time.
In order to find optimal placement we only need to find medians of y-coordinates. Claim:
The optimal y-cordinate is as follows:
a) If n is even then the optimal placement of the pipeline is either on the oil well whose y-coordinate is lower median or on oil well whose y-coordinate is upper median or anywhere between them.
b) If n odd then the optimal placement of pipeline is on the oil well whose y-coordinate is the median.
Proof:
We examine various cases in each we start out with the pipeline at a particular y-coordinate and see what happens when we move it. We denote s as the sum of north and south spurs and s' denotes the sum after we move the pipeline.
If n is even and we start the pipeline on or somewhere between oil wells whose y-coordinate are the lower median and upper median. if we move the pipeline by a vertical distance d with out crossing either of the the median wells, then n/2 of the oil wells will be d distance farther from the pipeline and n/2 of the oil wells will nearer to the pipeline. If we caculate the distance s' = s+nd/2 - nd/2 then s' = s. thus all the locations on or between the two medians is good.
Now suppose if the pipeline goes through the upper median oil well and if we increase the y-coordinate of the pipeline d- distance away(upwards) from upper median. Then there are n/2+1 oil wells farther from pipeline and n/2-1 oil wells nearer to oil well. If we calculate the distance $$s' \geq s+(n/2+1)d-(n/2-1)d $$ $$s' \geq s+2d$$ clearly s' is greater than s.we conclude that if we move the pipeline away from the upper median then it increases the total spur length. similarly if we move the pipeline away below lower median the spur length increases.
when n is even the optimal placement of pipeline is on or somewhere between oil wells whose y-coordinates are upper and lower medians.
Now lets consider n is odd if we start the pipeline at oil well whose y-coordinate is the median and move the pipeline y-coordinate by d distance. Then there are n+1/2 oil well farther from the pipeline and n-1/2 oil wells nearer to the pipeline. if we calculate the distance $$s' \geq s+ (n+1/2)d -(n-1/2)d $$ $$s' \geq s+ d $$ which greater than s. A symmetrical argument shows that moving the y-cordinate of pipeline below the median also increases the spur length. If n is odd then the optimal placement for the pipeline is the median. We use the linear-time-median algorithm.

9.3-9


Let X{1..n] and Y[1..n] be two arrays, each containing n numbers already in sorted order. Give an O(lg n)-time algorithm to find the median of all 2n elements in arrays X and Y .
If X is an array that has 1..n and let median m occurs at X[k]. Then there k elements less than equal m and n-k elements greater than equal m.If we combine both arrays X and Y the no of elements is 2n. If m is a median after combining two arrays it should have n to its left and n elements to the right including m. So for the left side to have n there n-k less than or equal to m(let x be no of elements in y which less than or equal to m n = k+x then x = n-k ). similarly for the right side k elements greater than or equal to m(let x be no of elements in y which greater than or equal to m n = n-k+x). If we can an in equality. $$Y[n-k] \leq X[k] \leq Y[n-k+1]$$ But we need to also consider when k = n. Then Y[0] will have any elements and $X[n] \leq Y[1]$. If the median k' > k then X[k] will not satisfy the above equation. Then $$X[k] \lt Y[n-k]$$. Similarly if we have K' < k then X[k] > Y[n-k+1]. Similarly the median can occur in X or Y. If it occurs Y array the comparisions will have the array names reversed(replace x with Y and Y with X).We can use above analysis to write the algorithm. We can use binary search to find the median.

FIND-TWO-ARRAY-MEDIAN(X,Y)
n = A.length
median = FIND-MEDIAN(X,Y,1,n,n)
If median not found
FIND-MEDIAN(Y,X,1,n,n)
return median
FIND-MEDIAN(X,Y,low,high,n)
If low < high
k = (low+high)/2
If k = n and X[n] = Y[1] then
return X[n]
else if $Y[n-k] \leq X[k] \leq Y[n-k+1]$
return X[k]
else if $X[k] > Y[n-k+1]$
FIND-MEDIAN(X,Y,low,k-1,n)
else
FIND-MEDIAN(X,Y,k+1,high,n)

Sunday, March 3, 2013

9.3-7


Describe an O(n)-time algorithm that, given a set S of n distinct numbers and a positive integer $k \leq n$, determines the k numbers in S that are closest to the median of S.
Assume for simplicity that n is odd and k is even. If the set S was in sorted order, the median is in position n=2 and the k numbers in S that closest to the median are in positions (n − k)/2 through (n + k)/2. We first use linear time selection to fi nd the (n − k)/2, n/2, and (n + k)/2 elements and then pass through the set S to fi nd the numbers less than (n+k)/2 element, greater than the (n−k)/2 element, and not equal to the n/2 element. The algorithm takes O(n) time as we use linear time selection exactly three times and traverse the n numbers in S once(Use for loop to do comparisons to find element > (n-k)/2 and less than (n+k)/2 but not equal to n/2 ).

9.3-6


The kth quantiles of an n-element set are the k 1 order statistics that divide the sorted set into k equal-sized sets (to within 1). Give an O(n lg k)-time algorithm to list the kth quantiles of a set. The k quantiles of an n-element sorted array A are: A[\floor 1 n/k \rfloor],A[\lfloor 2 · n/k \rfloor], . . . ,A[\lfloor(k − 1) · n/k\rfloor]. We have as inputs an unsorted array A of distinct keys, an integer k, and an empty array Q of length k − 1 and we want to find the kth quantiles of A. QUANTILES(A, k,Q)
1. if k = 1 then return
2. else
3. n = length[A]
4. i = $\lfloor k/2 \rfloor$
5. x = $ SELECT (A, \lfloor i n/k \rfloor)$
6. PARTITION (A, x)
7. Add to list Q: QUANTILES(A[1]..A[i· n/k], \lfoor k/2 \rfloor, Q)
8. Add to list Q: QUANTILES(A[i·n/k + 1]..A[n],\lceil k/2 \rceil, Q)
9. return x
The output Q of the recursive algorithm QUANTILES contains the kth quantiles of A. The algorithm first finds the key that turns out to be the lower median of Q, and then partitions the input array about this key. So we have two smaller arrays that each contain at most (k−1)/2 of the k−1 order statistics of the original array A. At each level of the recursion the number of order statistics in the array is at most half the number of the previous array. Consider a recurrsion tree for this algorithm. At the top level we need to find k − 1 order statistics, and it costs O(n) to find one. The root has two children, one contains at most $\lfoor(k−1)/2\rfloor$ order statistics, and the other $\lceil (k−1)/2 \rceil$ order statistics. The sum of the costs for these two nodes is O(n). At depth i we find $2^i$ order statistics. The sum of the costs of all nodes at depth i is O(n), for $0 \leq i \leq log_{2} (k − 1)$, because the total number of elelements at any depth is n. The depth of the tree is $d = log_{2}(k − 1)$. Hence, the worstcase running time of QAUNTILES is $ \theta(n lg k)$.

Saturday, March 2, 2013

9.3-5


Suppose that you have a “black-box” worst-case linear-time median subroutine. Give a simple, linear-time algorithm that solves the selection problem for an arbitrary order statistic.
we are given a median subroutine that takes A ,p and r as parameters and returns the median value of A[p..r] in O(n) time.Given the median below is the SELECT' algorithm for finding ith smallest element. The algorithm uses deterministic partitioning that is modified to take element to partition around as input parameter.
SELECT'(A,p,r)
if p == r then
return A[p]
x = MEDIAN(A,p,r)
q = PARTITION(A,x)
k = q-p+1
if i == k then
return a[q]
if i > k
SELECT'(A,q+1,r,i-k)
else
SELECT'(A,p,q-1,i)
Because x is the median of A[p..q] and it partition the array in to A[p..q-1] and A[q+1..r] of n/2 sizes. we have the recurrence $$T(n) \leq T(n/2)+O(n)$$ the above recurrence is master method case T(n) = O(n).

9.3-4


Suppose that an algorithm uses only comparisons to find the i th smallest element in a set of n elements. Show that it can also find the i - 1 smaller elements and the n - i larger elements without performing any additional comparisons.


Suppose that the algorithm uses m comparisons to find the ith smallest element x in a set of n elements. If we trace these m comparisons in the log, we can easily find the i - 1 smaller elements by transitivity. If x > a and a > b, then x > b can be deduced without actually performing a comparison between x and b. Similarly, the n - i larger elements can be found, too.

Say we can not decide that some element e is smaller or greater than x. Then we claim that the algorithm does not work properly, since an adversary could design an input in which e is the ith largest element instead of x. The i -1 and n -i smaller elements may be not in sorted order with respect to each but they are in sorted order with respect to the ith smallest element. What I mean we get i -1 and n-i smaller elements but it does'nt any thing about the relative order of the smaller or larger elements it means they are just smaller or larger than i.

9.3-3


Show how quicksort can be made to run in O(n lg n) time in the worst case, assuming that all elements are distinct.
The worst case for quick sort occurs when the split creates 0 and n-1 elements. $T(n) = T(n-1)+\theta(n)$. But if we select a median that gives us n/2 and n/2 split always then we can eliminate the worst case.
BEST-CASE QUICKSORT(A,p,r)
if p < r then $$i \leftarrow (r-p+1)/2 $$ $$ x \leftarrow SELECT(A,i)$$ $$ q \leftarrow PARTITION(A,x)$$ BEST-CASE QUICKSORT(A,p,q-1)
BEST-CASE QUICKSORT(A,q+1,r)
Because the BEST-CASE QUICKSORT recurses only on elements of size at most n/2(for even n n/2 and n/2 - 1 and for odd n n/2 and n/2). $$T(n) \leq 2 T(n/2) + \theta(n)$$ The running time for above recursion is similar to merge sort $O(n log n()$

9.3-2


Analyze SELECT to show that if n 140, then at least $\lceil n/4 \rcieil elements are greater than the median-of-medians x and at least $\lceil n/4 \rcieil elements are less than x. if we draw a 5 X 5 grid. we have 3n/10 greater than x and 3n/10 elements are less than x. 3 140/10 = 42 n/4 = 140/4 = 35

9.3-1

In the algorithm SELECT, the input elements are divided into groups of 5. Will the algorithm work in linear time if they are divided into groups of 7? Argue that SELECT does not run in linear time if groups of 3 are used.

for groups of we can. if we can draw the 7 x 7 grid figure to represent the elements. we know there are 4 elements below the median that are greater median. hence the no of elements greater than the median of medians is $4(n/14 - 2)$ = $2n/7 - 8$. So if we partition the array as in step 4 using x(median of medians) we will be left with $n - (2n/7 - 8) $ = $5n/7 + 8 $
we can use the same recursive equation as for the groups of 5 and plugin values
$$T(n) \leq T(n/7)+ T(5n/7+8)+\theta(n) $$ let T(n) = cn $$T(n) = cn/7 + 5cn/7+8c+\theta(n) $$ $$T(n) = 6cn/7+8c+\theta(n) $$ $$T(n) = cn/7 - (cn/7-8c-an) $$ $$cn/7-8c-an \geq 0$$ $$cn/7-8c \geq an$$ $$c( n-56/7) \geq an$$ $$c \geq 7an/ ( n-56)$$ if n> 56 then the above recurrence is satisfied for the value of c described as above. Similarly for groups 3. if we draw 3 x 3 grid. 2(n/6-2) = n/3-4. The no of elements left after partitioning are n - (n/3-4) = 2n/3+4. $$T(n) \leq T(n/3)+ T(2n/3+4)+\theta(n) $$ $T(n) = cn$ $$T(n) = cn/3+ 2cn/3+4c)+\theta(n) $$ $$T(n) = cn/3+ 2cn/3+4c+\theta(n) $$ $$T(n) = cn+4c+\theta(n) $$ $$T(n) = cn +4c+\theta(n) $$ the above recurrence is greater than cn since we have the residual elements($4c+\theta(n)$).
we can prove the lower bound on T(n) to see the minimum value for the recurrence.The lower bound occurs when we the even split where we get 3 x 3 grid elements and no column with elements less than 3. The no of elements greater than median of medians x is 2(n/6-1)+ 1. (The -1 is for median of medians group and also there is one element in the median of medians that is greater than median of medians).The no of elements left after step 4 is n - (2(n/6-1)+1)= n - n/3+1 = 2n/3+1. The recurrence is $$T(n) \geq T(n/3)+ T(2n/3+1)+\theta(n) $$ $T(n) = cn log n$ $$T(n) = cn logn/3+ c(2n log n/3+1)+\theta(n) $$ $$T(n) = cn logn/3+ 2cn log n/3+c+\theta(n) $$ $$T(n) = cn logn+c+\theta(n) $$ $$T(n) = cn logn+c+\theta(n) $$ The above recurrence is at least cn logn.

selection in worst case linear time


The selection in worst case uses insertion sort to sort in step 2. but you would think insertion sort worst case running time is $O(n^2)$.but we are only sorting groups of 5 elements we know insertion sort is linear time for smaller values of n/. remember it beats megesort for smaller values of n. hence the step is linear time. Also dont use selection in worst case linear time because in practice it has large constants and it gets reduced by 19/20 n. it takes a while to be really small. use random selection in practice.