Divide and Conquer

Divide and conquer refers to a class of algorithmic techniques in which one breaks the input into several parts, solves the problem in each part recursively, and then combines the solutions to these subproblems into an overall solution.

The Mergesort Algorithm

Mergesort sorts a given list of numbers by first dividing them into two equal halves, sorting each half separately by recursion, and then combining the results. The input has been reduced to size 2.

Let T(n)T(n) denote the worst-case running time on input instance of size nn.

  • The algorithm takes O(n)O(n) time to split the inputs into two halves.

  • The algorithm takes T(n/2)T(n / 2) to solve each half.

  • The algorithm takes O(n)O(n) to combine the solutions.

Recurrence relation: T(n)2T(n/2)+cn=2T(n/2)+O(n)T(n) \leq 2T(n / 2) + cn = 2T(n / 2) + O(n) for n>2n > 2, T(2)cT(2) \leq c for some constant cc.

Solving Recurrences

  • Unrolling: Find the running time of the first few levels and identify a patten that can be continued as the recursion expands.

  • Substituting: Guess for a solution and substitute it into the recurrence relation to check whether it works with an argument by induction on $ n $.

Unrolling the Mergesort Recurrences

  1. Analyzing the first few levels: The third level has four problems each of size n/4n / 4, each taking time at most cn/4cn/4, for a total of at most cncn.

  2. Identifying a pattern: The level ii contributes a total of at most 2i(cn/2i)2^i (cn/2^i) to the total running time.

  3. Summing over all levels: The number of levels is log2nlog_2 n. Therefore, the function T()T(\cdot) is bounded by O(nlogn)O(nlogn) when n>1n > 1.

Substituting a Solution

Suppose the relation T(n)cnlog2nT(n) \leq cn \log_2 n satisfies the recurrence relation.

  • Base step: T(2)cnlog2n=2cT(2) \leq cn \log_2 n = 2c.

  • Inductive step: T(n)2c(n/2)log2(n/2)+cn=cn[(log2n)1]+cn=cnlog2nT(n) \leq 2c(n/2) \log_2 (n/2) + cn = cn[(\log_2 n) - 1] + cn = cn \log_2 n. Therefore, the function T()T(\cdot) is bounded by O(nlogn)O(nlogn) when n>1n > 1.

Further Recurrence Relations

The general form of the divide-and-conquer algorithm is creating qq subproblems of size n/2n/2 each, and combine the results in O(n)O(n) time. For some constant cc, T(n)qT(n/2)+cnT(n) \leq qT(n/2) + cn when n>2n > 2, and T(2)cT(2) \leq c.

The Case of q > 2 Subproblems

  1. The first few levels: The first level takes at most cncn time. The second level takes at most $(q/2)cn(q/2) cn time. The third level takes at most (q2/4)cn(q^2/4) cn time.

  2. Identifying a pattern: At level ii, the total work performed is (qi/2i)cn(q^i/2^i) cn.

  3. Summing over all levels: T(n)cnj=0log2n1(q2)j(cr1)nlog2q=O(nlog2q)T(n) \leq cn \sum_{j = 0}^{\log_{2} n - 1} (\frac{q}{2})^j \leq (\frac{c}{r-1}) n^{\log_{2}q} = O(n^{\log_{2} q}).

Therefore, any function T()T(\cdot) satisfying the recurrence relation with q>2q > 2 is bounded by O(nlog2q)O(n^{\log_{2} q}).

The Case of q = 1 Subproblem

  1. The first few levels: The first level takes at most cncn time. The second level takes at most $cn/2cn / 2 time. The third level takes at most cn/4cn / 4 time.

  2. Identifying a pattern: At level ii, the total work performed is cn/2icn / 2^i.

  3. Summing over all levels: T(n)cnj=0log2n112j2cn=O(n)T(n) \leq cn \sum_{j = 0}^{\log_{2} n - 1} \frac {1}{2^j} \leq 2cn = O(n).

Therefore, any function T()T(\cdot) satisfying the recurrence relation with q=1q = 1 is bounded by O(n)O(n).

Consider a variant form of divide-and-conquer algorithm: Divide the input into two pieces of equal time, solve them by recursion, and combine the resutls in quadratic time. For some constant cc, T(n)qT(n/2)+cn2T(n) \leq qT(n/2) + cn^2 when n>2n > 2, and T(2)cT(2) \leq c.

  1. The first few levels: The first level takes at most cn2cn^2 time. The second level takes at most $cn2/2cn^2 / 2 time. The third level takes at most cn2/4cn^2 / 4 time.

  2. Identifying a pattern: At level ii, the total work performed is cn2/2icn^2 / 2^i.

  3. Summing over all levels: T(n)cn2j=0log2n112j2cn2=O(n2)T(n) \leq cn^2 \sum_{j = 0}^{\log_{2} n - 1} \frac {1}{2^j} \leq 2cn^2 = O(n^2).

The Closest Pair of Points

Given nn points in the plane, find the pair that is closest together.

The Algorithm

Let the set of points be P=p1,,pnP = {p_1, \dots, p_n}, where pip_i has coordinats (xi,yi)(x_i, y_i). The Euclidan distance between two points is d(pi,pj)d(p_i, p_j). The algorithm holds the assumption that no two points in PP have the same x-coordinate or the same y-coordinate.

The Recursion

Let PPP' \subseteq P be the input of each recursion. Let the list PxP'_x be all the points in PP' that have been sorted by increasing x-coordinate, PyP'_y be all the points in PP' that have been sorted by increasing y-coordinate.

For each level of recursion, QQ is defined as the set of points in the first n/2\lceil n / 2 \rceil positions of PxP'_x and RR is defined as the set of points in the final n/2\lfloor n / 2 \rfloor positions of the list PxP_x.

By a single pass through PxP'_x and PyP'_y, four lists could be produced: QxQ_x. QyQ_y, RxR_x, RyR_y. Suppose that the closest pairs of points in QQ and RR are and , and .

Combine the Solutions

Let α\alpha be the minimum of and . Let LL be the vertical line that crosses through the rightmost point of $ Q $.

If there exists qQq \in Q and $ r \in R forwhichfor which d(q, r) < \alpha ,theneachof, then each of q andand r lieswithinadistancelies within a distance \alpha ofof L $$.

Let SPS \subseteq P' be the set of points in this narrow band and SyS_y be the points in SS that sorted with their y-coordinates. Thus, there exist q inQq \ in Q and rRr \in R for which d(q,r)<αd(q, r) < \alpha if and only if there exists s,sSs, s' \in S for which d(s,s)<αd(s, s') < \alpha.

If s,sSs, s' \in S for which d(s,s)<αd(s, s') < \alpha, then ss and ss' are within 15 positions of each other in the sorted list SyS_y. Therefore, for each point sSys \in S_y, the algorithm computes its distance to each of the next 15 points in SyS_y. If there's a pair of points for which d(s,s)<αd(s, s') < \alpha, they are the closest pair of points in PP'. Otherwise, the closet pair is the minimum of and .

Analyzing the Algorithm

The algorithm correctly outputs a closest pair of points in P. Since the algorithm divides the input into left and right halves, and merges the solutions in linear time, it satisties the basic recurrence and the time complexity is O(nlogn)O(nlogn).

  • Basic step: When P3|P| \leq 3, the closest pair could be computed by brute-force.

  • Induction step: Assume that the remainder of the algorithm correctly determines the closest pair of a given input, then the closest pair in PP either has both elements in one of QQ or RR, or it has one element in each.

Last updated