Greedy Algorithms

An algorithm is greedy if it builds up a solution in small steps, choosing a decision at each step myopically to optimize some underlying criterion.

Interval Scheduling

The Interval Scheduling Problem consists of finding the compatible set of maximum size from a set of requests, in which each request has a starting time fif_i and finishing time fif_i.

Interval Scheduling Algorithm

To design a greedy algorithm, the basic idea is to use a simple rule to select the first request i1i_1. Once a request i1i_1 is accepted, all requests that are not compatible with i1i_1 are rejected. The algorithm terminates when all requests are either accepted or rejected.

  • (Incorrect) Accept the request that starts earliest.

  • (Incorrect) Accept the request that requires the smallest interval of time.

  • (Incorrect) Accept the request that has the fewest number of non-compatible requests.

  • (Optimal) Accept the request that finishes first.

Analyzing the Algorithm

The set A returned by the algorithm is a compatible set of requests. Let O be an optimal set of intervals. Let i1,…,iki_1, \dots, i_k be the set of requests in A, and j1,…,jmj_1, \dots, j_m be the set of requests in O. If ∣A∣=∣O∣|A| = |O|, A is optimal.

Induction: For the first request, our greedy rule guarantees that f(i1)≤f(j1)f(i_1) \leq f(j_1). For r>1r > 1, the induction hypothesis assume that f(irāˆ’1)≤f(jrāˆ’1)f(i_{r-1}) \leq f(j_{r-1}). Since f(jrāˆ’1)≤s(jr)f(j_{r-1}) \leq s(j_r), f(irāˆ’1)≤s(jr)f(i_{r-1}) \leq s(j_r). Therefore, the interval jrj_r is in the set of available intervals when the greedy algorithm selects iri_r, thus f(ir)≤f(jr)f(i_r) \leq f(j_r).

For each rr, the rthr^{th} interval finishes at least as soon as the rthr^{th} interval in O. Since the algorithm terminates when there's no possible requests, ∣A∣=∣O∣|A| = |O| and A is optimal.

Implementation and Running Time

def eraseOverlapIntervals(intervals):
  intervals.sort(key=lambda x:x[1])
  last = float('-inf')
  result = []
  for start, end in intervals:
    if start >= last:
      last = end
      result.append((start, end))
  return result

The sorting operation takes O(nlogn)O(nlogn) time. The single-pass iteration takes O(n)O(n) time. Therefore, the time complexity of the interval scheduling algorithm is O(nlogn)O(nlogn).

Interval Partitioning

The Interval Partitioning Problem consists of scheduling all requests using as few resources as possible given a set of requests and many identical resources.

In any instance of interval partitioning, the number of resources needed is at least the depth of the set of intervals.

Interval Partitioning Algorithm

Let d be the depth of the set of intervals. The algorithm assigns a label 1,2,…,d{1, 2, \dots, d} to each interval, and overlapping intervals have different labels.

The algorithm iterates through the intervals and tries to assign to each interval a label that hasn't been assigned to any previous interval that overlaps it.

Analyzing the Algorithm

The greedy algorithm above will assign a label to each interval, and no two overlapping intervals will receive the same label. The number of resources in the result equals the depth of the set of intervals, which is the optimal number of resources needed.

Shortest Paths

For a directed graph G=(V,E)G = (V, E) with a designated start node ss, ss has a path to every other node in GG. Each edge ee has a length le≤0l_e \leq 0, indicating the time or distance to traverse ee. l(P)l(P) is the sum of the lengths of all edges in PP. The goal is to determine the shortest path from ss to every other node in the graph.

Dijkstra's Algorithm

The algorithm maintains a set SS of vertices uu for which we have determined a shortest-path distance d(u)d(u) from ss. Initially S=s,d(s)=0S = {s}, d(s) = 0. For each node v∈Vāˆ’Sv \in V - S, we determine the shortest path that can be constructed by traveling along a path through SS to u∈Su \in S, followed by the single edge (u,v)(u, v). The node vv that will minimize the distance is added to vv, and d(v)=d(u)+led(v) = d(u) + l_e.

To construct the shortest path from ss to vv, we could start at vv and recursively follow the edge we stored for vv in the reverse direction to uu until we reach ss.

Analyzing the Algorithm

Consider the set SS at any point in the algorithm's execution. For each u∈Su \in S, the path PuP_u is the shortest sāˆ’us-u path.

Induction: The case ∣S∣=1|S| = 1 is trivial since d(s)=0d(s) = 0. Suppose the claim holds when ∣S∣=k|S| = k for k≄1k \geq 1. By induction hypothesis, PuP_u is the shortest sāˆ’us-u path for each u∈Su \in S. In the iteration k+1k + 1, the node vv is added to SS by the algorithm. If PP is another path from ss to vv through a node xx other than vv, it can't be shorter than PvP_v. Otherwise, xx will be added by the algorithm instead of vv.

  • The algorithm doesn't always find the shortest paths if some of the edges can have negative lengths.

  • The algorithm is a continuous version of the breadth-first search algorithm for traversing a graph.

Implementation and Running Time

Use a priority queue to explicitly maintain the values of the minima d′(v)=min(d(u)+le)d'(v) = min(d(u) + l_e) for each node v∈Vāˆ’Sv \in V - S. In each iteration, ExtractMin returns the node that should be added to the set SS. For each node ww in the queue, if e′=(v,w)∈Ee' = (v, w) \in E, update the key of ww to min(d′(w),d(v)+le)min(d'(w), d(v) + l_e) with ChangeKey operation. Therefore, the algorithm could be implemented on a graph with nn nodes and mm edges to run in O(mlogn)O(m log n) time.

Minimum Spanning Tree

For a connected graph G=(V,E)G = (V, E), the Minimum Spanning Tree is to find a subset of the edges TāŠ†ET \subseteq E so that the graph (V,T)(V, T) is connected, and the total cost āˆ‘e∈Tce\sum_{e \in T} c_e is as small as possible.

Let TT be a minimum-cost solution to the problem defined above. If the cost of the edge is positive, the graph (V,T)(V, T) is a tree.

Minimum Spanning Tree Algorithm

  • Prim's Algorithm: The algorithm starts with a root node ss and greedily grows a tree from ss outward. At each step, the algorithm will add the node that can be attached as cheaply as possible to the partial tree.

  • Kruskal's Algorithm: The algorithm starts without any edges and inserts edges in order of increasing cost. If inserting an edge ee would result in a cycle, then ee is discarded.

  • Reverse-Delete Algorithm: The algorithm starts with the full graph (V,E)(V, E) and deletes edges in the order of decreasing costs. If deleting an edge ee would disconnect the graph, then ee is discarded.

Analyzing the Algorithm

Condition to Include an Edge

Assume that all edge costs are distinct. Let SS be any subset of nodes that neither empty nor equal to VV. Let edge e=(v,w)e = (v, w) be the minimum cost edge with one end in SS and the other in Vāˆ’SV - S. Then every minimum spanning tree contains the edge ee.

Condition to Exclude an Edge

Assume that all edge costs are distinct. Let CC be any cycle in GG, and e=(v,w)e = (v, w) be the most expensive edge in CC. Then ee doesn't belong to any minimum spanning tree.

Optimality

  • Optimality of Kruskal's Algorithm: Let edge e=(v,w)e = (v, w) added by Kruskal's, and let SS be the set of all nodes to which vv has a path before ee is added. Since adding ee doesn't create a cycle, no edge from SS to Vāˆ’SV-S has been added. Thus ee is the cheapest edge from SS to Vāˆ’SV-S, so it belongs to the minimum spanning tree.

  • Optimality of Prim's Algorithm: For each iteration of the algorithm, let SS be the partial spanning tree and ee be the edge that will be added. By definition, ee is the cheapest edge from SS to Vāˆ’SV - S, so it belongs to the minimum spanning tree.

  • Optimality of Reverse-Delete Algorithm: Let e=(v,w)e = (v, w) be an edge in a cycle CC removed by Reverse-Delete. By definition, it's the most expensive edge on CC. Therefore, ee doesn't belong to any minimum spanning tree.

Implementing Prim's Algorithm

A priority queue could be used to maintain the attachment costs for each node in the graph. For each iteration, the algorithm selects a node with an ExtractMin operation and updates the attachment costs of its neighbors with ChangeKey operations. For a graph with n nodes and m edges, the algorithm costs O(mlogn)O(m log n) time.

The Union-Find Data Structure

  • MakeUnionFind(S): Return a Union-Find data structure on set SS where all elements are in seperate sets.

  • Find(u): Return the name of the st containing uu. The time complexity is O(logn)O(logn).

  • Union(A, B): Merge the sets AA and BB into a single set. The time complexity is O(logn)O(logn).

Simple Implementation

Maintain an array Component that contains the name of the set currently containing each element. To optimize the time complexity, it's useful to explicitly maintain the list of elements for each set and record their sizes. When merging two sets, always merge the smaller set into the larger set.

For this implementation, Find takes O(1)O(1) time, MakeUnionFind(S) takes O(n)O(n) time, and any sequence of kk Union operations takes at most O(klogk)O(klogk) time.

Better Implementation

Each node v∈Sv \in S will be contained in a record with an associated pointer to the name of the set that contains vv. The initial state is that each element has a pointer that points to itself.

For two sets AA and BB, if AA is named after node vv, and BB is named after node uu, the Union(A, B) operation will update uu's pointer to point to vv. Therefore, the Find operation has to follow a sequence of pointers to get to the current set name.

For this implementation, Find takes O(logn)O(log n) time, MakeUnionFind(S) takes O(n)O(n) time, and Union takes O(1)O(1) time.

Path Compression

The path of Find operation could be compressed by resetting all pointers along the path to point to the current name of the set. It could make subsequent Find operations more efficient. The time complexity of a sequence of Find operations with path compression is extremely close to linear time. The actual upper bound is `O(n \alpha (n)) ,where, where \alpha (n) $$ is an extremely slow-growing function called the inverse Ackermann function.

Last updated