Greedy Algorithms
An algorithm is greedy if it builds up a solution in small steps, choosing a decision at each step myopically to optimize some underlying criterion.
Interval Scheduling
The Interval Scheduling Problem consists of finding a compatible set of maximum size from a set of requests $$\{1, 2, \dots, n\}$$, in which each request $$i$$ has a starting time $$s(i)$$ and a finishing time $$f(i)$$. Two requests are compatible if their intervals don't overlap.
Interval Scheduling Algorithm
To design a greedy algorithm, the basic idea is to use a simple rule to select the first request $$i_1$$. Once a request $$i_1$$ is accepted, all requests that are not compatible with $$i_1$$ are rejected, and the rule is applied again to the remaining requests. The algorithm terminates when all requests are either accepted or rejected.
(Incorrect) Accept the request that starts earliest.
(Incorrect) Accept the request that requires the smallest interval of time.
(Incorrect) Accept the request that has the fewest non-compatible requests.
(Optimal) Accept the request that finishes first.
Analyzing the Algorithm
The set $$A$$ returned by the algorithm is a compatible set of requests. Let $$O$$ be an optimal set of intervals. Let $$i_1, \dots, i_k$$ be the set of requests in $$A$$ in the order they were added, and $$j_1, \dots, j_m$$ be the set of requests in $$O$$, ordered by starting time. If $$k = m$$, $$A$$ is optimal. To prove this, we show that the greedy algorithm "stays ahead": for each $$r \le k$$, $$f(i_r) \le f(j_r)$$.
Induction: For the first request, our greedy rule guarantees that $$f(i_1) \le f(j_1)$$. For $$r > 1$$, the induction hypothesis assumes that $$f(i_{r-1}) \le f(j_{r-1})$$. Since $$O$$ is compatible, $$f(j_{r-1}) \le s(j_r)$$, so $$f(i_{r-1}) \le s(j_r)$$. Therefore, the interval $$j_r$$ is in the set of available intervals when the greedy algorithm selects $$i_r$$, thus $$f(i_r) \le f(j_r)$$.
For each $$r \le k$$, the interval $$i_r$$ finishes at least as soon as the interval $$j_r$$ in $$O$$. If $$O$$ contained a further request $$j_{k+1}$$, it would still be available after $$i_k$$ was selected, since $$f(i_k) \le f(j_k) \le s(j_{k+1})$$. Since the algorithm terminates only when there are no available requests left, $$m = k$$ and $$A$$ is optimal.
Implementation and Running Time
The sorting operation takes $$O(n \log n)$$ time. The single-pass iteration takes $$O(n)$$ time. Therefore, the time complexity of the interval scheduling algorithm is $$O(n \log n)$$.
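Below is a minimal Python sketch of the earliest-finish-time rule, assuming each request is given as a `(start, finish)` pair (the names are illustrative, not from the original notes):

```python
def interval_scheduling(requests):
    """Greedy earliest-finish-time rule: returns a maximum-size
    compatible subset of (start, finish) requests."""
    # Sort requests by finishing time: O(n log n).
    requests = sorted(requests, key=lambda r: r[1])
    accepted = []
    last_finish = float("-inf")
    for start, finish in requests:
        # A request is available (compatible) if it starts no earlier
        # than the finish of the last accepted request.
        if start >= last_finish:
            accepted.append((start, finish))
            last_finish = finish
    return accepted

print(interval_scheduling([(0, 6), (1, 4), (3, 5), (5, 7), (5, 9), (8, 10)]))
# [(1, 4), (5, 7), (8, 10)]
```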
Interval Partitioning
The Interval Partitioning Problem consists of scheduling all requests using as few resources as possible, given a set of requests and many identical resources, where each resource can serve at most one request at a time.
In any instance of interval partitioning, the number of resources needed is at least the depth of the set of intervals, i.e. the maximum number of intervals passing over any single point on the timeline.
Interval Partitioning Algorithm
Let $$d$$ be the depth of the set of intervals. The algorithm assigns a label from $$\{1, 2, \dots, d\}$$ to each interval, and overlapping intervals have different labels.
The algorithm sorts the intervals by their starting times and iterates through them, trying to assign to each interval a label that hasn't been assigned to any previous interval that overlaps it.
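A minimal sketch in Python, assuming intervals are `(start, finish)` pairs; instead of scanning labels directly, it uses the common min-heap trick of tracking when each resource becomes free (an implementation choice, not something from the original notes):

```python
import heapq

def interval_partitioning(intervals):
    """Assign each (start, finish) interval a resource label so that
    overlapping intervals get different labels, using d labels total."""
    # Process intervals in order of starting time.
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    labels = [None] * len(intervals)
    free = []          # min-heap of (finish_time, label), one per resource
    next_label = 0
    for i in order:
        start, finish = intervals[i]
        # Treats touching intervals (finish == start) as non-overlapping.
        if free and free[0][0] <= start:
            # The earliest-finishing resource is already free: reuse it.
            _, label = heapq.heappop(free)
        else:
            # Every resource is busy at `start`: open a new one.
            label = next_label
            next_label += 1
        labels[i] = label
        heapq.heappush(free, (finish, label))
    return labels, next_label  # label per interval, number of resources

labels, d = interval_partitioning([(0, 3), (1, 4), (2, 6), (4, 8)])
print(labels, d)  # [0, 1, 2, 0] with d = 3
```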
Analyzing the Algorithm
The greedy algorithm above will assign a label to each interval, and no two overlapping intervals will receive the same label. The number of resources in the result equals the depth of the set of intervals, which is the optimal number of resources needed.
Shortest Paths
For a directed graph $$G = (V, E)$$ with a designated start node $$s$$, assume $$s$$ has a path to every other node in $$G$$. Each edge $$e$$ has a length $$\ell_e \ge 0$$, indicating the time or distance to traverse $$e$$. The length $$\ell(P)$$ of a path $$P$$ is the sum of the lengths of all edges in $$P$$. The goal is to determine the shortest path from $$s$$ to every other node in the graph.
Dijkstra's Algorithm
The algorithm maintains a set $$S$$ of vertices $$u$$ for which we have determined a shortest-path distance $$d(u)$$ from $$s$$; initially $$S = \{s\}$$ and $$d(s) = 0$$. For each node $$v \notin S$$, we determine the shortest path that can be constructed by traveling along a path through $$S$$ to some $$u \in S$$, followed by the single edge $$(u, v)$$; that is, $$d'(v) = \min_{e = (u, v): u \in S} d(u) + \ell_e$$. The node $$v$$ that minimizes $$d'(v)$$ is added to $$S$$, and we set $$d(v) = d'(v)$$.
To construct the shortest path from $$s$$ to a node $$v$$, we could start at $$v$$ and recursively follow the edge we stored for each node in the reverse direction until we reach $$s$$.
Analyzing the Algorithm
Consider the set $$S$$ at any point in the algorithm's execution. For each $$u \in S$$, the path $$P_u$$ is the shortest $$s$$-$$u$$ path.
Induction: The case $$|S| = 1$$ is trivial since $$S = \{s\}$$. Suppose the claim holds when $$|S| = k$$ for some $$k \ge 1$$. By the induction hypothesis, $$P_u$$ is the shortest path for each $$u \in S$$. In the iteration that grows $$S$$ to size $$k + 1$$, the node $$v$$ is added to $$S$$ by the algorithm. If $$P$$ is another path from $$s$$ to $$v$$, it must leave $$S$$ through some node $$y \ne v$$, and it can't be shorter than $$P_v$$: the portion of $$P$$ up to $$y$$ already has length at least $$d'(y)$$, and $$d'(v) \le d'(y)$$. Otherwise, $$y$$ would be added by the algorithm instead of $$v$$.
The algorithm doesn't always find the shortest paths if some of the edges can have negative lengths.
The algorithm is a continuous version of the breadth-first search algorithm for traversing a graph.
Implementation and Running Time
Use a priority queue to explicitly maintain the values of the minima $$d'(v)$$ for each node $$v \in V - S$$. In each iteration, `ExtractMin` returns the node $$v$$ that should be added to the set $$S$$. After $$v$$ is added, for each node $$w \notin S$$ with an edge $$(v, w)$$, if $$d(v) + \ell_{(v, w)} < d'(w)$$, update the key of $$w$$ to $$d(v) + \ell_{(v, w)}$$ with the `ChangeKey` operation. Therefore, the algorithm could be implemented on a graph with $$n$$ nodes and $$m$$ edges to run in $$O(m \log n)$$ time.
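A minimal sketch in Python, assuming the graph is an adjacency dict `{u: [(v, length), ...]}`. Python's `heapq` has no `ChangeKey`, so stale entries are skipped instead (lazy deletion), which gives the same $$O(m \log n)$$ bound:

```python
import heapq

def dijkstra(graph, s):
    """Shortest-path distances and stored edges from s.
    graph: {u: [(v, length), ...]} with non-negative lengths."""
    dist = {s: 0}
    parent = {s: None}       # stored edge for path reconstruction
    queue = [(0, s)]         # min-heap keyed by tentative distance d'(v)
    done = set()             # the set S of finished nodes
    while queue:
        d, u = heapq.heappop(queue)   # ExtractMin
        if u in done:
            continue                  # stale entry: key was lowered later
        done.add(u)
        for v, length in graph.get(u, []):
            if v not in done and d + length < dist.get(v, float("inf")):
                dist[v] = d + length  # stands in for ChangeKey
                parent[v] = u
                heapq.heappush(queue, (dist[v], v))
    return dist, parent

def shortest_path(parent, v):
    """Follow stored edges from v back to s in reverse, then flip."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]

graph = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("c", 6)], "b": [("c", 3)]}
dist, parent = dijkstra(graph, "s")
print(dist["c"], shortest_path(parent, "c"))  # 6 ['s', 'a', 'b', 'c']
```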
Minimum Spanning Tree
For a connected graph $$G = (V, E)$$ with a cost $$c_e$$ on each edge $$e$$, the Minimum Spanning Tree Problem is to find a subset of the edges $$T \subseteq E$$ so that the graph $$(V, T)$$ is connected, and the total cost $$\sum_{e \in T} c_e$$ is as small as possible.
Let $$(V, T)$$ be a minimum-cost solution to the problem defined above. If the cost of every edge is positive, the graph $$(V, T)$$ is a tree: if it contained a cycle, removing any edge on the cycle would keep the graph connected while lowering the total cost.
Minimum Spanning Tree Algorithm
Prim's Algorithm: The algorithm starts with a root node $$s$$ and greedily grows a tree from $$s$$ outward. At each step, the algorithm adds the node that can be attached as cheaply as possible to the partial tree.
Kruskal's Algorithm: The algorithm starts without any edges and inserts edges in order of increasing cost. If inserting an edge $$e$$ would result in a cycle, then $$e$$ is discarded.
Reverse-Delete Algorithm: The algorithm starts with the full graph and deletes edges in order of decreasing cost. If deleting an edge $$e$$ would disconnect the graph, then the deletion is skipped and $$e$$ is kept.
Analyzing the Algorithm
Condition to Include an Edge
Assume that all edge costs are distinct. Let $$S$$ be any subset of nodes that is neither empty nor equal to $$V$$. Let edge $$e = (v, w)$$ be the minimum-cost edge with one end in $$S$$ and the other in $$V - S$$. Then every minimum spanning tree contains the edge $$e$$.
Condition to Exclude an Edge
Assume that all edge costs are distinct. Let $$C$$ be any cycle in $$G$$, and let $$e = (v, w)$$ be the most expensive edge belonging to $$C$$. Then $$e$$ doesn't belong to any minimum spanning tree.
Optimality
Optimality of Kruskal's Algorithm: Let $$e = (v, w)$$ be an edge added by Kruskal's, and let $$S$$ be the set of all nodes to which $$v$$ has a path before $$e$$ is added. Since adding $$e$$ doesn't create a cycle, $$w \notin S$$, and no edge from $$S$$ to $$V - S$$ has been added before $$e$$. Because edges arrive in increasing order of cost, $$e$$ is the cheapest edge from $$S$$ to $$V - S$$, so it belongs to the minimum spanning tree.
Optimality of Prim's Algorithm: In each iteration of the algorithm, let $$S$$ be the node set of the partial spanning tree and $$e$$ be the edge that will be added. By definition, $$e$$ is the cheapest edge from $$S$$ to $$V - S$$, so it belongs to the minimum spanning tree.
Optimality of Reverse-Delete Algorithm: Let $$e$$ be an edge on a cycle $$C$$ removed by Reverse-Delete. Since edges are deleted in order of decreasing cost, $$e$$ is the most expensive edge on $$C$$. Therefore, $$e$$ doesn't belong to any minimum spanning tree.
Implementing Prim's Algorithm
A priority queue could be used to maintain the attachment costs for each node in the graph. In each iteration, the algorithm selects a node with an `ExtractMin` operation and updates the attachment costs of its neighbors with `ChangeKey` operations. For a graph with $$n$$ nodes and $$m$$ edges, the algorithm runs in $$O(m \log n)$$ time.
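A minimal Python sketch in the same style as the Dijkstra code above (adjacency dict, `heapq` with lazy deletion standing in for `ChangeKey`):

```python
import heapq

def prim(graph, s):
    """Minimum spanning tree of a connected undirected graph.
    graph: {u: [(v, cost), ...]} listing each edge in both directions."""
    tree = []                # edges (u, v, cost) of the spanning tree
    visited = {s}            # node set of the partial tree
    # Min-heap of candidate attachment edges, keyed by cost.
    queue = [(cost, s, v) for v, cost in graph[s]]
    heapq.heapify(queue)
    while queue and len(visited) < len(graph):
        cost, u, v = heapq.heappop(queue)  # ExtractMin
        if v in visited:
            continue                       # stale entry, skip
        visited.add(v)
        tree.append((u, v, cost))
        for w, c in graph[v]:
            if w not in visited:
                heapq.heappush(queue, (c, v, w))
    return tree

graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("a", 1), ("c", 2), ("d", 5)],
    "c": [("a", 4), ("b", 2), ("d", 3)],
    "d": [("b", 5), ("c", 3)],
}
print(prim(graph, "a"))  # [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 3)]
```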
The Union-Find Data Structure
`MakeUnionFind(S)`: Return a Union-Find data structure on set $$S$$ where all elements are in separate sets.
`Find(u)`: Return the name of the set containing $$u$$.
`Union(A, B)`: Merge the sets $$A$$ and $$B$$ into a single set.
The time complexity of each operation depends on the implementation, as discussed below.
Simple Implementation
Maintain an array `Component` that contains the name of the set currently containing each element. To optimize the time complexity, it's useful to explicitly maintain the list of elements for each set and record their sizes. When merging two sets, always merge the smaller set into the larger set, so only the elements of the smaller set need to be relabeled.
For this implementation, `Find` takes $$O(1)$$ time, `MakeUnionFind(S)` takes $$O(n)$$ time, and any sequence of $$k$$ `Union` operations takes at most $$O(k \log k)$$ time.
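A minimal Python sketch of this array-based scheme, assuming elements are hashable and each set is named by one of its elements:

```python
class SimpleUnionFind:
    """Array-based Union-Find: Find is O(1); merging relabels the
    smaller set, so any sequence of k Unions costs O(k log k) total."""

    def __init__(self, elements):            # MakeUnionFind(S): O(n)
        self.component = {u: u for u in elements}   # set name per element
        self.members = {u: [u] for u in elements}   # element list per set

    def find(self, u):                        # Find(u): O(1)
        return self.component[u]

    def union(self, a, b):                    # Union(A, B) of two set names
        if a == b:
            return
        # Always merge the smaller set into the larger one.
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a
        for u in self.members[b]:
            self.component[u] = a             # relabel the smaller set only
        self.members[a].extend(self.members.pop(b))

uf = SimpleUnionFind("wxyz")
uf.union(uf.find("w"), uf.find("x"))
print(uf.find("x") == uf.find("w"))  # True
```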
Better Implementation
Each node $$v \in S$$ will be contained in a record with an associated pointer to the name of the set that contains $$v$$. The initial state is that each element has a pointer that points to itself.
For two sets $$A$$ and $$B$$, if $$A$$ is named after node $$v$$, and $$B$$ is named after node $$u$$, the `Union(A, B)` operation will update $$u$$'s pointer to point to $$v$$ (keeping the name of the larger set, so pointer chains stay short). Therefore, the `Find` operation has to follow a sequence of pointers to get to the current set name.
For this implementation, `Find` takes $$O(\log n)$$ time, `MakeUnionFind(S)` takes $$O(n)$$ time, and `Union` takes $$O(1)$$ time.
Path Compression
The path traversed by a `Find` operation could be compressed by resetting all pointers along the path to point to the current name of the set, which makes subsequent `Find` operations more efficient. The time complexity of a sequence of $$n$$ `Find` operations with path compression is extremely close to linear. The actual upper bound is $$O(n \alpha(n))$$, where $$\alpha(n)$$ is an extremely slow-growing function called the inverse Ackermann function.
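A minimal pointer-based sketch in Python combining union by size with path compression; the `kruskal` helper at the end is a hypothetical usage example, not part of the original notes:

```python
class UnionFind:
    """Pointer-based Union-Find with union by size and path compression."""

    def __init__(self, elements):               # MakeUnionFind(S): O(n)
        self.parent = {u: u for u in elements}  # each pointer starts at itself
        self.size = {u: 1 for u in elements}

    def find(self, u):                           # Find with path compression
        root = u
        while self.parent[root] != root:         # follow pointers to the name
            root = self.parent[root]
        while self.parent[u] != root:            # compress: repoint the path
            self.parent[u], u = root, self.parent[u]
        return root

    def union(self, a, b):                       # Union of two set names: O(1)
        if self.size[a] < self.size[b]:
            a, b = b, a                          # keep the larger set's name
        self.parent[b] = a
        self.size[a] += self.size[b]

def kruskal(nodes, edges):
    """Hypothetical usage: Kruskal's algorithm, with Union-Find
    detecting cycles. edges: iterable of (cost, u, v)."""
    uf = UnionFind(nodes)
    tree = []
    for cost, u, v in sorted(edges):             # increasing cost
        if uf.find(u) != uf.find(v):             # no cycle: keep the edge
            uf.union(uf.find(u), uf.find(v))
            tree.append((u, v, cost))
    return tree

edges = [(1, "a", "b"), (2, "b", "c"), (4, "a", "c"), (3, "c", "d"), (5, "b", "d")]
print(kruskal("abcd", edges))  # [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 3)]
```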