Dynamic Programming
The basic idea of dynamic programming is the opposite of the greedy strategy: one implicitly explores the space of all possible solutions by carefully decomposing things into a series of subproblems, and then building up correct solutions to larger and larger subproblems.
The Interval Scheduling Problem is the special case of the Weighted Interval Scheduling Problem in which all values are equal to 1. For the general weighted problem, no natural greedy algorithm is known to find the optimal solution.
There are $n$ requests labeled $1, \dots, n$, with each request $i$ having a start time $s_i$, a finish time $f_i$, and a value $v_i$. The goal is to select a subset of mutually compatible intervals that maximizes the sum of the values of the selected intervals.
Suppose that the requests are sorted in order of finish time: $f_1 \le f_2 \le \dots \le f_n$. $p(j)$ is defined as the largest index $i < j$ such that intervals $i$ and $j$ are disjoint ($p(j) = 0$ if no such interval exists).
Let $O$ be the optimal solution. Either interval $n$ belongs to $O$, or it doesn't. If $n$ belongs to $O$, no interval between $p(n)$ and $n$ belongs to $O$.
Let $O_j$ denote the optimal solution of the smaller problem on requests $\{1, 2, \dots, j\}$, and let $\text{OPT}(j)$ denote the total value of that solution. Since $j$ could either belong to $O_j$ or not, $\text{OPT}(j) = \max(v_j + \text{OPT}(p(j)),\ \text{OPT}(j - 1))$.
However, the recursion would take exponential time to run in the worst case, since the recursion tree widens quickly due to the recursive branching.
In fact, the recursive algorithm only solves $n + 1$ different subproblems. To eliminate the redundancy, the result of each subproblem can be cached, a technique called memoization.
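Below is a minimal sketch of the memoized recursion in Python, assuming the intervals are given as `(start, finish, value)` tuples already sorted by finish time; the helper `compute_p` and the 1-based indexing are illustrative choices, not part of the original notes.

```python
from bisect import bisect_right

def compute_p(intervals):
    """For each j (1-based), find the largest index i < j such that interval i
    finishes before interval j starts. Assumes start < finish for every
    interval and that intervals are sorted by finish time; p[j] = 0 means
    interval j has no disjoint predecessor."""
    finishes = [f for _, f, _ in intervals]
    # bisect_right counts how many finish times are <= the start of j,
    # which is exactly the largest compatible 1-based index
    return [0] + [bisect_right(finishes, s) for s, _, _ in intervals]

def mem_compute_opt(intervals):
    """Memoized recurrence: OPT(j) = max(v_j + OPT(p(j)), OPT(j - 1))."""
    p = compute_p(intervals)
    cache = {0: 0}  # OPT(0) = 0: the empty problem

    def opt(j):
        if j not in cache:
            v = intervals[j - 1][2]
            cache[j] = max(v + opt(p[j]), opt(j - 1))
        return cache[j]

    return opt(len(intervals))
```

For example, `mem_compute_opt([(0, 3, 2), (1, 5, 4), (4, 6, 7)])` returns 9, achieved by choosing the first and third intervals.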
To develop an algorithm based on dynamic programming, the collection of subproblems derived from the original problem has to satisfy a few basic properties:
The number of subproblems is polynomial.
The solution to the original problem can be easily computed from the solutions to the subproblems.
There's a natural ordering on subproblems from "smallest" to "largest" with an easy-to-compute recurrence that determines the solution of a subproblem from the solutions of smaller subproblems.
The progress measure is the number of keys in the cache. Initially the number is 1, and it grows to $n + 1$, since there are $n + 1$ unique subproblems in total. Therefore, the running time of `mem_compute_opt` is $O(n)$ if the intervals are sorted by their finish time.
Since the trace-back function calls itself recursively only on strictly smaller values, it makes a total of $O(n)$ recursive calls. Therefore, it returns an optimal solution of the problem in $O(n)$ time.
The running time of the iterative algorithm is $O(n)$, since each iteration takes constant time.
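A sketch of the iterative version under the same assumptions, together with a trace-back routine that recovers the chosen intervals (it reuses the illustrative `compute_p` helper from the sketch above):

```python
def iterative_compute_opt(intervals):
    """Fill M[0..n] iteratively, where M[j] = OPT(j)."""
    p = compute_p(intervals)
    n = len(intervals)
    M = [0] * (n + 1)
    for j in range(1, n + 1):
        v = intervals[j - 1][2]
        M[j] = max(v + M[p[j]], M[j - 1])
    return M, p

def find_solution(intervals, M, p):
    """Trace back through M to recover an optimal set of intervals."""
    chosen, j = [], len(intervals)
    while j > 0:
        v = intervals[j - 1][2]
        if v + M[p[j]] >= M[j - 1]:  # interval j belongs to an optimal solution
            chosen.append(j)
            j = p[j]
        else:
            j -= 1
    return chosen[::-1]  # 1-based indices, in increasing order
```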
Double-stranded DNA is zipped together by complementary base-pairing. Each base is drawn from the set $\{A, C, G, T\}$, in which $A$-$T$ and $C$-$G$ form the pairings.
Single-stranded RNA loops back and forms base pairs with itself, which creates the secondary structure of the RNA. A single-stranded RNA molecule can be viewed as a sequence of symbols drawn from the set $\{A, C, G, U\}$.
A secondary structure on $B = b_1 b_2 \dots b_n$ is a set of pairs $S = \{(i, j)\}$, where $i, j \in \{1, 2, \dots, n\}$, satisfying the following conditions. The secondary structure that forms has the optimum total free energy, which is approximated by the maximum number of base pairs.
$S$ is a matching: no base appears in more than one pair.
If $(i, j) \in S$, then $i < j - 4$: the ends of each pair are separated by at least four intervening bases (no sharp turns).
The elements of any pair consist of either $\{A, U\}$ or $\{C, G\}$.
If $(i, j)$ and $(k, l)$ are two pairs in $S$, then we can't have $i < k < j < l$ (the pairs are noncrossing).
As a first attempt, let $\text{OPT}(j)$ be the maximum number of base pairs in a structure on $b_1 b_2 \dots b_j$. $\text{OPT}(j) = 0$ for $j \le 5$.
$j$ is not involved in a pair: $\text{OPT}(j) = \text{OPT}(j - 1)$.
$j$ pairs with $t$ for some $t < j - 4$: the noncrossing condition isolates two subproblems, one on $b_1 \dots b_{t-1}$ and the optimal solution between $t + 1$ and $j - 1$.
Since there's no way to represent the optimal solution between $t + 1$ and $j - 1$ using prefixes alone, a separate variable is introduced.
Let $\text{OPT}(i, j)$ denote the maximum number of base pairs in a structure on $b_i b_{i+1} \dots b_j$. $\text{OPT}(i, j) = 0$ for $i \ge j - 4$.
$j$ is not involved in a pair: $\text{OPT}(i, j) = \text{OPT}(i, j - 1)$.
$j$ pairs with $t$ for some $t < j - 4$: $\text{OPT}(i, j) = 1 + \text{OPT}(i, t - 1) + \text{OPT}(t + 1, j - 1)$.
Therefore, $\text{OPT}(i, j) = \max(\text{OPT}(i, j - 1),\ \max_t(1 + \text{OPT}(i, t - 1) + \text{OPT}(t + 1, j - 1)))$, where the inner maximum is taken over each $t$ such that $i \le t < j - 4$ and $b_t b_j$ is an allowable base pair.
There are $O(n^2)$ subproblems to solve, and evaluating the recurrence takes time $O(n)$ for each. Therefore, the total running time is $O(n^3)$.
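A sketch of the $\text{OPT}(i, j)$ recurrence in Python, assuming the RNA molecule is given as a string over `ACGU` (positions here are 0-based, so the no-sharp-turns condition keeps the same form $t < j - 4$; the function name is illustrative):

```python
def rna_opt(b):
    """Maximum number of base pairs in a secondary structure on b,
    where opt[i][j] = OPT(i, j) over the substring b[i..j] (0-based)."""
    n = len(b)
    allowed = {frozenset("AU"), frozenset("CG")}
    opt = [[0] * n for _ in range(n)]   # OPT(i, j) = 0 whenever i >= j - 4
    for k in range(5, n):               # interval gap: j - i = k
        for i in range(n - k):
            j = i + k
            best = opt[i][j - 1]        # case: j is not involved in a pair
            for t in range(i, j - 4):   # case: j pairs with t, where t < j - 4
                if frozenset((b[t], b[j])) in allowed:
                    left = opt[i][t - 1] if t > i else 0
                    best = max(best, 1 + left + opt[t + 1][j - 1])
            opt[i][j] = best
    return opt[0][n - 1] if n else 0
```

For example, `rna_opt("ACCGGUAGU")` returns 2: one optimal structure pairs $b_1$-$b_9$ ($A$-$U$) and $b_2$-$b_8$ ($C$-$G$).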
Let $X$ and $Y$ be two strings, where $X$ consists of the sequence of symbols $x_1 x_2 \dots x_m$ and $Y$ consists of the sequence of symbols $y_1 y_2 \dots y_n$. A matching $M$ of the sets $\{1, \dots, m\}$ and $\{1, \dots, n\}$ is an alignment if there are no crossing pairs: if $(i, j), (i', j') \in M$ and $i < i'$, then $j < j'$.
There's a parameter $\epsilon > 0$ that defines a gap penalty. Each position that is not matched (a gap) incurs a cost of $\epsilon$.
For each pair of letters $p, q$, there's a mismatch cost of $a_{pq}$ for lining up $p$ with $q$. Each pair $(i, j) \in M$ incurs the cost $a_{x_i y_j}$; usually $a_{pp} = 0$ for each letter $p$.
The cost of $M$, or the similarity between $X$ and $Y$, is the sum of its gap and mismatch costs. The definition of similarity is based on finding the optimal alignment between the two strings, and the process of minimizing the cost is referred to as sequence alignment.
In the optimal alignment $M$, either $(m, n) \in M$ or $(m, n) \notin M$. ($x_m$ and $y_n$ are the last symbols in the two strings.)
Let $M$ be any alignment of $X$ and $Y$. If $(m, n) \notin M$, then at least one of the following is true:
Position $m$ of $X$ is not matched in $M$.
Position $n$ of $Y$ is not matched in $M$.
Let $\text{OPT}(i, j)$ denote the minimum cost of an alignment between the prefixes $x_1 x_2 \dots x_i$ and $y_1 y_2 \dots y_j$.
Case 1: Pay $a_{x_i y_j}$ to match $(i, j)$, and then $\text{OPT}(i, j) = a_{x_i y_j} + \text{OPT}(i - 1, j - 1)$.
Case 2: Pay $\epsilon$ because position $i$ of $X$ is not matched, and $\text{OPT}(i, j) = \epsilon + \text{OPT}(i - 1, j)$.
Case 3: Pay $\epsilon$ because position $j$ of $Y$ is not matched, and $\text{OPT}(i, j) = \epsilon + \text{OPT}(i, j - 1)$.
Therefore, the minimum alignment costs satisfy the recurrence $\text{OPT}(i, j) = \min(a_{x_i y_j} + \text{OPT}(i - 1, j - 1),\ \epsilon + \text{OPT}(i - 1, j),\ \epsilon + \text{OPT}(i, j - 1))$, with base cases $\text{OPT}(i, 0) = i\epsilon$ and $\text{OPT}(0, j) = j\epsilon$.
The running time is $O(mn)$, since there are $O(mn)$ subproblems, and it takes constant time to solve each. The algorithm takes $O(mn)$ space, which is the size of the array $A$.
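A sketch of the recurrence as an iterative table-fill, assuming a gap penalty `eps` and a mismatch-cost function `alpha(p, q)` with `alpha(p, p) == 0`; the parameter names are placeholders for $\epsilon$ and $a_{pq}$.

```python
def alignment_cost(X, Y, eps, alpha):
    """Fill the (m + 1) x (n + 1) array A, where A[i][j] = OPT(i, j)."""
    m, n = len(X), len(Y)
    A = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        A[i][0] = i * eps          # align x_1..x_i against nothing: i gaps
    for j in range(1, n + 1):
        A[0][j] = j * eps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            A[i][j] = min(
                alpha(X[i - 1], Y[j - 1]) + A[i - 1][j - 1],  # match (i, j)
                eps + A[i - 1][j],   # position i of X is not matched
                eps + A[i][j - 1],   # position j of Y is not matched
            )
    return A[m][n]
```

For instance, with `eps = 2.0` and `alpha = lambda p, q: 0.0 if p == q else 1.0`, `alignment_cost("mean", "name", 2.0, alpha)` returns `4.0` (four mismatched positions beat any alignment that opens gaps).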
The dynamic programming algorithm for the sequence alignment problem takes $O(mn)$ space, which is not suitable for computations on large strings. With a nice application of the divide-and-conquer idea, the space complexity can be reduced to $O(m + n)$.
If we only care about the value of the optimal alignment, it's easy to collapse the array to an $m \times 2$ array $B$, since computing the result of each subproblem only requires the entries in the current and previous columns. Therefore, it costs $O(mn)$ time and $O(m)$ space, as in the sketch below.
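A sketch of the collapsed version, assuming the same illustrative `eps` and `alpha` parameters; only the optimal value survives, since the trace-back information is thrown away.

```python
def space_efficient_cost(X, Y, eps, alpha):
    """Compute OPT(m, n) in O(m) space with an m x 2 array B:
    B[i][0] holds column j - 1 and B[i][1] holds column j."""
    m, n = len(X), len(Y)
    B = [[i * eps, 0.0] for i in range(m + 1)]   # column 0: OPT(i, 0) = i * eps
    for j in range(1, n + 1):
        B[0][1] = j * eps                        # OPT(0, j) = j * eps
        for i in range(1, m + 1):
            B[i][1] = min(
                alpha(X[i - 1], Y[j - 1]) + B[i - 1][0],  # match (i, j)
                eps + B[i - 1][1],   # position i of X is not matched
                eps + B[i][0],       # position j of Y is not matched
            )
        for i in range(m + 1):
            B[i][0] = B[i][1]        # column j becomes the "previous" column
    return B[m][0]
```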
To construct the alignment itself, we could use a backward formulation of the dynamic programming algorithm. Let $g(i, j)$ denote the length of the shortest path from $(i, j)$ to $(m, n)$ in the edit distance graph $G_{XY}$, starting with $g(m, n) = 0$.
For $i < m$ and $j < n$, we have $g(i, j) = \min(a_{x_{i+1} y_{j+1}} + g(i + 1, j + 1),\ \epsilon + g(i + 1, j),\ \epsilon + g(i, j + 1))$. The space complexity of the backward formulation could also be reduced to $O(m)$ by using an array of size $m \times 2$.
The length of the shortest corner-to-corner path in $G_{XY}$ that passes through $(i, j)$ is $f(i, j) + g(i, j)$, where $f(i, j) = \text{OPT}(i, j)$ is the forward value.
Let $k$ be any number in $\{0, \dots, n\}$, and let $q$ be an index that minimizes the quantity $f(q, k) + g(q, k)$. There's a corner-to-corner path of minimum length that passes through $(q, k)$.
The algorithm divides $G_{XY}$ along its center column $k = n / 2$ and computes the values of $f(i, n/2)$ and $g(i, n/2)$ for each value of $i$, using the space-efficient forward and backward routines. Then the algorithm determines the index $q$ that minimizes $f(q, n/2) + g(q, n/2)$. By the statement above, there's a shortest corner-to-corner path passing through the node $(q, n/2)$. Thus, the algorithm recurses on the two halves: between $(0, 0)$ and $(q, n/2)$, and between $(q, n/2)$ and $(m, n)$.
The algorithm maintains an array with at most $m + n$ entries, which holds the nodes on the shortest corner-to-corner path as they are discovered. Therefore, the space complexity is $O(m + n)$.
Let $T(m, n)$ denote the maximum running time of the algorithm on strings of length $m$ and $n$. The recurrence is $T(m, n) \le cmn + T(q, n/2) + T(m - q, n/2)$.
Suppose that $m = n$ and $q$ falls exactly in the middle. We could then write the running time in terms of the single variable $n$ and let $T(n) = T(n, n)$. Therefore, $T(n) \le cn^2 + 2T(n/2)$, which implies $T(n) = O(n^2)$. The same analysis carries over to the general case, so the time complexity is $O(mn)$.
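A sketch of the whole divide-and-conquer procedure, under the same illustrative `eps`/`alpha` assumptions: `forward_col` and `backward_col` are small variations of the two-column routine above that return an entire column of $f$ and $g$ values, and tiny base cases fall back to a quadratic-space trace-back.

```python
def forward_col(X, Y, eps, alpha):
    """Return [f(i, len(Y)) for i in 0..len(X)] in O(len(X)) space."""
    m = len(X)
    col = [i * eps for i in range(m + 1)]
    for j in range(1, len(Y) + 1):
        new = [j * eps] + [0.0] * m
        for i in range(1, m + 1):
            new[i] = min(alpha(X[i - 1], Y[j - 1]) + col[i - 1],
                         eps + new[i - 1],
                         eps + col[i])
        col = new
    return col

def backward_col(X, Y, eps, alpha):
    """Return [g(i) for i in 0..len(X)]: the cost of aligning the suffix
    X[i:] against all of Y, computed by reversing both strings."""
    return forward_col(X[::-1], Y[::-1], eps, alpha)[::-1]

def full_alignment_path(X, Y, eps, alpha):
    """Quadratic-space DP with trace-back; used only on tiny base cases."""
    m, n = len(X), len(Y)
    A = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        A[i][0] = A[i - 1][0] + eps
    for j in range(1, n + 1):
        A[0][j] = A[0][j - 1] + eps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            A[i][j] = min(alpha(X[i - 1], Y[j - 1]) + A[i - 1][j - 1],
                          eps + A[i - 1][j],
                          eps + A[i][j - 1])
    path, i, j = [(m, n)], m, n
    while (i, j) != (0, 0):  # retrace whichever case achieved the minimum
        if i > 0 and j > 0 and A[i][j] == alpha(X[i - 1], Y[j - 1]) + A[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif i > 0 and A[i][j] == eps + A[i - 1][j]:
            i -= 1
        else:
            j -= 1
        path.append((i, j))
    return path[::-1]

def divide_and_conquer_alignment(X, Y, eps, alpha):
    """Return the nodes of a shortest corner-to-corner path in G_XY,
    using O(m + n) space outside of the recursion."""
    m, n = len(X), len(Y)
    if m <= 2 or n <= 2:
        return full_alignment_path(X, Y, eps, alpha)
    k = n // 2
    f = forward_col(X, Y[:k], eps, alpha)    # f(i, n/2) for each i
    g = backward_col(X, Y[k:], eps, alpha)   # g(i, n/2) for each i
    q = min(range(m + 1), key=lambda i: f[i] + g[i])
    left = divide_and_conquer_alignment(X[:q], Y[:k], eps, alpha)
    right = divide_and_conquer_alignment(X[q:], Y[k:], eps, alpha)
    # shift the right half's coordinates so both halves join at (q, n/2)
    return left + [(q + i, k + j) for i, j in right[1:]]
```

Each level of the recursion does $O(mn)$ total work across its subproblems and halves $n$, matching the $T(n) \le cn^2 + 2T(n/2)$ analysis above.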
Let $G = (V, E)$ be a directed graph, and let each edge $e$ have a weight $c_e$. Dijkstra's Algorithm could only find shortest paths when all weights are non-negative.
A negative cycle is a directed cycle $C$ such that $\sum_{e \in C} c_e < 0$. If the graph has no negative cycles, the problem is to find a path $P$ from $s$ to $t$ with minimum total cost $\sum_{e \in P} c_e$.
If $G$ has no negative cycles, then there's a shortest path from $s$ to $t$ that is simple, and hence has at most $n - 1$ edges.
Let $\text{OPT}(i, v)$ be the minimum cost of a $v$-$t$ path with at most $i$ edges. The original problem is $\text{OPT}(n - 1, s)$.
If the path uses at most $i - 1$ edges, then $\text{OPT}(i, v) = \text{OPT}(i - 1, v)$.
If the path uses $i$ edges, and the first edge is $(v, w)$, then $\text{OPT}(i, v) = c_{vw} + \text{OPT}(i - 1, w)$.
Therefore, the recursive formula is $\text{OPT}(i, v) = \min(\text{OPT}(i - 1, v),\ \min_{w \in V}(c_{vw} + \text{OPT}(i - 1, w)))$.
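A sketch of the table-based recurrence, assuming the graph is given as a dict `edges` mapping each node to a list of `(w, cost)` out-edges; the representation and function name are illustrative.

```python
def bellman_ford(nodes, edges, t):
    """M[i][v] = OPT(i, v): minimum cost of a v-t path using at most i edges."""
    INF = float("inf")
    n = len(nodes)
    M = [{v: (0 if v == t else INF) for v in nodes} for _ in range(n)]
    for i in range(1, n):
        for v in nodes:
            # either reuse the best path with at most i - 1 edges,
            # or take a first edge (v, w) plus the best path from w
            M[i][v] = min([M[i - 1][v]] +
                          [c + M[i - 1][w] for w, c in edges.get(v, [])])
    return {v: M[n - 1][v] for v in nodes}   # OPT(n - 1, v) for every source v
```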
Let $G = (V, E)$ be a graph with $m$ edges and $n$ nodes. Let $n_v$ be the number of nodes to which $v$ has an edge; thus it takes $O(n_v)$ time to compute $\text{OPT}(i, v)$. Since $\sum_{v \in V} n_v = m$, each iteration takes $O(m)$ time and the total running time is $O(mn)$. Since $m \le n^2$ in a directed graph, the total running time is also $O(n^3)$.
Rather than recording $\text{OPT}(i, v)$ for each value $i$, the algorithm can use and update a single value $M[v]$ for each node $v$: the length of the shortest path from $v$ to $t$ that the algorithm has found so far. Therefore, the algorithm still runs for $n - 1$ iterations, and after the final iteration $M[v] = \text{OPT}(n - 1, v)$ for each node $v$.
To recover the path, the algorithm maintains a map $\text{first}$, in which $\text{first}(v)$ is the first node (after $v$ itself) on the current shortest path from $v$ to $t$.
Let $P$ be the directed pointer graph whose nodes are $V$ and whose edges are $\{(v, \text{first}(v))\}$. If the pointer graph contains a cycle, then this cycle must have negative cost. Since $G$ doesn't have negative cycles, $P$ will never have a cycle. Therefore, the path obtained by following the $\text{first}$ pointers from $s$ is the shortest path from $s$ to $t$.
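A sketch of the space-efficient version with the `first` pointer map, under the same illustrative graph representation:

```python
def bellman_ford_compact(nodes, edges, t):
    """Keep one value M[v] per node, plus first[v], the next node on the
    current best v-t path; O(n) space beyond the graph itself."""
    INF = float("inf")
    M = {v: (0 if v == t else INF) for v in nodes}
    first = {v: None for v in nodes}
    for _ in range(len(nodes) - 1):       # n - 1 rounds suffice
        for v in nodes:
            for w, c in edges.get(v, []):
                if c + M[w] < M[v]:       # found a shorter v-t path through w
                    M[v] = c + M[w]
                    first[v] = w
    return M, first

def recover_path(first, s, t):
    """Follow the first[] pointers from s to t."""
    path = [s]
    while path[-1] != t and first[path[-1]] is not None:
        path.append(first[path[-1]])
    return path

# Example with one negative edge and no negative cycles:
# edges = {"s": [("a", 2), ("b", 6)], "a": [("b", -3)], "b": [("t", 1)], "t": []}
# M, first = bellman_ford_compact(["s", "a", "b", "t"], edges, "t")
# M["s"] == 0 and recover_path(first, "s", "t") == ["s", "a", "b", "t"]
```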