Greedy Algorithms Course: CS 5130 - Advanced Data Structures and Algorithms Instructor: Dr. Badri Adhikari
Motivation
For many optimization problems, dynamic programming is overkill. A greedy algorithm always makes the choice that looks best at the moment: it makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution. Greedy algorithms do not always yield optimal solutions, but for many problems they do. Many well-known algorithms are based on the greedy approach, e.g. minimum-spanning-tree algorithms and Dijkstra's shortest-path algorithm.
Activity-selection problem
- Several competing activities require exclusive use of a common resource.
- We are required to schedule the activities so that we obtain a maximum-size set of mutually compatible activities.
Example: a lecture hall can serve only one activity at a time.
Activity-selection problem
Each activity a_i has a start time s_i and a finish time f_i, where 0 ≤ s_i < f_i < ∞. If selected, activity a_i takes place during the half-open interval [s_i, f_i). Activities a_i and a_j are compatible if the intervals [s_i, f_i) and [s_j, f_j) do not overlap, i.e. if s_i ≥ f_j or s_j ≥ f_i. Assume that the activities are sorted by their finish time: f_1 ≤ f_2 ≤ f_3 ≤ ... ≤ f_{n-1} ≤ f_n. The problem: select a maximum-size subset of mutually compatible activities.
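The compatibility test above can be sketched directly in Python. This is a minimal illustration (the function name and the sample times are my own, not from the slides):

```python
def compatible(s, f, i, j):
    """Activities i and j are compatible if their half-open
    intervals [s[i], f[i]) and [s[j], f[j]) do not overlap."""
    return s[i] >= f[j] or s[j] >= f[i]

# Hypothetical start/finish times (0-indexed), not the textbook's table:
s = [1, 3, 0]
f = [4, 5, 6]
print(compatible(s, f, 0, 1))  # False: [1,4) and [3,5) overlap
```

Because the intervals are half-open, an activity that starts exactly when another finishes is still compatible with it.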
Example
The subset {a_3, a_9, a_11} consists of mutually compatible activities (but is not a maximum-size subset). Classwork: What is an optimal solution? (i.e. a maximum-size subset)
Optimal substructure property of the A-S problem
Let S_ij be the set of activities that start after a_i finishes and finish before a_j starts. We wish to find a maximum set of mutually compatible activities in S_ij; call it A_ij. A_ij will include some activity a_k, leaving us with two subproblems: finding mutually compatible activities in the sets S_ik and S_kj. If A_ij is an optimal solution, then its solutions to S_ik and S_kj must also be optimal; if not, we could substitute the better ('other') solution into A_ij and obtain a larger set, a contradiction. The activity-selection problem therefore exhibits optimal substructure.
Solution using dynamic programming
If we denote the size of an optimal solution for the set S_ij by c[i,j], then
c[i,j] = c[i,k] + c[k,j] + 1
if we know that the optimal solution includes activity a_k. Since we do not know which a_k to pick, we try all candidates:
c[i,j] = 0 if S_ij is empty; otherwise c[i,j] = max over a_k in S_ij of { c[i,k] + c[k,j] + 1 }.
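The recurrence above can be sketched as a bottom-up DP. This is my own illustrative implementation (names and the tiny test instance are assumptions, not from the slides); it adds fictitious activities a_0 (finish time 0) and a_{n+1} (start time ∞) so that S_{0,n+1} is the whole problem:

```python
def max_compatible_dp(s, f):
    """O(n^3) DP for activity selection; assumes the activities
    are already sorted by finish time."""
    n = len(s)
    S = [0] + list(s) + [float('inf')]   # s_0 unused, s_{n+1} = inf
    F = [0] + list(f) + [float('inf')]   # f_0 = 0
    # c[i][j] = size of an optimal solution for S_ij
    c = [[0] * (n + 2) for _ in range(n + 2)]
    for length in range(2, n + 2):       # increasing j - i
        for i in range(0, n + 2 - length):
            j = i + length
            for k in range(i + 1, j):
                # a_k belongs to S_ij only if it starts after a_i
                # finishes and finishes before a_j starts
                if S[k] >= F[i] and F[k] <= S[j]:
                    c[i][j] = max(c[i][j], c[i][k] + c[k][j] + 1)
    return c[0][n + 1]

max_compatible_dp([1, 3, 0, 5], [4, 5, 6, 7])   # -> 2 (e.g. a_1 and a_4)
```

The three nested loops make the cubic cost of the DP visible, which motivates looking for a cheaper greedy solution.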
Running time of the DP solution The DP solution saves all solutions between the time slots 1 & 2, 1 & 3, 1 & 4, …, 2 & 3, 2 & 4, …, 10 & 11. What is the time complexity of the DP solution?
Making a greedy choice
- DP solves all the subproblems to solve the main problem. Can we find the optimal solution without solving all the subproblems?
- In the activity-selection problem, we need to consider only one (greedy) choice. Intuitively:
  - we should select an activity that leaves as much of the resource as possible available for other activities;
  - the activity we choose must therefore be the one that finishes first.
Therefore, always choose the activity in S that has the earliest finish time. Once we make this greedy choice of a_1, we have only one remaining subproblem: finding activities that start after a_1 finishes.
Do we need a DP solution?
We don't need a DP solution. What does this mean? Instead, we can repeatedly choose the activity that finishes first, keep only the activities that are compatible with this activity, and repeat. Because we always choose the activity with the earliest finish time, and the input is sorted by increasing finish time, we can consider each activity just once overall, in monotonically increasing order of finish time.
A recursive greedy algorithm
Find an activity a_m that is compatible with a_k, i.e. such that s_m ≥ f_k.
- s and f are arrays with the start and finish times.
- k is the index of the current finish time; it defines the subproblem S_k to be solved.
- n is the size of the original problem (= s.length = f.length).
- We add a fictitious activity a_0 with f_0 = 0.
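A sketch of the recursive selector in Python, following the description above (the slides' pseudocode is not reproduced here, so the function name and the small instance are my own; index 0 holds the fictitious activity a_0):

```python
def recursive_activity_selector(s, f, k, n):
    """Recursive greedy activity selector.
    s, f: start/finish times indexed 1..n, with f[0] = 0 for the
    fictitious a_0; activities sorted by finish time.
    Returns the indices of the selected activities."""
    m = k + 1
    # skip activities that start before a_k finishes
    while m <= n and s[m] < f[k]:
        m += 1
    if m <= n:
        return [m] + recursive_activity_selector(s, f, m, n)
    return []

# hypothetical small instance (index 0 is the fictitious a_0)
s = [0, 1, 3, 0, 5]
f = [0, 4, 5, 6, 7]
recursive_activity_selector(s, f, 0, 4)   # -> [1, 4]
```

Each activity is examined exactly once across all recursive calls, which is what makes the overall work linear.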
Iterative greedy algorithm
- m represents the index of our current start time.
- k indexes the most recent addition to A (k represents the index of our current finish time).
Steps:
- Initialize A with activity a_1.
- Find the earliest activity in S_k to finish.
- Consider each activity a_m and add it to A if it is compatible.
What will be the output? (class-work)
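The iterative steps above can be sketched as follows (a minimal Python version of the standard iterative selector; it uses 0-based indexing rather than the slides' a_1 convention, and the sample instance is hypothetical):

```python
def greedy_activity_selector(s, f):
    """Iterative greedy activity selector; s and f are 0-indexed
    arrays sorted by finish time. Returns chosen indices."""
    n = len(s)
    A = [0]                # always take the first activity to finish
    k = 0                  # index of the most recent addition to A
    for m in range(1, n):
        if s[m] >= f[k]:   # a_m starts after a_k finishes
            A.append(m)
            k = m
    return A

greedy_activity_selector([1, 3, 0, 5], [4, 5, 6, 7])   # -> [0, 3]
```

One pass over the sorted input suffices, giving the Θ(n) cost claimed on the next slide.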
Time complexity
The greedy version runs in Θ(n).
Two questions. Suppose that the activities are not sorted by finish time:
(a) Can our algorithm still be used?
(b) What will be the running time?
Elements of greedy strategy
(a) Optimal substructure property: if an optimal solution to S_ij includes activity a_k, then it must also contain optimal solutions to the problems S_ik and S_kj.
(b) Greedy-choice property: we can assemble a globally optimal solution by making locally optimal (greedy) choices. When we are considering which choice to make, we make the choice that looks best in the current problem, without considering results from subproblems.
Greedy strategy vs dynamic programming
0-1 knapsack problem: a thief robbing a store finds n items. The i-th item is worth v_i dollars and weighs w_i pounds (v_i and w_i are integers). The thief wants to take as valuable a load as possible, but he can carry at most W pounds in his knapsack (W is an integer). Which items should he take? He cannot take fractions of items, nor take any item more than once. ("Gold ingot problem")
Fractional knapsack problem: the thief can take fractions of items, rather than having to make a binary (0-1) choice for each item. ("Gold dust problem")
Which is the more difficult problem?
Greedy strategy does not work for the 0-1 knapsack problem
Example: the thief must select a subset of 3 items, and W = 50 pounds. (Figure panels: the problem, possible solutions, and the result if fractions were allowed.)
- Item 1 has the greatest value per pound.
- Yet the optimal subset contains items 2 and 3 (any solution that includes item 1 is not optimal).
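The contrast can be illustrated in Python. The specific values and weights below are my own choice of a standard textbook-style instance, not necessarily the numbers in the slide's figure; they satisfy the slide's description (item 1 has the best value per pound, yet the optimal 0-1 subset is {2, 3}):

```python
def fractional_knapsack(items, W):
    """Greedy by value per pound: optimal for the fractional
    ('gold dust') problem, but not for the 0-1 problem."""
    total = 0.0
    # consider items in decreasing order of value/weight
    for v, w in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        take = min(w, W)          # take as much as still fits
        total += v * take / w
        W -= take
        if W == 0:
            break
    return total

items = [(60, 10), (100, 20), (120, 30)]   # (value, weight); hypothetical
fractional_knapsack(items, 50)             # -> 240.0
# The same greedy applied to the 0-1 problem takes items 1 and 2
# for $160, while the optimal 0-1 subset {2, 3} is worth $220.
```

The greedy ordering is provably optimal only when fractions are allowed; the 0-1 version needs dynamic programming.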
Tree Basics (exempt from exam)
A tree consists of a finite set of elements, called nodes (vertices), and a finite set of directed lines, called branches (edges), that connect the nodes. It has components named after natural trees (root, branches, and leaves), but we draw it with the root at the top.
- Root: if the tree is not empty, the first node is the root; it does not have any parent.
- Degree of a node: the number of branches associated with the node.
- Indegree: number of branches directed towards the node.
- Outdegree: number of branches directed away from the node.
- Parent: a node with successor nodes, i.e. outdegree > 0.
- Child: a node with a predecessor, i.e. indegree = 1.
- Ancestor: any node in the path from the root to the node.
- Descendant: any node in the path below the parent node.
- Leaf (external node): a node with outdegree = 0.
- Siblings: two or more nodes with the same parent.
- Path: a sequence of nodes in which each node is adjacent to the next one.
- Level: the level of a node is its distance from the root; the level of the root is 0.
- Height: the height of a tree is the level of the leaf in the longest path from the root PLUS 1.
- Subtree: any connected structure below the root.
- Internal node: a node that is neither the root nor a leaf.
- Ordered tree: a tree with a defined order of children (enables ordered traversal).
Binary Tree (exempt from exam)
A binary tree is a tree in which no node can have more than two subtrees. In other words, a node can have zero, one, or two subtrees, designated as the left subtree and the right subtree. Symmetry is not a requirement. A null tree is a tree with no nodes.
Properties of a binary tree:
(1) Height. Given that we need to store N nodes, the maximum height is H_max = N. A tree of maximum height is rare; it occurs only when the entire tree is built in one direction. The minimum height is H_min = ⌊log2 N⌋ + 1.
(2) Number of nodes. Given the height H of a binary tree, N_min = H and N_max = 2^H − 1.
(3) Balance. The balance factor of a tree is the difference in height between its left and right subtrees, B = H_L − H_R. When is a tree balanced? A binary tree is balanced if its balance factor is 0 and its subtrees are also balanced (a recursive definition). However, this is a very strict definition, so a looser one is used: a binary tree is balanced if the heights of its subtrees differ by no more than 1 (i.e. its balance factor is −1, 0, or 1) and its subtrees are also balanced. Such trees are also called AVL trees.
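The recursive balance definition can be checked in one pass. A minimal sketch, assuming a tree is represented as nested (left, right) pairs with None as the null tree (this representation is my own, not the slides'):

```python
def height_and_balanced(node):
    """Returns (height, is_balanced) under the looser (AVL)
    definition: subtree heights differ by at most 1, recursively.
    Height convention matches the slide: a single node has height 1."""
    if node is None:
        return 0, True
    left, right = node
    hl, bl = height_and_balanced(left)
    hr, br = height_and_balanced(right)
    balance_factor = hl - hr           # B = H_L - H_R
    return 1 + max(hl, hr), bl and br and abs(balance_factor) <= 1

# a three-node tree: a root with two leaf children
tree = ((None, None), (None, None))
height_and_balanced(tree)   # -> (2, True)
```

Computing the height and the balance check together avoids re-walking each subtree, keeping the whole check linear in the number of nodes.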
Huffman Codes
Suppose we have to store a 100,000-character data file in which only 6 different characters appear, with the following frequencies (in thousands): a: 45, b: 13, c: 12, d: 16, e: 9, f: 5. How do we represent such a file of information compactly? Use a binary character code, i.e. represent each character by a unique binary string.
Option 1: use a fixed-length code, e.g. a = 000, b = 001, etc. With this method, how many bits do we need to encode the entire file?
Huffman Codes
Option 2: use a variable-length code: give frequent characters short codewords and infrequent characters long codewords. Total number of bits required to represent the file with this technique = (45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4) × 1000 = 224,000 bits, i.e. 25% savings (compared to 300,000 bits). When would the savings be higher/lower?
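An optimal variable-length code is built by Huffman's greedy algorithm: repeatedly merge the two least-frequent subtrees. This sketch uses Python's heapq rather than the textbook's pseudocode; the data structure (a dict of partial codewords per subtree) is my own simplification:

```python
import heapq

def huffman_codes(freq):
    """Builds an optimal prefix code with a min-heap of subtrees.
    freq maps characters to frequencies; returns char -> codeword.
    The counter breaks ties so dicts are never compared."""
    heap = [(f, i, {c: ''}) for i, (c, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)   # two least-frequent
        f2, _, codes2 = heapq.heappop(heap)   # subtrees
        merged = {c: '0' + w for c, w in codes1.items()}
        merged.update({c: '1' + w for c, w in codes2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

# the slide's frequencies, in thousands
freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
codes = huffman_codes(freq)
total_bits = sum(freq[c] * len(codes[c]) for c in freq) * 1000
print(total_bits)   # 224000
```

The computed cost matches the slide's 224,000-bit total: the most frequent character gets a 1-bit codeword and the two rarest get 4-bit codewords.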