

600.325/425 Declarative Methods
Assignment 5: Dynamic Programming
Spring 2006
Prof. Jason Eisner
TA: John Blatz
Due date: Wednesday, May 10, 2 pm

The questions in this assignment concern Dyna. Questions 3–4 are closely related to questions you did in the Prolog assignment. Policies and general submission instructions are the same as for the previous assignment.

1. To find out how to run the Dyna compiler and debugger, see the “CS undergrad machines” section of http://dyna.org/JHU. Now work through the start of the tutorial at http://dyna.org/Tutorial: “hello world,” Dijkstra’s algorithm, and the debugger. Also read http://dyna.org/Several_perspectives_on_Dyna. There is nothing to hand in for this question.

Important request: Please send feedback about Dyna’s usability to the cs325-staff email address. If the compiler or the visual debugger does something you don’t expect, or gives a confusing error message, please forward it to us. We need the feedback and are happy to help quickly.

Disclaimer: Dyna is both a language and an implementation. The prototype implementation that you are using is for an older version of the language, and has not been updated since May 2005. The same is true of the visual debugger, Dynasty. The new versions under development are cleaner, faster, and more powerful, as you saw in lecture, but they are being rebuilt from the ground up and are unfortunately not ready for you yet. Email the professor if you would like to receive announcements of new versions in the future.

Hint: We apologize for any rough edges you encounter in this assignment. If you encounter a cryptic error message, something poorly explained in the documentation, or especially a bug, please don’t hesitate to inform us ASAP by email.

2. First, an easy problem to warm you up. A number of presidents of the United States have been blood relatives of one another:

• George Bush, George W. Bush (father and son)
• John Adams, John Q. Adams (father and son)
• Theodore Roosevelt, Martin van Buren (third cousins twice removed)
• Theodore Roosevelt, Franklin Roosevelt (fifth cousins)

It is natural to ask questions like “Who was the most recent common ancestor of Theodore and Martin, and how recent was he or she?”

(a) Write a short, well-commented Dyna program, ancestor.dyna, to find the most recent common ancestor of two people. You should be able to run the result as

    ./ancestor presidents.par queryA.par

These files (like others in this assignment) are available from either http://cs.jhu.edu/~jason/325/hw5 or /usr/local/data/cs325/hw5. The presidents.par file contains both Roosevelt family data and Adams family data, plenty for you to experiment with.

It’s traditional to call this a family tree, but actually it is a “family DAG” (directed acyclic graph). You are supposed to find the most recent common ancestor (not Adam or Eve). We define the “recency” of a common ancestor to be the total length of his or her shortest paths from the two descendants. For example, http://www.gwu.edu/~erpapers/abouteleanor/q-and-a/q6.htm shows that Franklin and his wife Eleanor had a common ancestor of recency 13: namely Nicholas, who was 6 generations above Franklin and 7 generations above Eleanor. [1] Eleanor’s last name was Roosevelt even before she married Franklin!

[1] Assuming that Nicholas had one wife who bore both his sons, she was another common ancestor of recency 13.

The most recent common ancestor of Nicholas and Nicholas was Nicholas, with recency 0. (For our purposes, he was his own ancestor.)

Turn in your commented code in a file called ancestor.dyna. In your README, give the results of queryA, queryB, queryC, and queryD when you compile with --driver=backtrace. Explain how to interpret these results. You could also try --driver=dynasty bestonly.

It is okay (but unnecessary) to write your own driver program or alter the .par files. If you do these things, also turn in your changed files and alert us in your README.
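To make the recency definition concrete, here is a minimal Python sketch (not Dyna, and not the expected solution). It computes shortest upward distances from each person and picks the common ancestor with the smallest total. The PARENTS table, the names in it, and both function names are hypothetical; none of them come from presidents.par.

```python
from collections import deque

# Hypothetical toy family DAG: each person maps to a list of parents.
PARENTS = {
    "kid_a": ["mom_a", "dad_a"],
    "dad_a": ["grandpa"],
    "kid_b": ["mom_b", "dad_b"],
    "dad_b": ["granddad_b"],
    "granddad_b": ["grandpa"],
}

def distances_up(parents, start):
    """Shortest number of generations from start up to each ancestor.

    A person counts as their own ancestor at distance 0.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for parent in parents.get(person, ()):
            if parent not in dist:  # first visit is the shortest in a BFS
                dist[parent] = dist[person] + 1
                queue.append(parent)
    return dist

def most_recent_common_ancestor(parents, a, b):
    """Return (ancestor, recency), or None if a and b share no ancestor."""
    up_a = distances_up(parents, a)
    up_b = distances_up(parents, b)
    common = set(up_a) & set(up_b)
    if not common:
        return None
    # Recency is the sum of the two shortest upward path lengths;
    # ties are broken alphabetically only to make the result deterministic.
    best = min(common, key=lambda x: (up_a[x] + up_b[x], x))
    return best, up_a[best] + up_b[best]

print(most_recent_common_ancestor(PARENTS, "kid_a", "kid_b"))
print(most_recent_common_ancestor(PARENTS, "grandpa", "grandpa"))
```

In this toy data, grandpa is 2 generations above kid_a and 3 above kid_b, so the first query reports recency 5. The second query reports recency 0, matching the convention that a person is his or her own ancestor.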

Note: In the current version of Dyna (v0.3), you will probably need to include the following to avoid a runtime type error when you read the .par file. [2]

    :- structure(child(string,string)).

[2] If you don’t include this line, the compiler concludes that the arguments of child can be arbitrary terms. That’s great, except that, alas, native types like strings do not yet count as terms. They will in the near future. (Java faced a similar situation until Java 1.5 introduced automatic upcast from int to Integer, known as “autoboxing.”) This was not a problem in the tutorial because path.dyna used literal strings in the body of the program, in a way that allowed the compiler to guess the correct type declaration :- structure(child(string,string)). So the compiled path program was able to read flights.par.

Hint: An earlier version of queryA.par read

    person1("FRANKLIN DELANO ROOSEVELT 1882-1945") := 0.
    person2("ANNA ELEANOR ROOSEVELT 1884-1962") := 0.

You may want to try solving the problem that way first. A hint:

    path_from_1_up_to(X) + path_from_2_up_to(X)

But you will notice that your code for handling person1 and path_from_1_up_to is basically identical to the code for handling person2 and path_from_2_up_to. Duplicate code is inelegant and hard to maintain. The fix is to have a variable that ranges over 1,2. To do this, you’ll need to change path_from_1_up_to(Name) to path(1,Name), etc. This will allow you to eliminate your duplicate code, while changing queryA.par back to the version provided:

    person(1,"FRANKLIN DELANO ROOSEVELT 1882-1945") := 0.
    person(2,"ANNA ELEANOR ROOSEVELT 1884-1962") := 0.

(b) We now turn to a related problem that has similar structure. The solution is fairly similar to ancestor.dyna, but a little trickier.

You may know that on average, two siblings share 1/2 of their genes. But what fraction of their genes did Franklin and Eleanor share? Modify your .dyna and/or .par files slightly to answer this and similar questions. You want to consider all the paths that relate Franklin and Eleanor. Because this is part b of the problem, call the new files ancestorb.dyna and presidentsb.par.

For this question (i.e., 2b), you can assume that x and y’s common ancestry involves no inbreeding (where a child’s parents are genetically related, as in incest or cousin marriages). Then the expected fraction of genes that they share can be found as

$$F(x, y) = \sum_{p \in P(x, y)} \left(\frac{1}{2}\right)^{\mathrm{length}(p)} \tag{1}$$

where P(x, y) is the set of paths in the family DAG that run from x up to a common ancestor and back down to y. This is closely related to the previous problem, where you were looking for the shortest path in P(x, y) instead of summing over paths.

Under this assumption, what is the expected fraction of genes shared by

i. a parent and child?
ii. two siblings? (i.e., same mother and same father: sometimes called “full siblings”)
iii. two half-siblings? (e.g., same mother, different fathers)
iv. two half-siblings whose fathers are brothers? (This might happen if a dead man’s brother marries his widow, as actually required by the Bible. It is not a case of inbreeding, but the children have a stronger genetic relationship than in the previous case.)
v. an aunt and nephew? (e.g., the aunt’s full sister is the nephew’s mother)
vi. two full first cousins? (e.g., their mothers are full sisters)
vii. two first cousins whose mothers are full sisters and whose fathers are half-brothers? (Again, this is not a case of inbreeding, just a stronger genetic relationship than the previous case.)
viii. Franklin and Eleanor?

Hand in ancestorb.dyna and any .par files. Explain your strategy in your README.

You can answer the above questions either directly from the definition (1) above, or by using Dyna with the tangled DAG in tangle.par. If you do both, you can check that your Dyna program gets the same answer as the formula; that’s what the graders will do.

The most natural solution involves using += in your Dyna program. Then your presidentsb.par should use := 0.5 instead of := 1. [3]

Note: For this += program, you will want to use --driver=goal to see the answer, or --driver=dynasty in order to see the complete computation. (--driver=backtrace

[3] Alternatively, you may be able to get away without changing the .par files! The trick is to “work in the log domain.” You are trying to multiply probabilities $0.5 \cdot 0.5 \cdot 0.5 \cdots$ along each path, instead of summing edge lengths $1 + 1 + 1 \cdots$ along each path as in the previous problem. However, if you are willing to output the negative logarithm of the probability instead of the probability itself, you can compute $-\log_2(0.5 \cdot 0.5 \cdot 0.5 \cdots)$ as $(-\log_2 0.5) + (-\log_2 0.5) + (-\log_2 0.5) + \cdots$, which is $1 + 1 + 1 \cdots$! So it is really the same computation. Ordinary multiplication is accomplished by addition if you’re using logs.

The question now is how to add up the probabilities over paths. Dyna provides an operator log+= to let you do this when your probabilities are expressed in the log domain. Suppose lx and ly are the (negated) logarithms of x and y. Then the rule lz log+= lx+ly increases lz from the (negated) logarithm of z to the (negated) logarithm of z+(x*y), being careful to avoid underflow. So it has the same effect on negated logs that z += x*y would have had on the original values.
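As an illustration of working in the log domain, the Python sketch below accumulates definition (1) twice: once with ordinary probabilities and once with negated base-2 logs. The helper neg_log2_add is a hypothetical stand-in for the effect of a single log+= update as described above; it is not Dyna's implementation. The path lengths are supplied by hand for two full siblings (one path of length 2 through each parent), so no path enumeration is shown.

```python
import math

def neg_log2_add(lz, lw):
    """Return -log2(2**-lz + 2**-lw), given two negated base-2 logs.

    Factoring out the larger probability keeps the computation stable
    even when 2**-lz and 2**-lw would both underflow to 0.0.
    """
    if lz == math.inf:  # 2**-inf is probability 0
        return lw
    if lw == math.inf:
        return lz
    small, large = min(lz, lw), max(lz, lw)
    return small - math.log2(1.0 + 2.0 ** (small - large))

def shared_fraction(path_lengths):
    """Definition (1) in the probability domain: sum of (1/2)**length."""
    return sum(0.5 ** n for n in path_lengths)

def shared_fraction_neg_log2(path_lengths):
    """Same sum in the log domain; returns -log2 of the shared fraction."""
    total = math.inf  # negated log of 0, i.e. nothing accumulated yet
    for n in path_lengths:
        path_cost = sum([1.0] * n)  # each edge costs -log2(0.5) = 1
        total = neg_log2_add(total, path_cost)
    return total

sibling_paths = [2, 2]  # assumed by hand: via the mother and via the father
print(shared_fraction(sibling_paths))           # probability domain
print(shared_fraction_neg_log2(sibling_paths))  # negated log2 of the same value
```

Raising 2 to the negated log-domain result recovers the probability-domain result, which for these two paths is the 1/2 quoted above for siblings. The log-domain version also keeps working for very long paths, where 0.5 ** n would underflow to zero.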
