


  1. Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs JOSEPH L HERMAN, ÁDÁM NOVÁK, RUNE LYNGSØ, ADRIENN SZABÓ, ISTVÁN MIKLÓS AND JOTUN HEIN

  2. Describing the Problem Alignment #1, Alignment #2, … Alignment #1000000 come out of a Black Box (the sampler). What do we do with a … million alignments?!

  3. Sampling Procedure

  4. Point Estimate

  5. Solutions You can:  Take one sample alignment with:  Maximum likelihood (in this case, MAP)  Maximum positional homology  Minimum number of gaps  …  Or… try to process all the samples you have, and obtain a good estimate!
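
The first option above, picking the single sampled alignment with the highest posterior probability (MAP), can be sketched as follows; `samples` and `log_post` are illustrative names, not the authors' code:

```python
# Hypothetical sketch: pick the sampled alignment with the highest
# log-posterior (the MAP sample among those drawn).

def map_sample(samples, log_post):
    """Return the sample whose log-posterior is largest."""
    best = max(range(len(samples)), key=lambda i: log_post[i])
    return samples[best]

# Toy alignment encodings with made-up log-posteriors:
samples = ["A-C/AGC", "AC-/AGC", "A-C/AG-C"]
log_post = [-12.3, -10.1, -15.7]
print(map_sample(samples, log_post))  # → "AC-/AGC"
```

Note this discards all but one of the million samples, which is exactly the waste the DAG representation is meant to avoid.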

  6. Columns [Figure: alignment columns as 0/1 column vectors; adding a column vector to an even point gives the next even point on the alignment path.]

  7. Objective An alignment is a path from (0, 0, …, 0) to (L1, L2, …, LN), the vector of sequence lengths.
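
The path picture above can be made concrete with a small sketch (names are my own): each column is a 0/1 vector marking which sequences emit a character, and accumulating the columns must carry the path from the origin to the sequence lengths.

```python
# Illustrative sketch: an alignment of N sequences is a path in an
# N-dimensional lattice from (0,...,0) to the vector of sequence lengths.

def path_endpoint(columns, n_seqs):
    """Accumulate 0/1 column vectors; returns the point the path ends at."""
    point = [0] * n_seqs
    for col in columns:
        assert any(col), "an all-gap column is not allowed"
        point = [p + c for p, c in zip(point, col)]
    return tuple(point)

# Columns of the pairwise alignment  A-C / AGC :
cols = [(1, 1), (0, 1), (1, 1)]
print(path_endpoint(cols, 2))  # → (2, 3), the two sequence lengths
```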

  8. Example

  9. Crossovers

  10. Why do we care about ESS?  For i.i.d. samples X1, …, Xn with Var(Xi) = σ²:  Var(X̄) = σ²/n ⇒ good estimates!  For positively correlated samples X1, …, Xn:  Var(X̄) ≈ σ²/ESS with ESS < n ⇒ worse estimates!  Bayesian alignment methods use MCMC  So each sample is highly correlated with the previous one  The ESS is much smaller than the raw sample size
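
A minimal sketch (my own, not the authors' code) of the standard effective-sample-size estimate ESS = n / (1 + 2·Σ ρ_k), which is why correlated MCMC draws are worth less than i.i.d. ones:

```python
# Effective sample size from the empirical autocorrelation function,
# truncated at the first non-positive autocorrelation.

def ess(xs, max_lag=None):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return float(n)
    max_lag = max_lag or n // 2
    acf_sum = 0.0
    for lag in range(1, max_lag):
        rho = sum((xs[i] - mean) * (xs[i + lag] - mean)
                  for i in range(n - lag)) / (n * var)
        if rho <= 0:   # stop once correlation has died away
            break
        acf_sum += rho
    return n / (1 + 2 * acf_sum)

print(ess([0, 1] * 50))          # uncorrelated-looking chain: ESS ≈ n
print(ess([0] * 50 + [1] * 50))  # strongly correlated chain: ESS << n
```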

  11. Crossovers and ESS Suppose all sampled alignments shared the even points A and B on their paths: A B  Number of original alignments = 4  Number of possible recombined alignments = 4 × 3 × 3 = 36
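
The crossover idea above can be illustrated with toy data (segment names are hypothetical): if every sampled path passes through the shared points A and B, the segments between shared points can be recombined freely, multiplying the number of distinct alignments the sample represents.

```python
from itertools import product

# Toy data: 4 sampled paths split at shared even points A and B into
# 3 segments each; letters stand for the distinct segments observed.
segments = [["a1", "a2", "a3", "a4"],   # before A: 4 distinct segments
            ["b1", "b2", "b3"],         # between A and B: 3 distinct
            ["c1", "c2", "c3"]]         # after B: 3 distinct

recombined = list(product(*segments))
print(len(recombined))  # → 36, versus the 4 original samples
```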

  12. ESS

  13. Equivalence classes

  14. Approximate Inference o The independence assumption o The accuracy  The nearest distribution w.r.t. the KL divergence o Consequences

  15. Pair marginals Each column is independent of all earlier ones, except the immediately preceding one (a first-order Markov assumption)
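
Under the first-order Markov assumption above, the probability of a column sequence factorises into pairwise transitions. A sketch with illustrative names (`trans` and `start` are made-up estimated tables, using match/insert column labels as a toy state space):

```python
import math

def log_prob_markov(columns, trans, start):
    """log P(c1,...,cK) ≈ log P(c1) + sum_k log P(c_k | c_{k-1})."""
    lp = math.log(start[columns[0]])
    for prev, cur in zip(columns, columns[1:]):
        lp += math.log(trans[(prev, cur)])
    return lp

start = {"M": 0.8, "I": 0.2}
trans = {("M", "M"): 0.9, ("M", "I"): 0.1,
         ("I", "M"): 0.7, ("I", "I"): 0.3}
print(log_prob_markov("MIM", trans, start))
```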

  16. Pair HMMs

  17. Mean Field Approximation

  18. Mean Field vs Pair Marginals  Pair Marginals: model P(column_i | column_{i−1})  Mean Field: model P(column_i), with each column assumed independent of the others

  19. Mean Field vs Pair Marginals Estimating pair marginals: ◦ For each column, up to 2^N possible preceding columns. Estimating the mean field: ◦ Only one parameter to approximate
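
A back-of-the-envelope check of the comparison above: a column of an N-sequence alignment is a non-empty 0/1 vector, so a pair-marginal table must consider up to 2^N − 1 possible preceding columns, which grows fast with N.

```python
# Count the possible preceding columns for an N-sequence alignment:
# every non-empty 0/1 vector over the N sequences.

def n_possible_columns(n_seqs):
    return 2 ** n_seqs - 1

for n in (2, 5, 10, 20):
    print(n, n_possible_columns(n))  # grows exponentially in n
```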

  20. Mean Field vs Pair Marginals Just the distribution estimation error

  21. Mean Field vs Pair Marginals

  22. More realistic case!

  23. Point estimation The quality of an alignment can be defined in terms of: The authors argue that we only care about the positives, since the sample size is small!

  24. Point Estimate

  25. Loss function

  26. Simplifying the space of loss functions

  27. Simplifying a specific class

  28. The dynamic programming problem Suppose each edge of the DAG were assigned a weight
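
With weighted edges, the natural dynamic programme is a maximum-weight source-to-sink path through the DAG. A minimal sketch on a toy graph (my own, not the authors' implementation):

```python
# Maximum-weight path in a DAG by dynamic programming over a
# topologically ordered node list, with backpointers for traceback.

def best_path(nodes, edges):
    """nodes: topologically ordered list; edges: {(u, v): weight}."""
    score = {nodes[0]: 0.0}
    back = {}
    for v in nodes[1:]:
        cands = [(score[u] + w, u) for (u, x), w in edges.items()
                 if x == v and u in score]
        if cands:
            score[v], back[v] = max(cands)
    # trace back from the sink
    path, v = [nodes[-1]], nodes[-1]
    while v in back:
        v = back[v]
        path.append(v)
    return list(reversed(path)), score[nodes[-1]]

nodes = ["s", "u", "v", "t"]
edges = {("s", "u"): 1.0, ("s", "v"): 2.0,
         ("u", "t"): 3.0, ("v", "t"): 1.5}
print(best_path(nodes, edges))  # → (['s', 'u', 't'], 4.0)
```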

  29. A Sample running

  30. Some results
