Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs
JOSEPH L HERMAN, ÁDÁM NOVÁK, RUNE LYNGSO, ADRIENN SZABÓ, ISTVÁN MIKLÓS AND JOTUN HEIN
Describing the Problem [Diagram: a black box emitting samples #1, #2, …, #1000000] What do we do with a … million alignments?!
Sampling Procedure
Point Estimate
Solutions
You can:
◦ Take one sample alignment with:
  ◦ Maximum likelihood (in this case, MAP)
  ◦ Maximize positional homologies
  ◦ Minimize the gaps
  ◦ …
Or…
◦ Try to process the samples you have, and obtain a good estimate!
Columns [Figure: example alignment columns, with panels "Column Vectors" and "Even Points"; numeric labels garbled in extraction]
Objective An alignment is a path from: [start and end coordinates garbled in extraction]
Example
Crossovers
Why do we care about ESS? [Equations garbled in extraction] Bayesian alignment methods use MCMC, so each sample is highly correlated with the previous one, and the effective sample size (ESS) is much smaller than the number of samples.
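A rough sketch of how ESS can be estimated from a chain, assuming the common autocorrelation-based definition ESS = n / (1 + 2 Σ ρ_k) with the sum truncated at the first non-positive lag. The function name and the idea of feeding it a scalar summary of each sampled alignment (for example a log-likelihood) are assumptions for illustration, not taken from the slides.

```python
import random


def effective_sample_size(xs):
    """Estimate ESS of a scalar chain as n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first lag whose autocorrelation is <= 0,
    a simple (assumed) truncation rule; other rules exist.
    """
    n = len(xs)
    mean = sum(xs) / n
    centered = [x - mean for x in xs]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        # Constant chain: autocorrelation is undefined; return n by convention.
        return float(n)
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau


if __name__ == "__main__":
    rng = random.Random(0)
    independent = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    # AR(1) chain as a stand-in for correlated MCMC output.
    correlated = [0.0]
    for _ in range(1999):
        correlated.append(0.9 * correlated[-1] + rng.gauss(0.0, 1.0))
    print("independent:", effective_sample_size(independent))
    print("correlated: ", effective_sample_size(correlated))
```

The correlated chain should report an ESS far below its length, which is the situation the slide describes for MCMC alignment samplers.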
Crossovers and ESS Suppose all alignments share the even points A and B in their paths: [Figure: sampled paths passing through shared points A and B] Number of original alignments = 4 Number of possible alignments = 4*3*3 = 36
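The counting argument can be sketched as follows. The slide does not say why the factors are 4, 3 and 3; the example assumes that some sampled paths coincide on the segments after A, so fewer distinct segments exist there. The segment labels and function name are hypothetical.

```python
def count_recombined_paths(samples):
    """Count the paths obtainable by crossing over at shared points.

    Each sample is a tuple of segments, one per interval between
    consecutive shared points. The count is the product of the number
    of distinct segments seen in each interval.
    """
    n_intervals = len(samples[0])
    total = 1
    for k in range(n_intervals):
        distinct = {sample[k] for sample in samples}
        total *= len(distinct)
    return total


# Four sampled alignments, each split at shared points A and B into
# (start..A, A..B, B..end). Two samples agree on the last two segments.
samples = [
    ("a1", "b1", "c1"),
    ("a2", "b2", "c2"),
    ("a3", "b3", "c3"),
    ("a4", "b3", "c3"),
]
print(count_recombined_paths(samples))  # 4 * 3 * 3
```

Four samples therefore stand in for many more alignments once crossovers are allowed, which is why the graph can carry more information than the raw sample count suggests.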
ESS
Equivalence classes
Approximate Inference
o The independence assumption
o The accuracy: the nearest point w.r.t. KL divergence
o Consequences
Pair marginals Each site is independent of all earlier ones, except the immediately preceding one
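One way to write this assumption down, using notation that is not in the slides (an alignment A with columns c_1, …, c_m is assumed here), is as a first-order Markov factorization:

```latex
P(A) \approx P(c_1) \prod_{i=2}^{m} P(c_i \mid c_{i-1})
```

Each factor depends only on a column and its immediate predecessor, which is what makes the quantities estimable from pairs of adjacent columns in the samples.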
Pair HMMs
Mean Field Approximation
Mean Field vs Pair Marginals Pair Marginal: [formula garbled in extraction] Mean Field: [formula garbled in extraction]
Mean Field vs Pair Marginals
Estimating Pair Marginals:
◦ For each column, 2 [symbol garbled in extraction] possible preceding columns.
Estimating Mean Field:
◦ Only approximate one parameter
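A minimal sketch of the two estimators, assuming each sampled alignment is given as a list of hashable column identifiers and that the estimates are plain empirical frequencies. The function names and the toy column labels are hypothetical; the slides do not specify an implementation.

```python
from collections import Counter


def column_marginals(samples):
    """Mean-field style estimate: fraction of samples containing each column."""
    n = len(samples)
    counts = Counter(col for aln in samples for col in set(aln))
    return {col: c / n for col, c in counts.items()}


def pair_conditionals(samples):
    """Pair-marginal style estimate: frequency of each column given its predecessor."""
    pair_counts = Counter()
    prev_counts = Counter()
    for aln in samples:
        for prev, col in zip(aln, aln[1:]):
            pair_counts[(prev, col)] += 1
            prev_counts[prev] += 1
    return {pair: c / prev_counts[pair[0]] for pair, c in pair_counts.items()}


# Toy samples: three alignments over hypothetical column labels.
samples = [
    ["x", "y", "z"],
    ["x", "w", "z"],
    ["x", "y", "z"],
]
print(column_marginals(samples))
print(pair_conditionals(samples))
```

The mean-field table has one entry per column, while the pair table has one entry per observed (predecessor, column) pair, so it needs more samples to be estimated to the same accuracy.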
Mean Field vs Pair Marginals Just the distribution estimation error
Mean Field vs Pair Marginals
More realistic case!
Point estimation The quality of an alignment can be defined in terms of: [criteria shown on the slide were not recovered] They argue that we only care about the positives, since the sample size is small!
Point Estimate
Loss function
Simplifying the space of loss functions
Simplifying a specific class
The dynamic programming problem Suppose the edges were weighted somehow
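Given edge weights, the best-scoring path through a directed acyclic graph can be found by dynamic programming over a topological order. The sketch below assumes additive weights that should be maximized (for instance log-probabilities, which is an assumption here, since the slide leaves the weighting open); the node names and function name are hypothetical.

```python
from collections import defaultdict


def best_path(edges, source, sink):
    """Return (score, path) of the maximum-weight source-to-sink path in a DAG.

    edges is a list of (u, v, weight) tuples.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))

    # Topological order (Kahn's algorithm).
    order = []
    ready = [n for n in nodes if indeg[n] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    # Relax edges in topological order, remembering the best predecessor.
    best = {source: 0.0}
    back = {}
    for u in order:
        if u not in best:
            continue  # not reachable from source
        for v, w in succ[u]:
            cand = best[u] + w
            if v not in best or cand > best[v]:
                best[v] = cand
                back[v] = u

    if sink not in best:
        raise ValueError("sink is not reachable from source")

    # Trace back from sink to source.
    path = [sink]
    while path[-1] != source:
        path.append(back[path[-1]])
    path.reverse()
    return best[sink], path


edges = [
    ("S", "A", 1.0),
    ("S", "B", 2.0),
    ("A", "B", 0.5),
    ("A", "T", 5.0),
    ("B", "T", 1.0),
]
score, path = best_path(edges, "S", "T")
print(score, path)
```

Each edge is examined once, so the running time is linear in the size of the graph, regardless of how many distinct paths the graph encodes.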
A Sample running
Some results
Recommendations
More recommendations