Copy number Aberra4ons Normal cells: Cancer cells: Extensive gene - PDF document

4/17/09 CSCI1950‐Z Computa4onal Methods for Biology Lecture 19 Ben Raphael April 13, 2009 hIp://cs.brown.edu/courses/csci1950‐z/ Copy number Aberra4ons Normal cells: Cancer cells: Extensive gene duplica4on/dele4on Red and gene probes to regions on X chromosome: >2 copies of green region hIp://www.coriell.org/images/gm10636fish.jpg hIp://www.imppc.org/ 1

4/17/09 DNA Microarrays Measuring Muta4ons in Cancer Compara4ve Genomic Hybridiza4on (CGH) 2

4/17/09 CGH Analysis (1) Divide genome into segments of equal copy number 0.5 Log 2 (R/G) 0 Genomic posi4on ‐0.5 Dele4on Amplifica4on 0.5 0 Genomic posi4on ‐0.5 CGH Analysis (2) Iden4fy aberra4ons common to mul4ple samples Chromosome 3 of 26 lung tumor samples on mid‐ Samples density cDNA array. Common dele4on located in 3p21 and common amplifica4on – in 3q. 3

4/17/09 CGH Analysis (1) Divide genome into segments of equal copy number Copy number profile Numerous methods Copy number (e.g. clustering, Hidden Markov Model, Bayesian, Genome etc.) coordinate Segmenta4on Input : X i = log 2 T i / R i , clone i = 1, …, N Output : Assignment s(y i ) ∈ {S 1 , …, S K } S i represent copy number states An Approach to CGH Segmenta4on • Circular Binary Segmenta4on (CBS), Olshen et al. 2004 • Use hypothesis test to compare means of two intervals using t‐test (whiteboard) Dele4on Amplifica4on 0.5 Genomic posi4on 0 ‐0.5 4

4/17/09 Interval Score Lipson, et al. J. Computa7onal Biology , 2006 Assume: • X i are independent, normally distributed • µ and σ denote the mean and standard devia4on of the normal genomic data. Given an interval I spanning k probes, we define its score as: Significance of Interval Score Lipson, et al. J. Computa7onal Biology , 2006 Assume: • X i ~ N( µ, σ ) 5

4/17/09 The MaxInterval Problem A vector X =( X 1 … X n ) Input: Output: An interval I ⊂ [1… n ], that maximizes S( I ) Other intervals with high scores may be found by recursively calling this func4on. Exhaus4ve algorithm: O(n 2 ) MaxInterval Algorithm I: LookAhead Assume given: • m : An upper bound for the value of a single element X i • t : A lower bound on the maximum score I = [ i, …, i + k ‐1] I ’ = [ i ,…, i+k+r ‐1] s = Σ j ∈ I Xj s + m r sum k + r length k score Solve for first r for which S( I ) may exceed t. Complexity: Expected O( n 1.5 ) (unproved) 6

4/17/09 MaxInterval Algorithm II: Geometric Family Approximation (GFA) For ε >0 define the Ω (j 1 ) following geometric Ω (j 2 ) family of intervals: Ω (j 3 ) Δ j k j Theorem: Let I * be the op4mal scoring interval. Let J be the leomost longest interval of Ω fully contained in I *. Then S( J ) ≥ S( I *)/ α , where α ∝ (1‐ ε ‐2 ) ‐1 . Complexity: O( n ) Benchmarking synthe4c vectors of varying lengths Linear regression suggests that the complexi4es of the Exhaus4ve, LookAhead and GFA algorithms are O(n 2 ), O(n 1.5 ), O(n), respec4vely. 7

4/17/09 Applications: Single Samples 1 A2BP1 FRA16B Chromosome 16 of Log 2 (ra4o) HCT116 colon carcinoma 0 cell line on high‐density oligo array (n=5,464). ‐1 0 25 50 75 Mbp ERBB2 1 Chromosome 17 of several 0 breast carcinoma cell lines on 1 mid‐density cDNA array Log 2 (ra4o) 0 (n=364). 1 0 1 0 0 25 50 75 Mbp Another Approach to CGH Segmenta4on Use Hidden Markov Model (HMM) to “parse” sequence of probes into copy number states Dele4on Amplifica4on 0.5 Genomic posi4on 0 ‐0.5 8

4/17/09 Hidden Markov Models 1 1 1 1 … 1 … 2 2 2 2 2 2 … … … … K K K K … K x 1   x 2   x 3   x K   Example: The Dishonest Casino A casino has two dice: Fair die • P(1) = P(2) = P(3) = P(5) = P(6) = 1/6 Loaded die • P(1) = P(2) = P(3) = P(5) = 1/10 P(6) = 1/2 Casino player switches back‐&‐forth between fair and loaded die once every 20 turns Game: 1. You bet $1 2. You roll (always with a fair die) 3. Casino player rolls (maybe with fair die, maybe with loaded die) 4. Highest number wins $2 9

4/17/09 Ques4on # 1 – Evalua4on GIVEN A sequence of rolls by the casino player 1245526462146146136136661664661636616366163616515615115146123562344 Prob = 1.3 x 10 ‐35 QUESTION How likely is this sequence, given our model of how the casino works? This is the EVALUATION problem in HMMs Ques4on # 2 – Decoding GIVEN A sequence of rolls by the casino player 1245526462146146136136661664661636616366163616515615115146123562344 FAIR LOADED FAIR QUESTION What por4on of the sequence was generated with the fair die, and what por4on with the loaded die? This is the DECODING ques4on in HMMs. This is what we want to solve for CGH analysis 10

4/17/09 Ques4on # 3 – Learning GIVEN A sequence of rolls by the casino player 1245526462146146136136661664661636616366163616515615115146123562344 Prob(6) = 64% QUESTION How “loaded” is the loaded die? How “fair” is the fair die? How ooen does the casino player change from fair to loaded, and back? This is the LEARNING ques4on in HMMs Defini4on of a hidden Markov model DefiniSon: A hidden Markov model (HMM) • Alphabet Σ = { b 1 , b 2 , …, b M } Set of states Q = { 1, ..., K } • Transi4on probabili4es between any two states • 1 2 a ij = transi4on prob from state i to state j a i1 + … + a iK = 1, for all states i = 1…K • Start probabili4es a 0i K … a 01 + … + a 0K = 1 • Emission probabili4es within each state e i (b) = P( x i = b | π i = k) e i (b 1 ) + … + e i (b M ) = 1, for all states i = 1…K 11

4/17/09 The dishonest casino model 0.05 0.95 0.95 FAIR LOADED P(1|F) = 1/6  P(1|L) = 1/10  P(2|F) = 1/6  P(2|L) = 1/10  0.05 P(3|F) = 1/6  P(3|L) = 1/10  P(4|F) = 1/6  P(4|L) = 1/10  P(5|F) = 1/6  P(5|L) = 1/10  P(6|F) = 1/6  P(6|L) = 1/2  A HMM is memory‐less 1 2 At each 4me step t, the only thing that affects future states is the current state π t K … P( π t+1 = k | “whatever happened so far”) = P( π t+1 = k | π 1 , π 2 , …, π t , x 1 , x 2 , …, x t ) = P( π t+1 = k | π t ) 12

4/17/09 A parse of a sequence Given a sequence x = x 1 ……x N , A parse of x is a sequence of states π = π 1 , ……, π N … 1 1 1 1 1 … 2 2 2 2 2 2 … … … … … K K K K K x 1   x 2   x 3   x K   Likelihood of a Parse Simply, mulSply all the orange arrows ! (transi4on probs and emission probs) … 1 1 1 1 1 … 2 2 2 2 2 2 … … … … … K K K K K x 1   x 2   x 3   x K   13

4/17/09 Likelihood of a parse A compact way to write 1 1 1 1 1 … a 0 π 1 a π 1 π 2 ……a π N‐1 π N e π 1 (x 1 )……e π N (x N ) 2 2 2 2 2 2 … Number all parameters a ij and e i (b); n params Given a sequence x = x 1 ……x N … … … … Example: and a parse π = π 1 , ……, π N , a 0Fair : θ 1 ; a 0Loaded : θ 2 ; … e Loaded (6) = θ 18 K K K K … K Then, count in x and π the # of 4mes each parameter j = 1, …, n occurs To find how likely is the parse: x 1   x 2   x 3   x K   (given our HMM) F(j, x, π ) = # parameter θ j occurs in (x, π ) P(x, π ) = P(x 1 , …, x N , π 1 , ……, π N ) = (call F(.,.,.) the feature counts ) Then, P(x N , π N | π N‐1 ) P(x N‐1 , π N‐1 | π N‐2 )……P(x 2 , π 2 | π 1 ) P(x 1 , π 1 ) = P(x N | π N ) P( π N | π N‐1 ) ……P(x 2 | π 2 ) P( π 2 | π 1 ) P(x 1 | π 1 ) P( π 1 ) = P(x, π ) = Π j=1…n θ j F(j, x, π ) = a 0 π 1 a π 1 π 2 ……a π N‐1 π N e π 1 (x 1 )……e π N (x N ) = exp [ Σ j=1…n log( θ j ) × F(j, x, π ) ] Example: the dishonest casino Let the sequence of rolls be: x = 1, 2, 1, 5, 6, 2, 1, 5, 2, 4 Then, what is the likelihood of π = Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair? (say ini4al probs a 0Fair = ½, a oLoaded = ½) ½ × P(1 | Fair) P(Fair | Fair) P(2 | Fair) P(Fair | Fair) … P(4 | Fair) = ½ × (1/6) 10 × (0.95) 9 = .00000000521158647211 ~= 0.5 × 10 ‐9 14

4/17/09 Example: the dishonest casino So, the likelihood the die is fair in this run is just 0.521 × 10 ‐9 OK, but what is the likelihood of π = Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded? ½ × P(1 | Loaded) P(Loaded, Loaded) … P(4 | Loaded) = ½ × (1/10) 9 × (1/2) 1 (0.95) 9 = .00000000015756235243 ~= 0.16 × 10 ‐9 Therefore, it somewhat more likely that all the rolls are done with the fair die, than that they are all done with the loaded die Example: the dishonest casino Let the sequence of rolls be: x = 1, 6, 6, 5, 6, 2, 6, 6, 3, 6 Now, what is the likelihood π = F, F, …, F ? ½ × (1/6) 10 × (0.95) 9 = 0.5 × 10 ‐9 , same as before What is the likelihood π = L, L, …, L? ½ × (1/10) 4 × (1/2) 6 (0.95) 9 = .00000049238235134735 ~= 0.5 × 10 ‐7 So, it is 100 4mes more likely the die is loaded 15

4/17/09 HMM Model for CGH data A+  A+  C+  C+  G+  G+  Fridlyand et al. (2004) S1  S1  S2  S2  S3  S3  S4  S4  A model for CGH data K states  S 1  S 2  S 3  S 4  copy numbers  1  2  3  4  Homozygous   Heterozygous   Normal   Duplication   Deletion   Deletion   (copy =2)  (copy >2)  (copy =1)  (copy =0)  µ 1 ,  ,  σ 1  µ 2 , σ 2 µ 3 , σ 3 µ 4 , σ 4 1  Emissions:  Copy number Gaussians  Genome coordinate 16

Copy number Aberra4ons Normal cells: Cancer cells: Extensive gene - PDF document

4/17/09 CSCI1950Z Computa4onal Methods for Biology Lecture 19 Ben Raphael April 13, 2009 hIp://cs.brown.edu/courses/csci1950z/ Copy number Aberra4ons Normal cells: Cancer cells: Extensive gene duplica4on/dele4on Red and gene probes

T Levels/Skills Plan Body Copy Body Copy Body Copy Body Copy Body Copy Body Copy Body Copy Body

A Variational Model for Joint Segmentation of Copy Number Data Sandro Morganella, Michele

GISTIC Somatic Copy Number Alterations SCNAs Step 1 Copy Number Segregation Circular Binary

12 Tips for giving an Effective Presentation Louise Lehane, UoL, Ireland Tip Number One Tip

Advanced Access Content System (AACS) AACS Managed Copy Overview Presented to BD GPC October

Communication in History: The Key to Understanding Get your digital copy today at

Copying Objects: Reference Copy Copies: Reference vs. Shallow vs. Deep Reference Copy c1 := c2

Copy Constructor 1 Copy Constructor 1. Initialize one object from another of the same type

CS 225 Data Structures Fe February 3 - Li Lifecycle G G Carl Evans Cop Copy Con Constru

CSCI 104 Copy Semantics Mark Redekopp David Kempe 2 Copy constructors and assignment operators

What is a prime number? What is a prime number? What is a prime number? What is a prime number?

Copy Number Variants (CNVs) January 27 th 2015 Fady M. Mikhail, MD, PhD Associate Professor

Detection of copy number alterations and loss of heterozygosity Control-FREEC tutorial Valentina

Genome-wide characterization of copy number variants in epilepsy patients Canadian Human and

AACS AACS Managed Copy Managed Copy Status Update Status Update And Implementation And

Studying copy-on-read and copy-on-write techniques on block device level to aid in large

Self Study: Yeast Genome Comparison SESSION 4 MARTIN KRZYWINSKI Genome Sciences Centre BC

Improving the Performance and Endurance of Encrypted Non-volatile Main Memory through

Cross-ISA Machine Emulation for Multicores Emilio G. Cota Columbia University Paolo Bonzini

Deduplication in VM Environments Frank Bellosa < bellosa@kit.edu > Konrad Miller <

Memory CoW in Xen Talk overview Why is CoW need? Memory CoW basics CoW mechanism:

CS473 Web Search (II) Luo Si Department of Computer Science Purdue University Modified Slides

International meeting on numerical semigroups Cortona 2014 September 9, 2014 Francesco

Blockchains Beyond Bitcoin: Notations Towards Optimal Level of Analysis of the Problem Let Us

Copy number Aberra4ons Normal cells: Cancer cells: Extensive gene - PDF document

4/17/09 CSCI1950Z Computa4onal Methods for Biology Lecture 19 Ben Raphael April 13, 2009 hIp://cs.brown.edu/courses/csci1950z/ Copy number Aberra4ons Normal cells: Cancer cells: Extensive gene duplica4on/dele4on Red and gene probes

T Levels/Skills Plan Body Copy Body Copy Body Copy Body Copy Body Copy Body Copy Body Copy Body

A Variational Model for Joint Segmentation of Copy Number Data Sandro Morganella, Michele

GISTIC Somatic Copy Number Alterations SCNAs Step 1 Copy Number Segregation Circular Binary

12 Tips for giving an Effective Presentation Louise Lehane, UoL, Ireland Tip Number One Tip

Advanced Access Content System (AACS) AACS Managed Copy Overview Presented to BD GPC October

Communication in History: The Key to Understanding Get your digital copy today at

Copying Objects: Reference Copy Copies: Reference vs. Shallow vs. Deep Reference Copy c1 := c2

Copy Constructor 1 Copy Constructor 1. Initialize one object from another of the same type

CS 225 Data Structures Fe February 3 - Li Lifecycle G G Carl Evans Cop Copy Con Constru

CSCI 104 Copy Semantics Mark Redekopp David Kempe 2 Copy constructors and assignment operators

What is a prime number? What is a prime number? What is a prime number? What is a prime number?

Copy Number Variants (CNVs) January 27 th 2015 Fady M. Mikhail, MD, PhD Associate Professor

Detection of copy number alterations and loss of heterozygosity Control-FREEC tutorial Valentina

Genome-wide characterization of copy number variants in epilepsy patients Canadian Human and

AACS AACS Managed Copy Managed Copy Status Update Status Update And Implementation And

Studying copy-on-read and copy-on-write techniques on block device level to aid in large

Self Study: Yeast Genome Comparison SESSION 4 MARTIN KRZYWINSKI Genome Sciences Centre BC

Improving the Performance and Endurance of Encrypted Non-volatile Main Memory through

Cross-ISA Machine Emulation for Multicores Emilio G. Cota Columbia University Paolo Bonzini

Deduplication in VM Environments Frank Bellosa &lt; bellosa@kit.edu &gt; Konrad Miller &lt;

Memory CoW in Xen Talk overview Why is CoW need? Memory CoW basics CoW mechanism:

CS473 Web Search (II) Luo Si Department of Computer Science Purdue University Modified Slides

International meeting on numerical semigroups Cortona 2014 September 9, 2014 Francesco

Blockchains Beyond Bitcoin: Notations Towards Optimal Level of Analysis of the Problem Let Us

Deduplication in VM Environments Frank Bellosa < bellosa@kit.edu > Konrad Miller <