Genomics Sequencing tech Sequencing tech: next generation What do - PowerPoint PPT Presentation


  1. Genomics

  2. Sequencing tech

  3. Sequencing tech: next generation

  4. What do we get from sequencing?

  5. How to analyze these reads?

  6. Mutation identification: Mapping Cancer Heart Disease Brain Disease

  7. Genome projects: Assembly

  8. Use sequencing for other types of data X-seq technology

  9. RNA-seq

  10. Assembly

  11. Assembly Computational Challenge: assemble individual short fragments (reads) into a single genomic sequence (“superstring”)

  12. Shortest common superstring Problem: Given a set of strings, find a shortest string that contains all of them. Input: Strings s_1, s_2, ..., s_n. Output: A string s that contains all strings s_1, s_2, ..., s_n as substrings, such that the length of s is minimized.
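
The problem statement above can be made concrete with a brute-force sketch. It is only meant for a handful of short strings, since it tries every ordering; the function names `overlap` and `shortest_common_superstring` are illustrative and do not come from the slides.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def shortest_common_superstring(strings):
    """Exhaustive search: try every ordering and merge with maximal overlaps."""
    # Drop duplicates and strings already contained in another string;
    # they never change the answer.
    unique = list(dict.fromkeys(strings))
    reads = [s for s in unique if not any(s != t and s in t for t in unique)]
    if not reads:
        return ""
    best = None
    for order in permutations(reads):
        merged = order[0]
        for nxt in order[1:]:
            # Append only the part of nxt that is not already covered.
            merged += nxt[overlap(merged, nxt):]
        if best is None or len(merged) < len(best):
            best = merged
    return best


print(shortest_common_superstring(["ABC", "BCD", "CDE"]))  # ABCDE
```

The running time grows factorially with the number of strings, which is one way to see why this formulation does not scale to real read sets.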

  13. Shortest common superstring

  14. Any ideas?

  15. Directed Graph

  16. Overlap Graph

  17. Example

  18. Shortest common superstring problem is hard

  19. Shortest common superstring problem is hard

  20. Is there a better or more feasible way?

  21. Matching a superstring to a set of short reads. Assume we have a set S of reads of length k (k-mers). Goal: Find a string whose k-mers are exactly the set S.

  22. Overlap graph approach. Assume we have a set S of reads of length k (k-mers). Goal: Find a string whose k-mers are exactly the set S.

  23. Overlap graph approach is hard. Assume we have a set S of reads of length k (k-mers). Goal: Find a string whose k-mers are exactly the set S.
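
As a rough illustration of the overlap graph approach, the sketch below treats each k-mer as a node, draws an edge when two k-mers overlap by k-1 letters, and searches for a path that visits every node once (a Hamiltonian path). It assumes the k-mers are distinct, and the example reads are an assumption chosen for illustration. The search is exhaustive, which is why the approach is hard at scale.

```python
from itertools import permutations


def overlap_graph(kmers):
    """Edge a -> b when the last k-1 letters of a equal the first k-1 of b."""
    return {a: [b for b in kmers if a != b and a[1:] == b[:-1]] for a in kmers}


def hamiltonian_spelling(kmers):
    """Return a string whose k-mers are exactly `kmers`, or None if none exists."""
    graph = overlap_graph(kmers)
    for order in permutations(kmers):
        # A valid ordering follows an edge between every consecutive pair.
        if all(b in graph[a] for a, b in zip(order, order[1:])):
            return order[0] + "".join(x[-1] for x in order[1:])
    return None


# Hypothetical read set used only for this example.
reads = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA"]
print(hamiltonian_spelling(reads))
```

More than one string can satisfy the goal for the same read set, so the printed result is one valid reconstruction rather than a unique answer.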

  24. There is an alternative way

  25. De Bruijn Graph

  26. De Bruijn Graph

  27. What is the goal now?

  28. Overlap graph vs De Bruijn graph. [Figure: graph with nodes CG, GT, TG, CA, AT, GC, GG. Path visits every EDGE once.]
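
A minimal sketch of building a De Bruijn graph from k-mers follows: each k-mer becomes an edge from its (k-1)-letter prefix to its (k-1)-letter suffix. The 3-mer read set is an assumption, picked because it yields the seven 2-mer nodes named on this slide; the slides do not list the reads themselves.

```python
from collections import defaultdict


def de_bruijn_graph(kmers):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it points to.

    Repeated k-mers produce repeated entries, so the result is a multigraph.
    """
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Hypothetical reads (one edge per read).
reads = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA"]
graph = de_bruijn_graph(reads)
for node, targets in graph.items():
    print(node, "->", targets)
```

Because reads are edges here, reconstructing the sequence means using every edge once, instead of visiting every node once as in the overlap graph.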

  29. MultiEdge

  30. MultiGraph

  31. Some definitions

  32. Eulerian walk/path zero or

  33. Eulerian walk/path

  34. Proof? Algorithm?

  35. Assume all nodes are balanced. a. Start with an arbitrary vertex v and form an arbitrary cycle with unused edges until a dead end is reached. Since the graph is Eulerian, this dead end is necessarily the starting point, i.e., vertex v.

  36. b. If the cycle from (a) is not an Eulerian cycle, it must contain a vertex w that has untraversed edges. Perform step (a) again, using vertex w as the starting point. Once again, we will end up at the starting vertex w.

  37. c. Combine the cycles from (a) and (b) into a single cycle and iterate step (b).
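
Steps (a) to (c) can be sketched directly in code. The version below assumes a connected directed graph in which every node is balanced, given as a dictionary of adjacency lists; the representation and the names `eulerian_cycle` and `walk` are assumptions made for the example. It favors readability over speed, since it rescans the cycle on every iteration.

```python
def eulerian_cycle(graph, start):
    """Build an Eulerian cycle by repeatedly splicing in sub-cycles."""
    # Copy the adjacency lists; edges are removed as they are traversed.
    unused = {v: list(targets) for v, targets in graph.items()}

    def walk(v):
        # Step (a): follow unused edges until stuck. In a balanced graph
        # the walk can only get stuck back at its starting vertex.
        cycle = [v]
        while unused.get(cycle[-1]):
            cycle.append(unused[cycle[-1]].pop())
        return cycle

    cycle = walk(start)
    while True:
        # Step (b): find a vertex w on the cycle with untraversed edges.
        i = next((i for i, w in enumerate(cycle) if unused.get(w)), None)
        if i is None:
            return cycle  # every edge has been used
        # Step (c): splice the new cycle through w into the current one.
        cycle = cycle[:i] + walk(cycle[i]) + cycle[i + 1:]


# Small balanced example: the first walk returns early and a splice is needed.
graph = {"A": ["B"], "B": ["C", "A"], "C": ["B"]}
print(eulerian_cycle(graph, "A"))  # ['A', 'B', 'C', 'B', 'A']
```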

  38. Eulerian path • A vertex v is "semibalanced" if |in-degree(v) - out-degree(v)| = 1 • If a graph has an Eulerian path starting at s and ending at t, then all its vertices are balanced, with the possible exception of s and t • Add an edge from t to s, the two semibalanced vertices: now all vertices are balanced (assuming there was an Eulerian path to begin with). Find the Eulerian cycle and remove the edge you added. You now have the Eulerian path you wanted.
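
The reduction from Eulerian path to Eulerian cycle can be sketched as follows. The code assumes a connected graph with exactly one vertex whose out-degree exceeds its in-degree by one (the start s) and one vertex with the reverse imbalance (the end t); it does not check connectivity. The cycle search uses a compact stack-based variant, and the example graph is a hypothetical De Bruijn graph built from 3-mers.

```python
from collections import Counter


def eulerian_cycle(graph, start):
    """Stack-based search for a closed walk that uses every edge once."""
    unused = {v: list(ts) for v, ts in graph.items()}  # edges get consumed
    stack, walk = [start], []
    while stack:
        v = stack[-1]
        if unused.get(v):
            stack.append(unused[v].pop())
        else:
            walk.append(stack.pop())
    return walk[::-1]


def eulerian_path(graph):
    """Add an edge t -> s, find an Eulerian cycle, then remove that edge."""
    out_deg = Counter({v: len(ts) for v, ts in graph.items()})
    in_deg = Counter(t for ts in graph.values() for t in ts)
    nodes = set(out_deg) | set(in_deg)
    starts = [v for v in nodes if out_deg[v] - in_deg[v] == 1]
    ends = [v for v in nodes if in_deg[v] - out_deg[v] == 1]
    if len(starts) != 1 or len(ends) != 1:
        raise ValueError("expected exactly one start and one end vertex")
    s, t = starts[0], ends[0]
    augmented = {v: list(ts) for v, ts in graph.items()}
    augmented.setdefault(t, []).append(s)  # the extra balancing edge
    cycle = eulerian_cycle(augmented, s)
    edges = list(zip(cycle, cycle[1:]))
    i = edges.index((t, s))  # locate the added edge
    # Cut the cycle at the added edge so the path runs from s to t.
    rotated = edges[i + 1:] + edges[:i]
    return [s] + [b for _, b in rotated]


# Hypothetical De Bruijn graph of the 3-mers of a 10-letter sequence.
graph = {"AT": ["TG"], "TG": ["GG", "GC"], "GG": ["GC"],
         "GC": ["CG", "CA"], "CG": ["GT"], "GT": ["TG"]}
path = eulerian_path(graph)
print(path[0] + "".join(node[-1] for node in path[1:]))
```

The printed string is one sequence whose 3-mers match the edges of the graph; when several Eulerian paths exist, which one comes out depends on the order in which edges are tried.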
