
Context: Since we are at the end

Announcements: This is the last class of the semester; there are no more class meetings. Please respond to the Doodle poll to set...


PageRank
• Need to discover and rank pages on the web.
• Ranking was done manually for a while.
• Metric: pages that are linked to a lot are authoritative.
• Task: find the number of links to each page.
• Challenge: 30 trillion (and growing) pages today.

Web Crawlers
(Figure: starting from a.com/i, the crawler finds links to /j, /k, and d.com/a and writes one link record per link to its output.)
Output:
a.com/i -> a.com/j
a.com/i -> a.com/k
a.com/i -> d.com/a
...
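The crawl above can be sketched as a breadth-first traversal that records every link it sees and only queues pages it has not visited yet. This is a minimal sketch under assumptions: the `WEB` dictionary is a hypothetical in-memory stand-in for fetching and parsing real pages, and every entry other than the links from a.com/i is made up for illustration.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# Only a.com/i's links come from the slide; the other entries are invented.
# A real crawler would fetch and parse HTML instead.
WEB = {
    "a.com/i": ["a.com/j", "a.com/k", "d.com/a"],
    "a.com/j": ["a.com/i"],
    "a.com/k": [],
    "d.com/a": ["a.com/k"],
}


def crawl(seed, web):
    """Breadth-first crawl from seed, returning every (source, target) link found."""
    output = []             # link records, e.g. ("a.com/i", "a.com/j")
    visited = {seed}        # pages already queued, so each page is fetched once
    frontier = deque([seed])
    while frontier:
        page = frontier.popleft()
        for target in web.get(page, []):
            output.append((page, target))  # record the link
            if target not in visited:      # newly discovered page: crawl it later
                visited.add(target)
                frontier.append(target)
    return output


for src, dst in crawl("a.com/i", WEB):
    print(f"{src} -> {dst}")
```

The first three records printed match the slide's output; the `visited` set is what stops the crawler from looping when a.com/j links back to a.com/i.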

Scaling Web Crawling
(Figure: several crawlers, each writing to its own output: Output 1, Output 2, Output 3, Output 4.)
• Why independent outputs?
• Is starting from independent pages sufficient?
  • For correctness?
  • For scalability?
• How to address any issues?

Computing PageRank
(Figure: the link records in Outputs 1 to 4, such as a->b, y->a, c->b, d->c, and x->j, flow through three stages: Map, Shuffle, and Reduce. The shuffle brings all copies of the same link together, for example both copies of a->b, and each reducer then counts the number of unique links.)
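The Map, Shuffle, and Reduce stages in the figure can be simulated in a single process. This is a sketch under assumptions: the three crawler outputs are hypothetical, records use the slide's `a->b` notation, and the mapper keys each link by its target page so that one reducer sees every link into that page. Note that it counts unique inbound links, the metric the slides use; it is not the full iterative PageRank computation.

```python
from collections import defaultdict

# Hypothetical crawler outputs: each list is one crawler's link records.
# Overlapping crawls mean the same link (e.g. "a->b") can appear more than once.
crawler_outputs = [
    ["a->b", "y->a", "c->b"],
    ["a->b", "x->j", "d->c"],
    ["y->a", "c->b", "x->j", "d->c"],
]


def map_phase(record):
    """Emit (target page, source page) for one link record such as 'a->b'."""
    source, target = record.split("->")
    yield target, source


def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(target, sources):
    """Count unique links into target; duplicates from overlapping crawls count once."""
    return target, len(set(sources))


mapped = [pair for output in crawler_outputs for record in output for pair in map_phase(record)]
grouped = shuffle(mapped)
counts = dict(reduce_phase(t, s) for t, s in grouped.items())
print(counts)  # {'b': 2, 'a': 1, 'j': 1, 'c': 1}
```

Deduplicating in the reducer with `set` is what makes the count correct even though several crawlers reported a->b.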

Map Reduce as a Computational Paradigm
• Generalized into a programming framework used to implement:
  • Aggregation queries (e.g., over large amounts of data).
  • Machine learning jobs of various kinds.
  • Various other things...
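As a framework, MapReduce asks the user for only two functions, a mapper and a reducer, and handles grouping by key itself. The sketch below is a hypothetical single-process `map_reduce` helper, not any real framework's API, used to run an aggregation query over made-up sales data.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable


def map_reduce(inputs: Iterable[Any],
               mapper: Callable[[Any], Iterable[tuple]],
               reducer: Callable[[Any, list], Any]) -> dict:
    """Single-process sketch of the MapReduce model: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for item in inputs:                   # map: each input yields (key, value) pairs
        for key, value in mapper(item):
            groups[key].append(value)     # shuffle: collect all values for one key
    return {key: reducer(key, values) for key, values in groups.items()}  # reduce


# Hypothetical aggregation query: total sales per region,
# similar to SQL "SELECT region, SUM(amount) ... GROUP BY region".
sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
totals = map_reduce(sales,
                    mapper=lambda row: [(row[0], row[1])],
                    reducer=lambda region, amounts: sum(amounts))
print(totals)  # {'east': 17, 'west': 6, 'north': 3}
```

In a real cluster the map calls run in parallel on different machines and the shuffle moves data over the network; the user-facing contract of mapper plus reducer stays the same.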

Map Reduce Challenges
• Fault tolerance: need to replicate data and remember locations.
• Scheduling: minimize time and resources used.
  • Sharing the cluster across jobs.
  • Minimizing compute and network transfer time.
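One way the fault-tolerance and scheduling concerns interact: if the system remembers where each input block is replicated, it can prefer to run a map task on a worker that already holds the data (saving network transfer) and, when a worker fails, re-run its task on another replica holder. The sketch below assumes a hypothetical `REPLICAS` table and a simple greedy policy; it is an illustration of data-local scheduling, not how any particular framework schedules tasks.

```python
# Hypothetical block -> replica locations table (the "remember locations" part).
REPLICAS = {
    "block-0": ["w1", "w2"],
    "block-1": ["w2", "w3"],
    "block-2": ["w3", "w1"],
}


def schedule(blocks, idle_workers, failed=frozenset()):
    """Greedily assign each map task to an idle worker, preferring one with a replica."""
    assignment = {}
    idle = [w for w in idle_workers if w not in failed]
    for block in blocks:
        if not idle:
            break  # remaining tasks wait until a worker frees up
        local = [w for w in REPLICAS[block] if w in idle]
        worker = local[0] if local else idle[0]  # data-local if possible, else remote read
        assignment[block] = worker
        idle.remove(worker)
    return assignment


blocks = ["block-0", "block-1", "block-2"]
print(schedule(blocks, ["w1", "w2", "w3"]))
# {'block-0': 'w1', 'block-1': 'w2', 'block-2': 'w3'}

# Simulate w1 failing: block-0 moves to w2, another replica holder,
# and block-2 waits for a free worker.
print(schedule(blocks, ["w1", "w2", "w3"], failed={"w1"}))
# {'block-0': 'w2', 'block-1': 'w3'}
```

A greedy pass like this is not optimal in general; sharing the cluster across jobs adds further constraints that this sketch ignores.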

  28. Sensors or IoT

Many Variants, Main Differences
• Usually consider the case of sensors producing data.
• Want to compute on the aggregate data from sensors.
  • For example, to provide early warning for volcanoes, storms, and earthquakes.
  • For example, to provide security against intruders.
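Computing on aggregate sensor data often uses partial aggregates that can be merged in any order, so readings can be combined as they travel through the network instead of all being shipped to one place. This is a swapped-in technique for illustration, not one the slides name. The readings, `WARN_LEVEL`, and `QUORUM` below are hypothetical; requiring several sensors to agree is one assumed way to keep a single faulty sensor from triggering a warning.

```python
from dataclasses import dataclass
from functools import reduce

# Hypothetical parameters for an early-warning rule.
WARN_LEVEL = 0.3  # a reading above this counts as "alarming"
QUORUM = 2        # at least this many alarming sensors trigger a warning


@dataclass(frozen=True)
class Partial:
    """Mergeable partial aggregate: number of readings, their sum, and how many alarm."""
    count: int
    total: float
    above: int

    @staticmethod
    def of(reading: float) -> "Partial":
        return Partial(1, reading, 1 if reading > WARN_LEVEL else 0)

    def merge(self, other: "Partial") -> "Partial":
        # Merging is associative and commutative, so any combining order works.
        return Partial(self.count + other.count,
                       self.total + other.total,
                       self.above + other.above)

    @property
    def mean(self) -> float:
        return self.total / self.count


# Hypothetical ground-motion readings from five sensors.
readings = [0.02, 0.03, 0.41, 0.38, 0.02]
agg = reduce(Partial.merge, (Partial.of(r) for r in readings))
print(agg.count, round(agg.mean, 3), agg.above)
if agg.above >= QUORUM:
    print("early warning: multiple sensors above threshold")
```

Because `merge` does not care about order, groups of nearby sensors could pre-combine their readings and forward a single `Partial` each.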
