Web Scraping with Python

  1. Web Scraping With Python. WEB SCRAPING IN PYTHON. Thomas Laetsch, Data Scientist, NYU

  2. Business Savvy. What are businesses looking for? Comparing prices; satisfaction of customers; generating potential leads; ... and much more!

  3. It's Personal. What could you do? Search for your favorite memes on your favorite sites. Automatically look through classified ads for your favorite gadgets. Scrape social site content looking for hot topics. Scrape cooking blogs looking for particular recipes, or recipe reviews. ... and much more!

  4. About My Work

  5. Pipe Dream

  6. Pipe Dream: Setup. Setup: Understand what we want to do. Find sources to help us do it.

  7. Pipe Dream: Acquisition. Acquisition: Read in the raw data from online. Format these data to be usable.
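Slide 7 names the two halves of acquisition without showing code. As a rough sketch only, using Python's standard library rather than the Scrapy tooling the course adopts, the hypothetical helpers `fetch_html` and `to_text` below read raw HTML from a URL and then format it into plain text chunks. The formatting rule (drop `script`/`style` contents, keep non-empty text) is an assumption made for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Read a page's raw HTML (hypothetical helper; needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TextCollector(HTMLParser):
    """Collect non-empty text chunks, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def to_text(html: str) -> list:
    """Format raw HTML into a list of text chunks (hypothetical helper)."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

For example, `to_text('<p>Hello</p><script>var x;</script><p>World</p>')` returns `['Hello', 'World']`. A live run would chain the two steps, e.g. `to_text(fetch_html(url))` with a URL of your choosing.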

  8. Pipe Dream: Processing. Processing: Many options!

  9. How do you do? Our focus: Acquisition! (Using Scrapy via Python)

  10. Are you in?

  11. HyperText Markup Language

  12. The main example

  13. HTML tags: <html> ... </html>, <body> ... </body>, <div> ... </div>, <p> ... </p>

  14. The HTML tree
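To make the tree picture of slides 13 and 14 concrete, here is a sketch that parses a small hypothetical document with Python's built-in `xml.etree.ElementTree` and prints one line per element, indented one level below its parent. `ElementTree` only accepts well-formed markup, so treat it as a teaching stand-in rather than a parser for real-world HTML; `tree_lines` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed document; ElementTree rejects most messy real-world HTML.
HTML = """
<html>
  <body>
    <div>
      <p>First paragraph</p>
      <p>Second paragraph</p>
    </div>
    <p>Third paragraph</p>
  </body>
</html>
""".strip()


def tree_lines(element, depth=0):
    """Return one line per element, indented two spaces per generation."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an Element yields its direct children
        lines.extend(tree_lines(child, depth + 1))
    return lines


root = ET.fromstring(HTML)  # root is the <html> element
print("\n".join(tree_lines(root)))
```

This prints `html`, then `body` one level in, then `div` and the trailing `p` as siblings under `body`, with the two inner `p` elements as children of `div`.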

  15. The HTML tree: Example 1

  16. The HTML tree: Example 2

  17. Introduction to HTML Outro

  18. HTML Tags and Attributes

  19. Do we have to? Information within HTML tags can be valuable: extract link URLs; an easier way to select elements.

  20. Tag, you're it! We've seen tag names such as html, div, and p. The attribute name is followed by =, followed by the information assigned to that attribute, usually quoted text.

  21. Let's "div"vy up the tag. The id attribute should be unique; the class attribute doesn't need to be unique.
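The id/class distinction on slide 21 maps naturally onto "find one" versus "find all". A minimal sketch with Python's `xml.etree.ElementTree` (a stand-in here, not the Scrapy selectors the course uses), on a hypothetical snippet whose `uid`, `class-1`, and `class-2` values are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: one div with a unique id, two paragraphs sharing a class.
HTML = """<html><body>
  <div id="uid" class="class-1">
    <p class="class-2">First paragraph</p>
    <p class="class-2">Second paragraph</p>
  </div>
</body></html>"""

root = ET.fromstring(HTML)

# An id should be unique, so find() (first match or None) is the natural call.
unique_div = root.find(".//div[@id='uid']")

# A class may be shared, so findall() collects every element carrying it.
class_2 = root.findall(".//*[@class='class-2']")

print(unique_div.get("class"))    # class-1
print([p.text for p in class_2])  # ['First paragraph', 'Second paragraph']
```

One caveat of this stand-in: `ElementTree` compares `@class` as an exact string, so an element with `class="class-2 extra"` would not match `[@class='class-2']`, and XPath functions such as `contains()` are outside the subset it supports.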

  22. "a" be linkin'. a tags are for hyperlinks; the href attribute gives the URL the link points to.
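Since slides 19 and 22 single out link URLs as valuable, here is a sketch of pulling every `href` out of `<a>` tags with Python's standard `html.parser.HTMLParser`, which tolerates ordinary HTML. `LinkCollector` and `extract_links` are hypothetical names; the course itself does this with Scrapy.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Record the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, with value None for a bare attribute.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.links.append(value)


def extract_links(html: str) -> list:
    """Return href values in document order (hypothetical helper)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


page = '<p>See <a href="https://www.example.com">this</a> and <A HREF="/about">that</A>.</p><a name="top">anchor only</a>'
print(extract_links(page))  # ['https://www.example.com', '/about']
```

The `<a name="top">` tag is skipped because it has no `href`, and the uppercase `<A HREF=...>` is still found because the parser lowercases names.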

  23. Tag Traction

  24. Et Tu, Attributes?

  25. Crash Course X

  26. Another Slasher Video? xpath = '/html/body/div[2]'. Simple XPath: a single forward slash / moves forward one generation. Tag names between slashes give the direction to which element(s) to select. Brackets [] after a tag name tell us which of the selected siblings to choose, counting from 1.

  27. Another Slasher Video? xpath = '/html/body/div[2]'
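The path `/html/body/div[2]` can be tried out with Python's `xml.etree.ElementTree`, which supports a small XPath subset including 1-based positional predicates like `[2]`. This is a stand-in for the Scrapy selectors used in the course, and the document below is hypothetical. Because `ElementTree` paths are relative to an element, and `root` here is the `<html>` element, the absolute path becomes `./body/div[2]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical body with three div children and one p in between.
HTML = """<html><body>
  <div><p>first div</p></div>
  <div><p>second div</p></div>
  <p>not a div</p>
  <div><p>third div</p></div>
</body></html>"""

root = ET.fromstring(HTML)  # root is the <html> element itself

# '/html/body/div[2]' rewritten relative to <html>: step to the <body> child,
# then keep the 2nd of its <div> children (positions start at 1).
second_div = root.find("./body/div[2]")
print(second_div.find("p").text)  # second div

# The <p> sibling is not counted: div[3] is the third <div>, not the 4th child.
print(root.find("./body/div[3]").find("p").text)  # third div
```

The second lookup shows that the bracket index counts only siblings matching the tag name before it, which is also how XPath proper treats `div[3]`.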

  28. Slasher Double Feature? Direct to all table elements within the entire HTML code: xpath = '//table'. Direct to all table elements which are descendants of the 2nd div child of the body element: xpath = '/html/body/div[2]//table'
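The double slash on slide 28 means "descendant at any depth". A sketch of both paths, again using `xml.etree.ElementTree` as a stand-in for Scrapy and a hypothetical page whose `id` values exist only to label the tables:

```python
import xml.etree.ElementTree as ET

# Hypothetical page: one table directly in body, two nested inside the 2nd div.
HTML = """<html><body>
  <table id="t1"/>
  <div><p>no tables here</p></div>
  <div>
    <table id="t2"/>
    <span><table id="t3"/></span>
  </div>
</body></html>"""

root = ET.fromstring(HTML)

# '//table': every table element at any depth below <html>.
all_tables = [t.get("id") for t in root.findall(".//table")]

# '/html/body/div[2]//table': tables at any depth inside the 2nd div of body.
div2_tables = [t.get("id") for t in root.findall("./body/div[2]//table")]

print(all_tables)   # ['t1', 't2', 't3']
print(div2_tables)  # ['t2', 't3']
```

Note that `t3` is found even though it sits inside a `<span>`, one generation deeper than `t2`; a single slash (`./body/div[2]/table`) would have returned only `t2`.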

  29. Ex(path)celent
