
XPath Navigation. Web Scraping in Python. Thomas Laetsch



  1. XPath Navigation. Web Scraping in Python. Thomas Laetsch, Data Scientist, NYU

  2. Slashes and Brackets
     - Single forward slash / looks forward one generation
     - Double forward slash // looks forward all future generations
     - Square brackets [] help narrow in on specific elements
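
The three rules above can be tried out with a minimal sketch. It uses Python's standard-library ElementTree as a stand-in for scrapy so that it runs without extra installs. ElementTree supports only a subset of XPath and does not accept absolute paths such as '/html/body', so the paths below start with "." at the html root. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, written as well-formed XML so ElementTree can parse it.
doc = """
<html>
  <body>
    <div>
      <p>one</p>
      <p>two</p>
    </div>
    <p>three</p>
  </body>
</html>
"""
root = ET.fromstring(doc)  # root is the <html> element

# Single slash: one generation at a time (html -> body -> p).
children = [p.text for p in root.findall("./body/p")]

# Double slash: all future generations below the starting point.
descendants = [p.text for p in root.findall(".//p")]

# Square brackets: narrow in on the second <p> inside the <div>.
second = [p.text for p in root.findall("./body/div/p[2]")]

print(children)     # ['three']
print(descendants)  # ['one', 'two', 'three']
print(second)       # ['two']
```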

  3. To Bracket or not to Bracket
       xpath = '/html/body'
       xpath = '/html[1]/body[1]'
     Give the same selection

  4. A Body of P
       xpath = '/html/body/p'

  5. The Birds and the Ps
       xpath = '/html/body/div/p'
       xpath = '/html/body/div/p[2]'

  6. Double Slashing the Brackets
       xpath = '//p'
       xpath = '//p[1]'
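
A point that is easy to miss here: '//p[1]' does not mean "the first p in the document". It selects every p that is the first p child of its own parent. The sketch below illustrates this with the standard-library ElementTree (a stand-in for scrapy, with relative paths starting at the html root) on a hypothetical document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two <p> inside a <div>, one <p> directly in <body>.
doc = """
<html>
  <body>
    <div>
      <p>one</p>
      <p>two</p>
    </div>
    <p>three</p>
  </body>
</html>
"""
root = ET.fromstring(doc)

# Every <p> at any depth.
all_p = [p.text for p in root.findall(".//p")]

# Every <p> that is the first <p> child of its parent:
# "one" is first inside the div, "three" is first inside body.
first_p = [p.text for p in root.findall(".//p[1]")]

print(all_p)    # ['one', 'two', 'three']
print(first_p)  # ['one', 'three']
```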

  7. The Wildcard
       xpath = '/html/body/*'
     The asterisk * is the "wildcard"
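
As a small illustration of the wildcard, assuming a hypothetical document and using the standard-library ElementTree in place of scrapy (so the path is written relative to the html root):

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two different kinds of children under <body>.
doc = "<html><body><div><p>inner</p></div><p>outer</p></body></html>"
root = ET.fromstring(doc)

# The wildcard matches every child of body, whatever its tag name.
child_tags = [el.tag for el in root.findall("./body/*")]

print(child_tags)  # ['div', 'p']
```

Only direct children are returned; the p nested inside the div is not part of the selection.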

  8. Xposé. Web Scraping in Python

  9. Off the Beaten XPath. Web Scraping in Python. Thomas Laetsch, Data Scientist, NYU

  10. (At)tribute
      @ represents "attribute"
        @class
        @id
        @href

  11. Brackets and Attributes

  12. Brackets and Attributes
        xpath = '//p[@class="class-1"]'

  13. Brackets and Attributes
        xpath = '//*[@id="uid"]'

  14. Brackets and Attributes
        xpath = '//div[@id="uid"]/p[2]'
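
The three attribute expressions from these slides can be sketched as follows. This assumes a hypothetical document and uses the standard-library ElementTree rather than scrapy; its XPath subset covers [@attr='value'] predicates but needs the leading "." on each path.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the ids and class names mirror the slide examples.
doc = """
<html>
  <body>
    <div id="uid">
      <p class="class-1">a</p>
      <p class="class-2">b</p>
    </div>
    <div id="other">
      <p class="class-1">c</p>
    </div>
  </body>
</html>
"""
root = ET.fromstring(doc)

# All <p> elements whose class attribute is exactly "class-1".
by_class = [p.text for p in root.findall(".//p[@class='class-1']")]

# Any element, whatever its tag, whose id attribute is "uid".
by_id = [el.tag for el in root.findall(".//*[@id='uid']")]

# The second <p> inside the <div> whose id is "uid".
second_in_uid = [p.text for p in root.findall(".//div[@id='uid']/p[2]")]

print(by_class)       # ['a', 'c']
print(by_id)          # ['div']
print(second_in_uid)  # ['b']
```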

  15. Content with Contains
      XPath contains notation:
        contains( @attri-name, "string-expr" )

  16. Contain This
        xpath = '//*[contains(@class,"class-1")]'

  17. Contain This
        xpath = '//*[@class="class-1"]'
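
The difference between the last two expressions is substring match versus exact match. The standard-library ElementTree does not implement contains(), so the sketch below emulates it with a plain Python substring test; the exact match uses ElementTree's own predicate. The document and class names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with several class values that share the text "class-1".
doc = """
<html>
  <body>
    <p class="class-1">a</p>
    <p class="class-1 class-2">b</p>
    <p class="class-12">c</p>
    <p class="other">d</p>
  </body>
</html>
"""
root = ET.fromstring(doc)

# Exact match, as in '//*[@class="class-1"]'.
exact = [el.text for el in root.findall(".//*[@class='class-1']")]

# Emulation of '//*[contains(@class,"class-1")]': keep every element whose
# class attribute has "class-1" somewhere inside it.
containing = [el.text for el in root.iter() if "class-1" in el.get("class", "")]

print(exact)       # ['a']
print(containing)  # ['a', 'b', 'c']
```

Note that the substring test also picks up "class-12", which is how contains() behaves as well.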

  18. Get Classy
        xpath = '/html/body/div/p[2]'

  19. Get Classy
        xpath = '/html/body/div/p[2]/@class'
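
Ending a path in /@class selects the attribute's value instead of the element. The standard-library ElementTree cannot select attributes through a path, so this sketch, on a hypothetical document, selects the element first and then reads the attribute with get(). It is an emulation of the idea, not the scrapy call.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the second <p> in the <div> carries class "class-2".
doc = """
<html>
  <body>
    <div>
      <p class="class-1">a</p>
      <p class="class-2">b</p>
    </div>
  </body>
</html>
"""
root = ET.fromstring(doc)

# Counterpart of '/html/body/div/p[2]': the element itself.
second_p = root.find("./body/div/p[2]")

# Counterpart of '/html/body/div/p[2]/@class': the attribute value.
class_value = second_p.get("class")

print(second_p.text)  # b
print(class_value)    # class-2
```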

  20. End of the Path. Web Scraping in Python

  21. Introduction to the scrapy Selector. Web Scraping in Python. Thomas Laetsch, Data Scientist, NYU

  22. Setting up a Selector

        from scrapy import Selector

        html = '''
        <html>
          <body>
            <div class="hello datacamp">
              <p>Hello World!</p>
            </div>
            <p>Enjoy DataCamp!</p>
          </body>
        </html>
        '''

        sel = Selector( text = html )

      - Created a scrapy Selector object using a string with the html code
      - The selector sel has selected the entire html document

  23. Selecting Selectors
      - We can use the xpath call within a Selector to create new Selectors of specific pieces of the html code
      - The return is a SelectorList of Selector objects

        sel.xpath("//p")
        # outputs the SelectorList:
        [<Selector xpath='//p' data='<p>Hello World!</p>'>,
         <Selector xpath='//p' data='<p>Enjoy DataCamp!</p>'>]

  24. Extracting Data from a SelectorList
      Use the extract() method

        >>> sel.xpath("//p")
        out: [<Selector xpath='//p' data='<p>Hello World!</p>'>,
              <Selector xpath='//p' data='<p>Enjoy DataCamp!</p>'>]

        >>> sel.xpath("//p").extract()
        out: [ '<p>Hello World!</p>',
               '<p>Enjoy DataCamp!</p>' ]

      We can use extract_first() to get the first element of the list

        >>> sel.xpath("//p").extract_first()
        out: '<p>Hello World!</p>'

  25. Extracting Data from a Selector

        ps = sel.xpath('//p')
        second_p = ps[1]
        second_p.extract()
        out: '<p>Enjoy DataCamp!</p>'

  26. Select This Course! Web Scraping in Python

  27. "Inspecting the HTML". Web Scraping in Python. Thomas Laetsch, PhD, Data Scientist, NYU

  28. "Source" = HTML Code

  29. Inspecting Elements

  30. HTML text to Selector

        from scrapy import Selector
        import requests

        url = 'https://www.datacamp.com/courses/all'
        html = requests.get( url ).content
        sel = Selector( text = html )

  31. You Know Our Secrets. Web Scraping in Python
