Web Scraping With Python WEB SCRAPING IN PYTHON Thomas Laetsch Data Scientist, NYU
Business Savvy: What are businesses looking for? Comparing prices. Satisfaction of customers. Generating potential leads. ... and much more!
It's Personal: What could you do? Search for your favorite memes on your favorite sites. Automatically look through classified ads for your favorite gadgets. Scrape social site content looking for hot topics. Scrape cooking blogs looking for particular recipes, or recipe reviews. ... and much more!
About My Work
Pipe Dream
Pipe Dream: Setup. Understand what we want to do. Find sources to help us do it.
Pipe Dream: Acquisition. Read in the raw data from online. Format these data to be usable.
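A minimal sketch of the two acquisition steps named above, assuming the standard-library urllib rather than scrapy (the tool these slides go on to use). The function names read_raw and to_text and the sample bytes are hypothetical, chosen only for illustration.

```python
from urllib.request import urlopen


def read_raw(url, timeout=10):
    """Read in the raw data: download a page, return its bytes and declared charset."""
    with urlopen(url, timeout=timeout) as response:
        # Assumption: fall back to UTF-8 when the server declares no charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read(), charset


def to_text(raw_bytes, charset="utf-8"):
    """Format the data to be usable: decode the bytes into a string of HTML."""
    return raw_bytes.decode(charset, errors="replace")


# Offline stand-in for a downloaded page, so the sketch runs without a network.
sample_bytes = b"<html><body><p>Hello</p></body></html>"
html_text = to_text(sample_bytes)
print(html_text)
```

With network access, calling read_raw on a real URL and passing its two return values to to_text would produce the same kind of string for a live page.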
Pipe Dream: Processing. Many options!
How do you do? Our Focus: Acquisition! (Using scrapy via python)
Are you in?
HyperText Markup Language Thomas Laetsch Data Scientist, NYU
The main example
HTML tags: <html> ... </html>, <body> ... </body>, <div> ... </div>, <p> ... </p>
The HTML tree
The HTML tree: Example 1
The HTML tree: Example 2
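The tree idea can be sketched with the standard-library HTMLParser: each opening tag is recorded together with its depth, so nested tags appear as later generations. The sample page below is hypothetical (it is not the example pictured on the slides), and the sketch assumes every tag is explicitly closed.

```python
from html.parser import HTMLParser

# Hypothetical sample page, used only for illustration.
SAMPLE_HTML = """
<html>
  <body>
    <div>
      <p>First paragraph</p>
      <p>Second paragraph</p>
    </div>
    <p>Third paragraph</p>
  </body>
</html>
"""


class TreeOutline(HTMLParser):
    """Record every opening tag with its depth (generation) in the tree."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        self.depth += 1  # children of this tag sit one generation deeper

    def handle_endtag(self, tag):
        self.depth -= 1  # assumes each tag is explicitly closed


parser = TreeOutline()
parser.feed(SAMPLE_HTML)
for depth, tag in parser.outline:
    print("  " * depth + tag)
```

In the printed outline, the two p elements inside the div are siblings, and the div and the last p are both children of body.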
Introduction to HTML Outro
HTML Tags and Attributes Thomas Laetsch Data Scientist, NYU
Do we have to? Information within HTML tags can be valuable. Extract link URLs. Easier way to select elements.
Tag, you're it! We've seen tag names such as html, div, and p. The attribute name is followed by = followed by information assigned to that attribute, usually quoted text.
Let's "div"vy up the tag: id attribute should be unique; class attribute doesn't need to be unique.
"a" be linkin': a tags are for hyperlinks; href attribute tells what link to go to.
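As a rough illustration of why attributes are worth reading, the sketch below collects href values from a tags along with any id and class values. It uses the standard-library HTMLParser rather than scrapy, and the sample HTML, attribute values, and class name AttributeCollector are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the id, class, and href values are made up.
SAMPLE_HTML = """
<html>
  <body>
    <div id="unique-id" class="some-class">
      <p>See <a href="https://www.example.com/first">the first link</a>.</p>
    </div>
    <div class="some-class">
      <a href="https://www.example.com/second">Second link</a>
    </div>
  </body>
</html>
"""


class AttributeCollector(HTMLParser):
    """Collect link URLs plus every id and class value seen in the page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.ids = []
        self.classes = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attrs arrives as (name, value) pairs
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        if "id" in attributes:
            self.ids.append(attributes["id"])
        if "class" in attributes:
            self.classes.append(attributes["class"])


collector = AttributeCollector()
collector.feed(SAMPLE_HTML)
print(collector.links)
print(collector.ids)
print(collector.classes)
```

Here the id value shows up once while the same class value shows up on both div elements, matching the point that id should be unique and class need not be.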
Tag Traction
Et Tu, Attributes?
Crash Course X Thomas Laetsch Data Scientist, NYU
Another Slasher Video? xpath = '/html/body/div[2]' Simple XPath: Single forward-slash / used to move forward one generation. Tag-names between slashes give direction to which element(s). Brackets [] after a tag name tell us which of the selected siblings to choose.
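The path above can be tried out with the standard-library ElementTree, which supports only a subset of XPath and is used here as a stand-in for scrapy. ElementTree rejects absolute paths on an element, so '/html/body/div[2]' is written relative to the html root as './body/div[2]'. The sample HTML is hypothetical and is assumed to be well-formed enough to parse as XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; it must be well-formed to parse as XML.
SAMPLE_HTML = """
<html>
  <body>
    <div><p>first div</p></div>
    <div><p>second div</p></div>
    <p>a paragraph after the divs</p>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE_HTML)  # root is the html element

# One slash per generation: html -> body -> div; [2] picks the second div sibling.
second_div = root.find("./body/div[2]")
print(second_div.find("p").text)
```

Note that the bracket index counts from 1, so div[2] is the second div child of body, not the third.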
Slasher Double Feature? Direct to all table elements within the entire HTML code: xpath = '//table' Direct to all table elements which are descendants of the 2nd div child of the body element: xpath = '/html/body/div[2]//table'
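A small sketch of the double-slash behavior, again using the standard-library ElementTree (a subset of XPath) in place of scrapy. Paths are written relative to the html root, so '//table' becomes './/table' and '/html/body/div[2]//table' becomes './body/div[2]//table'. The sample HTML and its id values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page with tables at several depths.
SAMPLE_HTML = """
<html>
  <body>
    <div>
      <table id="t1"><tr><td>1</td></tr></table>
    </div>
    <div>
      <p>text</p>
      <section>
        <table id="t2"><tr><td>2</td></tr></table>
      </section>
      <table id="t3"><tr><td>3</td></tr></table>
    </div>
    <table id="t4"><tr><td>4</td></tr></table>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE_HTML)  # root is the html element

# Double slash: look through all future generations, not just direct children.
all_tables = [t.get("id") for t in root.findall(".//table")]

# Only tables that descend (at any depth) from the second div child of body.
nested_tables = [t.get("id") for t in root.findall("./body/div[2]//table")]

print(all_tables)
print(nested_tables)
```

The second query picks up the table nested inside the section as well as the one directly under the div, but skips the tables outside that div.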
Ex(path)celent