sangaline/advanced-web-scraping-tutorial

Advanced Web Scraping Tutorial Project

This repository is a companion to the article Advanced Web Scraping: Bypassing captcha, "403 Forbidden," and more. Please refer to the article for further details.

This is a Scrapy web scraper for the fictional Zipru torrent site. It is designed to bypass four distinct anti-scraping mechanisms:

  1. User agent filtering.
  2. Obfuscated JavaScript redirects.
  3. CAPTCHAs.
  4. Header consistency checks.
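
As a rough illustration of items 1 and 4, the sketch below builds requests that always carry the same browser-like headers, so a site comparing headers across requests sees a consistent client. It is not the code from this repository: it uses only the Python standard library rather than Scrapy, and the header values, the `example.com` URLs, and the helper names `build_request` and `headers_consistent` are assumptions made for the example. No request is actually sent.

```python
from urllib.request import Request

# Illustrative values only; none of these are taken from the repository.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(url, extra_headers=None):
    """Return a Request that carries the same browser-like headers every time."""
    headers = dict(BROWSER_HEADERS)
    headers.update(extra_headers or {})
    return Request(url, headers=headers)


def headers_consistent(first, second,
                       names=("User-agent", "Accept", "Accept-language")):
    """Check that two requests agree on the headers a site might compare.

    urllib stores header names capitalized (e.g. "User-agent"), so the
    default names use that spelling.
    """
    return all(first.get_header(n) == second.get_header(n) for n in names)
```

Passing a different `User-Agent` through `extra_headers` makes `headers_consistent` return `False`, which is the kind of mismatch a header consistency check is meant to catch.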

Because Zipru is not a real site, the scraper cannot actually be run against it. The code is otherwise complete, however, and can be adapted to work on other sites.

About

The Zipru scraper developed in the Advanced Web Scraping Tutorial.
