
codingforentrepreneurs/Scrape-Websites-with-Python-FastAPI-Celery-NoSQL


Scrape Websites with Python & NoSQL

Learn how to scrape websites with Python, Selenium, Requests HTML, Celery, FastAPI, & NoSQL.

Here's what each tool is used for:

  • Python 3.9 download - programming the logic.
  • AstraDB sign up - highly performant and scalable database service by DataStax. AstraDB is a managed Cassandra NoSQL database. Cassandra is used by Netflix, Discord, Apple, and many others to handle astounding amounts of data.
  • Selenium docs - an automated web browsing experience that:
    • Runs all web-browser actions through code
    • Loads JavaScript-heavy websites
    • Performs standard user interactions like clicks, form submits, logins, etc.
  • Requests HTML docs - we're going to use this to parse an HTML document extracted from Selenium
  • Celery docs - Celery provides worker processes that allow us to schedule when we need to scrape websites. We'll be using Redis as the message broker for our task queue.
  • FastAPI docs - a web application framework to display and monitor web scraping results from anywhere
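
As a rough illustration of how Selenium and Requests HTML fit together, here is a minimal sketch: Selenium renders the page (including JavaScript) and Requests HTML parses the resulting markup. The function names `fetch_rendered_html` and `parse_links`, the fixed wait, and the choice of extracting the title and links are assumptions made for this example, not code from the course.

```python
import time


def fetch_rendered_html(url, wait_seconds=2.0):
    """Load a page in headless Chrome and return its HTML after JavaScript has run."""
    # Imported lazily so this module can be imported without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Crude fixed wait (an assumption); explicit waits are usually more reliable.
        time.sleep(wait_seconds)
        return driver.page_source
    finally:
        driver.quit()


def parse_links(html_str):
    """Return the page title and the href of every link found in the HTML string."""
    from requests_html import HTML

    doc = HTML(html=html_str)
    title_el = doc.find("title", first=True)
    title = title_el.text if title_el is not None else None
    hrefs = [a.attrs["href"] for a in doc.find("a") if "href" in a.attrs]
    return {"title": title, "hrefs": hrefs}
```

Calling `parse_links(fetch_rendered_html(url))` would chain the two steps. This assumes Chrome and a matching Chromedriver are available on the machine.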

This series is broken up into 4 parts:

  • Scraping - how to scrape and parse data from nearly any website with Selenium & Requests HTML.
  • Data models - how to store and validate data with cassandra-driver, pydantic, and AstraDB.
  • Worker & Scheduling - how to schedule periodic tasks (i.e. scraping) integrated with Redis & AstraDB.
  • Presentation - how to combine the above steps in a robust web application service.
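
To make the Worker & Scheduling part more concrete, here is a minimal sketch of a Celery beat schedule backed by Redis. The task name `scrape_url`, the helper names, the local Redis URL, and the 5-minute interval are all assumptions for illustration; the task body is a placeholder rather than the course's scraping logic.

```python
def build_beat_schedule(urls, every_seconds=300.0):
    """Map each URL to a Celery beat entry that runs the scrape task periodically."""
    return {
        f"scrape-{index}": {
            "task": "scrape_url",  # must match the registered task name below
            "schedule": float(every_seconds),  # interval in seconds
            "args": (url,),
        }
        for index, url in enumerate(urls)
    }


def make_celery_app(urls, broker_url="redis://localhost:6379/0"):
    """Create a Celery app that uses Redis as its broker and schedules the scrapes."""
    # Imported lazily so the schedule helper can be used without Celery installed.
    from celery import Celery

    app = Celery("scraper", broker=broker_url)

    @app.task(name="scrape_url")
    def scrape_url(url):
        # Placeholder body (an assumption): a real task would scrape and store data.
        return {"url": url, "status": "scraped"}

    app.conf.beat_schedule = build_beat_schedule(urls)
    return app
```

To run this with the Celery command line, the app would need to be exposed at module level (for example `app = make_celery_app([...])`) and started with a worker plus the beat scheduler.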

Set up your system

Below is a preflight checklist to ensure your system is fully set up to work with this course. All guides and setup can be found in the setup directory of this repo.

Preflight checklist

  • [] Install Selenium & Chromedriver - setup guide
  • [] Install Redis - setup guide
  • [] Create a virtual environment & install dependencies
  • [] Set up an account with DataStax
  • [] Create your first AstraDB and get API credentials
  • [] Use cassandra-driver to verify your connection to AstraDB
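
For the last checklist item, a connection check with cassandra-driver might look like the sketch below. The environment variable names (`ASTRA_DB_CLIENT_ID`, `ASTRA_DB_CLIENT_SECRET`, `ASTRA_DB_BUNDLE_PATH`) and both function names are assumptions for this example; use whatever names and paths match your own AstraDB credentials and secure connect bundle.

```python
import os

# Hypothetical setting names; adjust to match your own configuration.
REQUIRED_KEYS = (
    "ASTRA_DB_CLIENT_ID",
    "ASTRA_DB_CLIENT_SECRET",
    "ASTRA_DB_BUNDLE_PATH",
)


def load_astra_settings(env=None):
    """Read the AstraDB settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


def verify_connection(settings):
    """Connect to AstraDB and return the Cassandra release version as a string."""
    # Imported lazily so the settings helper works without cassandra-driver installed.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    auth_provider = PlainTextAuthProvider(
        settings["ASTRA_DB_CLIENT_ID"], settings["ASTRA_DB_CLIENT_SECRET"]
    )
    cluster = Cluster(
        cloud={"secure_connect_bundle": settings["ASTRA_DB_BUNDLE_PATH"]},
        auth_provider=auth_provider,
    )
    try:
        session = cluster.connect()
        row = session.execute("SELECT release_version FROM system.local").one()
        return row[0] if row else None
    finally:
        cluster.shutdown()
```

If `verify_connection(load_astra_settings())` returns a version string, the credentials and bundle path are working.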
