Skip to content
forked from si9int/cc.py

Extracting URLs of a specific target based on the results of "commoncrawl.org"

License

Notifications You must be signed in to change notification settings

Devendrasuthar/cc.py

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 

Repository files navigation

cc.py

Extracting URLs of a specific target based on the results of "commoncrawl.org".
Updated to v.0.3 | Whats new:

  • 65% faster proceeding
  • Specify a year via -y/--year, e.g.: -y 2018
  • Specify an output file via -o/--out, e.g.: -o whatever.txt
  • Crawl all pages for a specific index -i/--index, e.g.: cc.py army.mil -i CC-MAIN-2018-05
  • List all available indexes -l/--list, e.g.: cc.py army.mil -l

ToDo

  • Crawl for a specific index
  • Implementation of multithreading
  • Allowing a range of years as input
  • Implementing direct-grep
  • Temporary file-writing

Usage

cc.py [-h] [-y YEAR] [-o OUT] [-l] [-i INDEX] [-u] domain

positional arguments:
  domain                domain which will be crawled for

optional arguments:
  -h, --help            show this help message and exit
  -y YEAR, --year YEAR  limit the result to a specific year (default: all)
  -o OUT, --out OUT     specify an output file (default: domain.txt)
  -l, --list            Lists all available indexes
  -i INDEX, --index INDEX
                        Crawl for a specific index (this will crawl all
                        pages!)
  -u, --update          Update index file

Example

python3 cc.py github.com -y 2018 -o github_18.txt
cat github_18.txt | grep user

Dependencies

  • Python3

About

Extracting URLs of a specific target based on the results of "commoncrawl.org"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy