File Crawler

FileCrawler officially supports Python 3.8+.

Main features

List all file contents
Index file contents in Elasticsearch
Run OCR on several file types (with the tika lib)
Look for hard-coded credentials
Much more...

Parsers:

Indexers:

Elasticsearch
Stand-alone local files

Extractors:

AWS credentials
GitHub and GitLab credentials
URL credentials
Authorization header credentials
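
To illustrate what a credential extractor of this kind does, here is a minimal Python sketch based on regular expressions. The patterns, the rule names and the `find_credentials` helper are assumptions made for this example; they are not FileCrawler's actual rules, which may be stricter or cover more formats.

```python
import re

# Illustrative patterns only (assumptions); FileCrawler's real rules may differ.
PATTERNS = {
    # AWS access key IDs are 20 characters and commonly start with AKIA or ASIA.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # scheme://user:password@host
    "url_credentials": re.compile(
        r"\b[a-zA-Z][a-zA-Z0-9+.-]*://([^\s:/@]+):([^\s:/@]+)@[^\s/]+"
    ),
    # "Authorization: Basic ..." or "Authorization: Bearer ..."
    "authorization_header": re.compile(
        r"(?i)\bAuthorization:\s*(Basic|Bearer)\s+([A-Za-z0-9._~+/=-]+)"
    ),
}


def find_credentials(text):
    """Return a list of (rule_name, matched_text) pairs found in text."""
    findings = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((name, match.group(0)))
    return findings


if __name__ == "__main__":
    sample = (
        "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
        "db = postgres://admin:s3cret@db.internal:5432/app\n"
        "Authorization: Bearer abc.def.ghi\n"
    )
    for rule, value in find_credentials(sample):
        print(rule, "->", value)
```

Simple patterns like these tend to produce false positives, so a real extractor would normally add validation or entropy checks on top of them.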

Alert:

Send found credentials via Telegram
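
As a rough idea of how a Telegram alert can be sent, the sketch below builds a request for the Telegram Bot API `sendMessage` method using only the standard library. The function names, the token and the chat ID are placeholders chosen for this example; how FileCrawler itself formats and sends its alerts is not described here.

```python
import json
import urllib.request


def build_telegram_request(bot_token, chat_id, text):
    """Build (but do not send) a Bot API sendMessage request."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(bot_token, chat_id, text, timeout=10):
    """Send the message and return the decoded JSON response."""
    request = build_telegram_request(bot_token, chat_id, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder values (assumptions); nothing is sent in this demo.
    req = build_telegram_request("123456:PLACEHOLDER", "987654321", "Credential found")
    print(req.get_method(), req.full_url)
```

Keeping the request construction separate from the network call makes the alert text easy to check without contacting Telegram.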

IntelX Parser

For several reasons, I decided to move the IntelX-specific rules to a new tool called IntelParser, available at https://github.com/helviojunior/intelparser/

Sample outputs

In addition, File Crawler saves images of the leaked credentials it finds in the ~/.filecrawler/ directory.

Installing

Dependencies

apt install default-jre default-jdk libmagic-dev git

Installing FileCrawler

Installing from the latest release

pip install -U filecrawler

Installing development package

pip install -i https://test.pypi.org/simple/ FileCrawler

Running

Config file

Create a sample config file with default parameters

filecrawler --create-config -v

Edit the configuration file config.yml with your desired parameters

Note: You must adjust the Elasticsearch URL parameter before continuing.

Run

# Integrate with ELK
filecrawler --index-name filecrawler --path /mnt/client_files -T 30 -v --elastic

# Just save leaks locally
filecrawler --index-name filecrawler --path /mnt/client_files -T 30 -v --local -o /home/out_test
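
If the crawler needs to be launched from a script (for example, a scheduled job), the same command lines can be assembled in Python. This is a minimal sketch that only uses the flags shown in the two commands above; `build_command` and `run_crawler` are hypothetical helper names, and it assumes the `filecrawler` executable is on the PATH.

```python
import shlex
import subprocess


def build_command(index_name, path, tasks=30, mode="elastic", output_dir=None):
    """Assemble a filecrawler command line from the flags shown above."""
    if mode not in ("elastic", "local"):
        raise ValueError("mode must be 'elastic' or 'local'")
    cmd = [
        "filecrawler",
        "--index-name", index_name,
        "--path", path,
        "-T", str(tasks),
        "-v",
        f"--{mode}",
    ]
    # -o is only used in the local example above.
    if mode == "local" and output_dir:
        cmd += ["-o", output_dir]
    return cmd


def run_crawler(cmd):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the command here; call run_crawler(cmd) to execute it.
    cmd = build_command("filecrawler", "/mnt/client_files", mode="local",
                        output_dir="/home/out_test")
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when the path contains spaces.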

Help

$ filecrawler -h

File Crawler v0.1.3 by Helvio Junior
File Crawler index files and search hard-coded credentials.
https://github.com/helviojunior/filecrawler
    
usage: 
    filecrawler module [flags]

Available Integration Modules:
  --elastic                  Integrate to elasticsearch
  --local                    Save leaks locally

Global Flags:
  --index-name [index name]  Crawler name
  --path [folder path]       Folder path to be indexed
  --config [config file]     Configuration file. (default: ./fileindex.yml)
  --db [sqlite file]         Filename to save status of indexed files. (default: ~/.filecrawler/{index_name}/indexer.db)
  -T [tasks]                 number of connects in parallel (per host, default: 16)
  --create-config            Create config sample
  --clear-session            Clear old file status and reindex all files
  -h, --help                 show help message and exit
  -v                         Specify verbosity level (default: 0). Example: -v, -vv, -vvv

Use "filecrawler [module] --help" for more information about a command.
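
The `--db` and `--clear-session` flags imply that the status of already indexed files is kept in a SQLite file. The sketch below shows one way such tracking could work; the table name, the columns and the size/mtime comparison are assumptions made for illustration and do not reflect FileCrawler's actual schema.

```python
import os
import sqlite3
import tempfile


def open_status_db(db_path):
    """Open (or create) a status database with a hypothetical schema."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indexed_files ("
        "path TEXT PRIMARY KEY, size INTEGER NOT NULL, mtime_ns INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def needs_indexing(conn, path):
    """True if the file is new or its size/mtime changed since last indexed."""
    stat = os.stat(path)
    row = conn.execute(
        "SELECT size, mtime_ns FROM indexed_files WHERE path = ?", (path,)
    ).fetchone()
    return row is None or row != (stat.st_size, stat.st_mtime_ns)


def mark_indexed(conn, path):
    """Record the current size and mtime of a file that was just indexed."""
    stat = os.stat(path)
    conn.execute(
        "INSERT OR REPLACE INTO indexed_files (path, size, mtime_ns) VALUES (?, ?, ?)",
        (path, stat.st_size, stat.st_mtime_ns),
    )
    conn.commit()


def clear_session(conn):
    """Forget all recorded files so that everything is indexed again."""
    conn.execute("DELETE FROM indexed_files")
    conn.commit()


if __name__ == "__main__":
    conn = open_status_db(":memory:")
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(b"example")
        sample_path = handle.name
    print("before:", needs_indexing(conn, sample_path))
    mark_indexed(conn, sample_path)
    print("after:", needs_indexing(conn, sample_path))
    os.remove(sample_path)
```

Comparing size and modification time is cheap but can miss edits that preserve both; hashing the content is a slower and more reliable alternative.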

How-to install ELK from scratch

Installing Elasticsearch

Docker Support

Build filecrawler only:

$ docker build --no-cache -t "filecrawler:client" https://github.com/helviojunior/filecrawler.git#main

Using Filecrawler's image:

Go to the path to be indexed and run the commands below:

$ mkdir -p $HOME/.filecrawler/
$ docker run -v "$HOME/.filecrawler/":/u01/ -v "$PWD":/u02/ --rm -it "filecrawler:client" --create-config -v
$ docker run -v "$HOME/.filecrawler/":/u01/ -v "$PWD":/u02/ --rm -it "filecrawler:client" --path /u02/ --no-db -T 30 -v --elastic --index-name filecrawler

Build filecrawler + ELK image:

$ sysctl -w vm.max_map_count=262144
$ docker build --no-cache -t "filecrawler:latest" -f Dockerfile.elk_server https://github.com/helviojunior/filecrawler.git#main

Using Filecrawler's image:

Go to the path to be indexed and run the commands below:

$ mkdir -p $HOME/.filecrawler/
$ docker run -p 443:443 -p 80:5601 -p 9200:9200 -v "$HOME/.filecrawler/":/u01/ -v "$PWD":/u02/ --rm -it "filecrawler:latest"

# Inside the running container
$ filecrawler --create-config -v
$ filecrawler --path /u02/ -T 30 -v --elastic --index-name filecrawler

Using Docker with a remote server via SSH forwarding

$ mkdir -p $HOME/.filecrawler/
$ docker run -v "$HOME/.ssh/":/root/.ssh/ -v "$HOME/.filecrawler/":/u01/ -v "$PWD":/u02/ --rm -it --entrypoint /bin/bash "filecrawler:client"
$ ssh -o StrictHostKeyChecking=no -Nf -L 127.0.0.1:9200:127.0.0.1:9200 user@server_ip
$ filecrawler --create-config -v
$ filecrawler --path /u02/ -T 30 --no-db -v --elastic --index-name filecrawler
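
Before starting a long crawl through the tunnel, it can help to confirm that the forwarded port is actually accepting connections. The following is a small sketch using the standard library; `port_is_open` is a hypothetical helper, and the default host and port simply mirror the `-L 127.0.0.1:9200:127.0.0.1:9200` forwarding used above.

```python
import socket


def port_is_open(host="127.0.0.1", port=9200, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Port 9200 is reachable; the SSH tunnel appears to be up.")
    else:
        print("Port 9200 is not reachable; check the ssh -L command.")
```

A successful TCP connection only shows that something is listening on the port; it does not verify that the service behind it is Elasticsearch.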

Credits

This project was inspired by:

Note: Some parts of the code were ported from these two projects.

To do

Check the TODO file
