Main repository for the project.
- Clone the repo 📂
- Install the project using the `make dev-install` command 🛠️
- Copy the `.env-example` file to `.env` and fill in the necessary environment variables 🔑
- Load the environment variables using the `source .env` command 🔄
- You're ready to start working ☕️
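For code that runs outside a shell where `source .env` was executed (for example a notebook kernel), the same variables can be loaded from Python. The following is a minimal sketch, assuming the `.env` file contains plain `KEY=VALUE` lines (optionally prefixed with `export`); the helper names `parse_env_text` and `load_env_file` are hypothetical and not part of this repository.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Tolerate shell-style "export KEY=VALUE" lines.
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # Not an assignment; ignore it.
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read the file at `path` (if it exists) and copy its values into os.environ."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    values = parse_env_text(env_path.read_text(encoding="utf-8"))
    os.environ.update(values)
    return values
```

Calling `load_env_file()` at the top of a script or notebook makes the values available through `os.environ`. This sketch does not handle multi-line values or variable interpolation.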
├── .github/workflows <- GitHub Actions workflows.
├── data
│ ├── processed <- The final, canonical data sets for modeling (parquet).
│ └── raw <- The original, immutable data dump.
│
├── docs <- Documentation for the project. Papers, docs, and the learning plan (Lernplan) for Challenge X (FHNW).
├── logs <- Logs for the project.
├── notebooks <- Jupyter or Quarto Markdown Notebooks.
│ │ Naming convention is a number (for ordering) and a short `-`
│ │ delimited description, e.g. `00-example.qmd`.
│ │
│ │ OVERVIEW NOTEBOOKS:
│ │ - 01 to 08: Scraping and Data Collection
│ │ - 09 to 14: EDA and Data Preprocessing
│ │ - 15: run Pipeline
│ │ - 18: Multirocket main Model
│ │ - 20: run Embeddings
│ │
│ │ not relevant for Challenge-X (FHNW):
│ │ - 50 to 53: Models of Gabriel Torres Gamez for cgml/5Da
│ │ - 90: old scraping approach for WDB
│ │
│ └── html_notebooks <- relevant HTML versions of the notebooks.
│
├── scripts <- Scripts for the project.
├── src
│ ├── ConvTran <- Source code package of external repository of pre-trained model. (subrepo)
│ ├── CryptoFraudDetection <- Source code package for use in this project.
│ └── CryptoFraudDetection.egg-info <- Package metadata.
│
├── tests <- Unit tests for the project. Includes scrapers, sentiment, embeddings, etc.
├── .env-template <- Template for environment variables.
├── .gitignore <- Files to be ignored by git.
├── compose.yml <- Docker compose file for running the image.
├── Dockerfile <- Dockerfile for the Docker image.
├── LICENSE <- MIT License.
├── Makefile <- Makefile with commands like `make install` or `make test`.
├── pyproject.toml <- Package build configuration.
└── README.md <- The top-level README for this project.
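The notebook naming convention described in the tree above (a number for ordering, then a short `-` delimited description, e.g. `00-example.qmd`) can be checked mechanically. This is a minimal sketch, assuming descriptions use only letters and digits and that notebooks end in `.qmd` or `.ipynb`; the function name `parse_notebook_name` is hypothetical.

```python
import re

# <number>-<word>[-<word>...].<qmd|ipynb>
NOTEBOOK_NAME = re.compile(
    r"^(\d+)-([A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)\.(qmd|ipynb)$"
)


def parse_notebook_name(filename):
    """Return (order_number, description), or None if the name breaks the convention."""
    match = NOTEBOOK_NAME.match(filename)
    if match is None:
        return None
    number, description, _extension = match.groups()
    return int(number), description
```

For example, `parse_notebook_name("00-example.qmd")` returns `(0, "example")`, while a name without a leading number returns `None`.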
- Clone the repo with a Personal Access Token (PAT) (use a classic token!):
  `git clone https://USER:TOKEN@github.com/CryptoFraudDetection/main.git`
  Replace `USER` with your GitHub username and `TOKEN` with your PAT, then change into the repository:
  `cd main`
- Create and activate a virtual environment:
  `python3 -m venv venv && source venv/bin/activate`
- Install the project:
  `pip install -e .`
- Log in to wandb:
  `wandb login`
  Follow the instructions on the terminal.
- Initialize the sweep on your laptop or on Slurm:
  - Laptop: `python scripts/dummy.py`
  - Slurm: `sbatch scripts/dummy.sh`
- Add agents to the sweep (if needed):
  - Get the sweep ID from the log file from the previous step:
    - Slurm: `cat logs/dummy*NNNN*.log` (replace `NNNN` with the batch number).
    - Laptop: The sweep ID is printed on the terminal.
  - Add agents to the sweep:
    `sbatch scripts/dummy_sweep_agent.sh nod0ndel/dummy-model-sweep/_________`
    Replace `_________` with the sweep ID.
- List your jobs:
  `squeue -u $USER`
- Check the logs:
  `tail -f -n 100 logs/dummy*NNNN*.log`
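Reading the sweep ID out of a log file by eye can be automated. The sketch below assumes the log contains the line wandb normally prints when a sweep is created (`Create sweep with ID: ...`); if the log format in this project differs, the pattern needs adjusting. The function names are hypothetical, and the default entity and project are taken from the `sbatch` command in the steps above.

```python
import re
from typing import Optional

# Assumed log line format: "wandb: Create sweep with ID: <sweep_id>"
SWEEP_ID_PATTERN = re.compile(r"Create sweep with ID:\s*(\S+)")


def find_sweep_id(log_text: str) -> Optional[str]:
    """Return the first sweep ID found in the log text, or None if there is none."""
    match = SWEEP_ID_PATTERN.search(log_text)
    return match.group(1) if match else None


def agent_target(
    sweep_id: str,
    entity: str = "nod0ndel",
    project: str = "dummy-model-sweep",
) -> str:
    """Build the entity/project/sweep_id argument passed to the agent script."""
    return f"{entity}/{project}/{sweep_id}"
```

The string returned by `agent_target(find_sweep_id(text))` is what replaces the `nod0ndel/dummy-model-sweep/_________` placeholder when submitting `scripts/dummy_sweep_agent.sh`.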
- Run a Jupyter server in the current directory:
  `cd ~/code/github.com/CryptoFraudDetection/main`
  `/cluster/common/jupyter/start-jupyter.sh -g 1 -c 12 -m 16384 -t 1-00:00:00 -d .`
  - `-g`: Number of GPUs.
  - `-c`: Number of cores.
  - `-m`: Memory in MB.
  - `-t`: Runtime (e.g., `1-00:00:00` = 1 day).
- Connect to the slave server using the command provided in the terminal output, e.g.,
  `ssh -N -L 8888:localhost:8888 user@0.0.0.0`
  - `-N`: Do not execute a remote command (no shell login).
  - `-L`: Forward local port 8888 to port 8888 on the server.
  - `0.0.0.0`: The IP address of the slave server (provided in the terminal output).
- Open the link provided in the terminal (e.g., `http://localhost:8888/lab?token=5ebfa321c439644dfa97c44fb96fc9e0296fec315ccc0f6f`) in your browser.
- Install libraries in JupyterLab:
  `!pip install -r requirements.txt`
  Or install specific libraries:
  `!pip install torch`
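The `-t` and `-m` values passed to the Jupyter start script above are easy to get wrong. The following sketch converts them into more readable quantities, assuming the runtime uses the `D-HH:MM:SS` or `HH:MM:SS` layout shown in the example; other time formats that Slurm may accept are not handled, and the function names are hypothetical.

```python
from datetime import timedelta


def parse_runtime(value: str) -> timedelta:
    """Convert 'D-HH:MM:SS' or 'HH:MM:SS' into a timedelta."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def memory_mb_to_gib(memory_mb: int) -> float:
    """Convert a memory request in MB (as passed to -m) into GiB."""
    return memory_mb / 1024
```

With the values from the example command, `parse_runtime("1-00:00:00")` is a one-day `timedelta`, and `memory_mb_to_gib(16384)` is `16.0`.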
- `srun`: Interactive job execution.
- `sbatch`: Batch job execution.
- `scancel`: Cancel a job.
- Start a `screen` session to prevent job interruption during network issues:
  `screen`
  Run commands inside the session.
- Detach the session: `Ctrl + A`, then `Ctrl + D`.
- Reconnect to the session:
  `screen -rx`
See the `scripts/dummy.sh` file for an example of a batch script.
- Transfer files to the Slurm server:
  `scp -r . slurm:/path/to/remote/directory`
- Transfer files from the Slurm server to local:
  `scp -r slurm:/path/to/remote/file ./local/directory`