Main

Main repository for the project.

Instructions

1. Clone the repo 📂
2. Install the project using the `make dev-install` command 🛠️
3. Copy the `.env-template` file to `.env` and fill in the necessary environment variables 🔑
4. Load the environment variables using the `source .env` command 🔄
5. You're ready to start working ☕️
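
Variables loaded with `source .env` only reach child processes if they are exported (for example via `export KEY=value` lines or `set -a`), so a notebook kernel or script started elsewhere may not see them. The following is a minimal sketch, using only the standard library, of loading the same file from Python. It assumes simple `KEY=value` lines; the helper name `load_env_file` is hypothetical and not part of this repository.

```python
import os
from pathlib import Path


def load_env_file(path=".env", override=False):
    """Parse simple KEY=value lines and copy them into os.environ.

    Hypothetical helper: skips blank lines, '#' comments, and lines
    without '='; accepts an optional leading 'export '.
    Returns a dict of everything that was parsed.
    """
    parsed = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        # Variables already set in the environment win unless override=True.
        if override or key not in os.environ:
            os.environ[key] = value
        parsed[key] = value
    return parsed
```

Calling `load_env_file()` at the top of a notebook would then make the values available through `os.environ`.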

Structure

├── .github/workflows         <- GitHub Actions workflows.
├── data       
│   ├── processed             <- The final, canonical data sets for modeling (parquet).
│   └── raw                   <- The original, immutable data dump.
│       
├── docs                      <- Documentation for the project. Papers, docs, and learning plan for Challenge X (FHNW).
├── logs                      <- Logs for the project.
├── notebooks                 <- Jupyter or Quarto Markdown Notebooks.
│   │                            Naming convention is a number (for ordering) and a short `-`
│   │                            delimited description, e.g. `00-example.qmd`.
│   │                            
│   │                            OVERVIEW NOTEBOOKS:
│   │                            - 01 to 08:    Scraping and Data Collection
│   │                            - 09 to 14:    EDA and Data Preprocessing
│   │                            - 15:          Run pipeline
│   │                            - 18:          MultiRocket main model
│   │                            - 20:          Run embeddings
│   │                            
│   │                            Not relevant for Challenge X (FHNW):
│   │                            - 50 to 53:    Models of Gabriel Torres Gamez for cgml/5Da
│   │                            - 90:          Old scraping approach for WDB
│   │
│   └── html_notebooks        <- Relevant HTML versions of the notebooks.
│    
├── scripts                   <- Scripts for the project.
├── src
│   ├── ConvTran              <- Source code of the external ConvTran repository for the pre-trained model (subrepo).
│   ├── CryptoFraudDetection  <- Source code package for use in this project.
│   └── CryptoFraudDetection.egg-info  <- Package metadata.
│
├── tests                     <- Unit tests for the project. Includes scrapers, sentiment, embeddings, etc.
├── .env-template             <- Template for environment variables.
├── .gitignore                <- Files to be ignored by git.
├── compose.yml               <- Docker compose file for running the image.
├── Dockerfile                <- Dockerfile for the Docker image.
├── LICENSE                   <- MIT License.
├── Makefile                  <- Makefile with commands like `make install` or `make test`.
├── pyproject.toml            <- Package build configuration.
└── README.md                 <- The top-level README for this project.
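
The notebook naming convention described above (a number, then a `-` delimited description) can be checked mechanically. The sketch below is one possible way to do it; the function name `describe_notebook`, the accepted extensions (`.qmd`, `.ipynb`), and the exact regular expression are assumptions, while the number ranges are taken from the notebook overview in the listing.

```python
import re

# Number ranges copied from the notebook overview in the structure listing.
NOTEBOOK_GROUPS = [
    (range(1, 9), "Scraping and Data Collection"),
    (range(9, 15), "EDA and Data Preprocessing"),
    (range(15, 16), "Run pipeline"),
    (range(18, 19), "MultiRocket main model"),
    (range(20, 21), "Run embeddings"),
]

# <number>-<dash-delimited description>.<qmd|ipynb> (extensions are an assumption)
NAME_PATTERN = re.compile(r"^(\d+)-([A-Za-z0-9_]+(?:-[A-Za-z0-9_]+)*)\.(qmd|ipynb)$")


def describe_notebook(filename):
    """Return (number, description, group), or None if the name does not match.

    group is None when the number falls outside the ranges listed above.
    """
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    number = int(match.group(1))
    group = next(
        (label for numbers, label in NOTEBOOK_GROUPS if number in numbers),
        None,
    )
    return number, match.group(2), group
```

For example, `describe_notebook("00-example.qmd")` returns `(0, "example", None)`, since `00` is not part of any listed range.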

Train Models with Slurm

Dummy Model Example

Clone the repo with a Personal Access Token (PAT) (use a classic token!):
```
git clone https://USER:TOKEN@github.com/CryptoFraudDetection/main.git
```
Replace USER with your GitHub username and TOKEN with your PAT.
```
cd main
```

Create a virtual environment:

python3 -m venv venv
source venv/bin/activate

Install the project:
```
pip install -e .
```
Log in to wandb:
```
wandb login
```
Follow the instructions in the terminal.
Initialize the sweep on your laptop or on Slurm:
- Laptop:
```
python scripts/dummy.py
```
- Slurm:
```
sbatch scripts/dummy.sh
```
Add agents to the sweep (if needed):
1. Get the sweep ID from the log file from the previous step:
  - Slurm:
```
cat logs/dummy*NNNN*.log
```
    Replace NNNN with the Slurm job ID.
  - Laptop: The sweep ID is printed on the terminal.
2. Add agents to the sweep:
```
sbatch scripts/dummy_sweep_agent.sh nod0ndel/dummy-model-sweep/_________
```
  Replace _________ with the sweep ID.
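
Looking up the sweep ID by hand can be replaced with a small helper. The sketch below assumes the log contains a line of the form `Create sweep with ID: <id>`; that format is an assumption about what ends up in the log, so check it against your own log file. The function names are hypothetical.

```python
import re

# Assumed log line format: "... Create sweep with ID: <id>"
SWEEP_ID_PATTERN = re.compile(r"Create sweep with ID:\s*(\S+)")


def find_sweep_id(log_text):
    """Return the first sweep ID found in the log text, or None."""
    match = SWEEP_ID_PATTERN.search(log_text)
    return match.group(1) if match else None


def sweep_path(entity, project, sweep_id):
    """Build the entity/project/sweep_id argument passed to the agent script."""
    return f"{entity}/{project}/{sweep_id}"
```

With the entity and project from the command above, `sweep_path("nod0ndel", "dummy-model-sweep", sweep_id)` yields the argument for `scripts/dummy_sweep_agent.sh`.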
List your jobs:
```
squeue -u $USER
```
Check the logs:
```
tail -f -n 100 logs/dummy*NNNN*.log
```

Using Jupyter Notebooks on the Cluster

Run a Jupyter server in the current directory:
```
cd ~/code/github.com/CryptoFraudDetection/main
/cluster/common/jupyter/start-jupyter.sh -g 1 -c 12 -m 16384 -t 1-00:00:00 -d .
```
- -g: Number of GPUs.
- -c: Number of cores.
- -m: Memory in MB.
- -t: Runtime (e.g., 1-00:00:00 = 1 day).
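
To sanity-check the values passed to `-m` and `-t`, the conversions can be sketched in Python. This sketch handles only the `D-HH:MM:SS` and `HH:MM:SS` time formats (Slurm accepts others) and assumes 1 GB is counted as 1024 MB; the function names are hypothetical.

```python
from datetime import timedelta


def parse_slurm_time(spec):
    """Convert 'D-HH:MM:SS' or 'HH:MM:SS' into a timedelta."""
    days = 0
    if "-" in spec:
        day_part, spec = spec.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in spec.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def mb_to_gb(megabytes):
    """Convert megabytes to gigabytes, counting 1024 MB per GB."""
    return megabytes / 1024
```

Under these assumptions, `parse_slurm_time("1-00:00:00")` equals `timedelta(days=1)` and `mb_to_gb(16384)` equals `16.0`.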
Connect to the slave server using the command provided in the terminal output, e.g.,
```
ssh -N -L 8888:localhost:8888 user@0.0.0.0
```
- -N: Do not run a remote command (no shell login).
- -L: Local port forwarding (here, local port 8888 to port 8888 on the server).
- 0.0.0.0: The IP address of the slave server (provided in the terminal output).
Open the link provided in the terminal (e.g., http://localhost:8888/lab?token=5ebfa321c439644dfa97c44fb96fc9e0296fec315ccc0f6f) in your browser.
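
If the browser cannot reach the link, it can help to check whether anything is listening on the forwarded local port. The following is a minimal sketch using the standard library; the function name `is_port_open` is hypothetical. Note that a successful connection only shows that the local end of the tunnel is listening, not that the Jupyter server behind it is healthy.

```python
import socket


def is_port_open(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Running `is_port_open()` on your laptop while the `ssh -N -L` command is active is a quick first check before debugging further.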
Install libraries in JupyterLab:
```
!pip install -r requirements.txt
```
Or install specific libraries:
```
!pip install torch
```

Tips and Best Practices

Slurm Job Management

- srun: Interactive job execution.
- sbatch: Batch job execution.
- scancel: Cancel a job.
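
When jobs are submitted from a script, the job ID printed by `sbatch` (a line of the form `Submitted batch job <id>`) can be captured for later use with `scancel` or when locating the matching log file. This is a minimal sketch with hypothetical function names; `submit` only works on a machine where Slurm is installed.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from sbatch output, or return None."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    return int(match.group(1)) if match else None


def submit(script_path):
    """Submit a batch script with sbatch and return its job ID (requires Slurm)."""
    result = subprocess.run(
        ["sbatch", script_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sbatch fails
    )
    return parse_job_id(result.stdout)
```

For example, `job_id = submit("scripts/dummy.sh")` returns the ID to use in place of `NNNN` when looking for the log file.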

Using Screen Sessions

Start a screen session to prevent job interruption during network issues:
```
screen
```
Run commands inside the session.
Detach the session: Ctrl + A, then Ctrl + D.
Reconnect to the session:
```
screen -rx
```

Batch Script Example

See the scripts/dummy.sh file for an example of a batch script.
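
For orientation, the general shape of a Slurm batch script can be sketched as a small Python generator. This is not the content of `scripts/dummy.sh`; the function name, the default values, and the choice of `#SBATCH` directives are assumptions, and your cluster may require different options (for example a partition or a different GPU request syntax).

```python
def render_batch_script(job_name, command, gpus=1, cpus=12, mem_mb=16384,
                        time_limit="1-00:00:00", log_dir="logs"):
    """Return the text of a Slurm batch script (illustrative defaults only)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        # %j is replaced by Slurm with the job ID.
        f"#SBATCH --output={log_dir}/{job_name}-%j.log",
        f"#SBATCH --gres=gpu:{gpus}",
        f"#SBATCH --cpus-per-task={cpus}",
        # A plain number for --mem is interpreted as megabytes.
        f"#SBATCH --mem={mem_mb}",
        f"#SBATCH --time={time_limit}",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"
```

Writing the result of `render_batch_script("dummy", "python scripts/dummy.py")` to a file gives a script that could be passed to `sbatch`, after adjusting it to the cluster's requirements.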

File Transfer to/from Slurm Server

Transfer files to the Slurm server:

scp -r . slurm:/path/to/remote/directory

Transfer files from the Slurm server to local:

scp -r slurm:/path/to/remote/file ./local/directory
