This project implements a Retrieval-Augmented Generation (RAG) system for managing and analyzing Kubernetes GitHub issues. It uses FAISS for efficient vector similarity search and integrates with LangChain and OpenAI GPT-4 for intelligent issue analysis and response generation.
- Fetch Kubernetes issues from GitHub using the GitHub GraphQL API
- Preprocess and clean issue data
- Generate embeddings using sentence transformers
- Efficient similarity search using FAISS
- RAG-powered issue analysis and response generation
- FastAPI-based REST API for easy integration
- Interactive CLI for direct vector database queries
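As an illustration of the GraphQL-based fetching listed above, here is a minimal sketch using only the standard library. The endpoint URL and the queried fields are part of GitHub's public GraphQL API, but the function names (build_payload, fetch_page, fetch_issues), the selected fields, the page size, and the kubernetes/kubernetes default are assumptions for illustration; scripts/fetch_github_issues.py may be structured differently.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Assumed query shape: the real script may request other fields.
ISSUES_QUERY = """
query($owner: String!, $name: String!, $first: Int!, $after: String) {
  repository(owner: $owner, name: $name) {
    issues(first: $first, after: $after,
           orderBy: {field: CREATED_AT, direction: DESC}) {
      pageInfo { hasNextPage endCursor }
      nodes { number title body state url createdAt }
    }
  }
}
"""


def build_payload(owner, name, first=50, after=None):
    """Build the JSON body for one page of issues (hypothetical helper)."""
    return {
        "query": ISSUES_QUERY,
        "variables": {"owner": owner, "name": name, "first": first, "after": after},
    }


def fetch_page(token, payload):
    """POST one GraphQL request and return the decoded JSON response."""
    request = urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def fetch_issues(token, owner="kubernetes", name="kubernetes",
                 page_size=50, max_pages=2, fetch=fetch_page):
    """Follow cursor-based pagination for at most max_pages pages."""
    issues, cursor = [], None
    for _ in range(max_pages):
        data = fetch(token, build_payload(owner, name, page_size, cursor))
        if "errors" in data:
            raise RuntimeError(f"GraphQL errors: {data['errors']}")
        connection = data["data"]["repository"]["issues"]
        issues.extend(connection["nodes"])
        if not connection["pageInfo"]["hasNextPage"]:
            break
        cursor = connection["pageInfo"]["endCursor"]
    return issues
```

Calling fetch_issues with the value of GITHUB_TOKEN would return a list of issue dictionaries. The fetch parameter exists only so the pagination logic can be exercised without network access.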
rag-devops/
├── scripts/ # Python scripts for each phase
│ ├── fetch_github_issues.py # GitHub issue fetching
│ ├── preprocess_issues.py # Data preprocessing
│ ├── embed_issues.py # Embedding generation
│ ├── query_vector_db.py # Interactive vector similarity search
│ ├── evaluate_rag.py # RAG system evaluation
│ ├── validate_rag.py # Validation of RAG responses
│ └── langchain_rag.py # Core RAG implementation
├── embeddings/ # Directory containing FAISS index
│ ├── index.faiss # FAISS vector index (created by embed_issues.py)
│ └── index.pkl # Document metadata (created by embed_issues.py)
├── data/ # Data directory
│ ├── k8s_issues.json # Raw GitHub issues
│ └── k8s_issues_preprocessed.json # Preprocessed issues
├── .env # Environment variables (not tracked)
├── .gitignore # Git ignore rules
├── requirements.txt # Python dependencies
└── README.md # This file
- Clone the repository:
    git clone https://github.com/your-username/rag-devops.git
    cd rag-devops
- Create and activate a virtual environment:
    python3 -m venv .venv
    source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
- Create a .env file with your API keys:
    GITHUB_TOKEN=your_github_token
    OPENAI_API_KEY=your_openai_api_key
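Before running the scripts, it can help to confirm that the .env file defines both keys. The following is a minimal sketch that parses simple KEY=value lines with the standard library; the helper names parse_env_file and missing_keys are hypothetical, and the project's scripts may load .env by other means.

```python
from pathlib import Path

# The two keys named in the setup step above.
REQUIRED_KEYS = ("GITHUB_TOKEN", "OPENAI_API_KEY")


def parse_env_file(path):
    """Parse simple KEY=value lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        missing = missing_keys(parse_env_file(env_path))
        print("Missing keys:", ", ".join(missing) if missing else "none")
    else:
        print("No .env file found in the current directory")
```

This check only confirms that the keys are present and non-empty; it does not verify that the tokens are valid.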
- Fetch Kubernetes issues:
    python scripts/fetch_github_issues.py
- Preprocess the issues:
    python scripts/preprocess_issues.py
- Generate embeddings and create the FAISS index:
    python scripts/embed_issues.py
  This creates two files in the embeddings/ directory:
  - index.faiss: the FAISS vector index
  - index.pkl: document metadata and mappings
- Query the vector database:
    python scripts/query_vector_db.py
  This starts an interactive CLI where you can:
  - Enter questions about Kubernetes issues
  - Get the top-k most relevant results with similarity scores
  - View issue content previews
  - Type 'exit' to quit
- Evaluate the RAG system:
    python scripts/evaluate_rag.py
- Validate RAG responses:
    python scripts/validate_rag.py
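To make the query step more concrete, the sketch below shows the idea behind top-k retrieval with similarity scores and content previews. It deliberately swaps in plain-Python cosine similarity over toy vectors instead of FAISS and sentence transformers, so it is not the project's implementation; the function names and the sample data are hypothetical, and the metric used by the real index is not specified here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=3):
    """Return (document index, score) pairs for the k most similar vectors."""
    scored = [(cosine_similarity(query_vec, vec), idx)
              for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


def preview(text, width=80):
    """Collapse whitespace and truncate long text for display."""
    text = " ".join(text.split())
    return text if len(text) <= width else text[: width - 3] + "..."


if __name__ == "__main__":
    # Toy stand-ins for issue texts and their embeddings.
    docs = ["Pod stuck in Pending state", "kubelet fails to start", "Pod scheduling delay"]
    vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for idx, score in top_k([1.0, 0.0], vecs, k=2):
        print(f"{score:.3f}  {preview(docs[idx])}")
```

An exhaustive scan like this is only practical for small collections; a vector index such as FAISS serves the same purpose at scale.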
- The project requires Python 3.8 or later
- Dependencies are managed through requirements.txt
- Code style follows PEP 8 guidelines
- Each script is modular and well-documented
- Logging is implemented for better debugging
- Error handling is implemented throughout the codebase
If you encounter issues:
- Make sure you've run the scripts in order: fetch_github_issues.py, preprocess_issues.py, then embed_issues.py, before running query_vector_db.py
- Check that the embeddings/ directory contains index.faiss and index.pkl
- Verify your environment variables in .env
- Check the logs for detailed error messages
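The first two checks above can be automated with a short script. This is a minimal sketch that looks for each expected output file in pipeline order and reports the first one that is missing. The file paths come from the project layout; the mapping of data files to the scripts that produce them is an assumption, and first_missing_step is a hypothetical helper name.

```python
from pathlib import Path

# (expected artifact, script assumed to produce it), in pipeline order.
PIPELINE = [
    ("data/k8s_issues.json", "scripts/fetch_github_issues.py"),
    ("data/k8s_issues_preprocessed.json", "scripts/preprocess_issues.py"),
    ("embeddings/index.faiss", "scripts/embed_issues.py"),
    ("embeddings/index.pkl", "scripts/embed_issues.py"),
]


def first_missing_step(root="."):
    """Return (artifact, script) for the first missing file, or None if all exist."""
    root = Path(root)
    for artifact, script in PIPELINE:
        if not (root / artifact).is_file():
            return artifact, script
    return None


if __name__ == "__main__":
    missing = first_missing_step()
    if missing is None:
        print("All pipeline outputs are present.")
    else:
        artifact, script = missing
        print(f"Missing {artifact}; try running: python {script}")
```

Run it from the repository root so that the relative paths resolve correctly.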
Licensed under the MIT License; see the LICENSE file for details.
To contribute:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request