This project provides a simple Retrieval-Augmented Generation (RAG) implementation using LangChain, Pinecone, and OpenAI. It includes a Streamlit chat interface that supports conversation threads.

Features:
- Document ingestion
- Vector storage
- Chat interface with Streamlit
- Conversation thread support
- OpenAI for embeddings and LLM responses
- Contextual compression for improved retrieval
Requirements:
- Python 3.9+
- OpenAI API key
- Pinecone instance (cloud or local)
Setup:
- Clone this repository
- Install dependencies: `pip install -r requirements-lock.txt`
- Copy `.env.example` to `.env` and fill in your API keys: `cp .env.example .env`
- Edit the `.env` file with your API keys and configuration (see the example below)
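The variable names below are illustrative only; check `.env.example` for the names this project actually uses. A typical OpenAI + Pinecone configuration looks roughly like:

```
# Illustrative values -- see .env.example for the real variable names
OPENAI_API_KEY=sk-...
PINECONE_API_KEY=your-pinecone-api-key
PINECONE_INDEX_NAME=rag-index
```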
Place your documents in the `./data/` directory. The following formats are supported:
- Text files (`.txt`)
- PDF files (`.pdf`)
- Markdown files (`.md`)
Create the data directory if it doesn't exist: `mkdir -p data`
Run the ingestion script to process documents and store them in the vector database: `python ingest.py`
This script will:
- Load documents from the `./data/` directory
- Split them into chunks
- Embed the chunks using OpenAI
- Store the embedded chunks in Pinecone (a sketch of this pipeline follows below)
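The exact implementation lives in `ingest.py`, but a minimal sketch of such a pipeline, assuming LangChain's community document loaders and the `langchain-pinecone` integration (the chunk sizes, embedding model, and index name are illustrative), looks like this:

```python
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

# Load .txt, .md, and .pdf files from ./data/
loaders = [
    DirectoryLoader("./data/", glob="**/*.txt", loader_cls=TextLoader),
    DirectoryLoader("./data/", glob="**/*.md", loader_cls=TextLoader),
    DirectoryLoader("./data/", glob="**/*.pdf", loader_cls=PyPDFLoader),
]
documents = [doc for loader in loaders for doc in loader.load()]

# Split into overlapping chunks (sizes are illustrative)
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

# Embed the chunks with OpenAI and upsert them into the Pinecone index
PineconeVectorStore.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-large"),
    index_name="rag-index",  # placeholder index name
)
```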
Start the Streamlit chat application: `streamlit run app.py`
This will open a web interface where you can:
- Ask questions about your documents
- Create new conversation threads
- Switch between different threads (see the sketch below)
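How `app.py` organizes threads is specific to its implementation; a minimal sketch of a Streamlit chat UI that keeps one message history per thread in `st.session_state` might look like this (the `answer_question` stub stands in for the project's actual RAG chain):

```python
import streamlit as st

def answer_question(question: str) -> str:
    # Stub: the real app would run the question through the RAG chain here.
    return f"(placeholder answer to: {question})"

# One message list per thread, kept in session state.
if "threads" not in st.session_state:
    st.session_state.threads = {"default": []}

if st.sidebar.button("New thread"):
    st.session_state.threads[f"thread-{len(st.session_state.threads) + 1}"] = []

thread = st.sidebar.selectbox("Thread", list(st.session_state.threads))
history = st.session_state.threads[thread]

# Replay the selected thread's history, then handle a new question.
for msg in history:
    st.chat_message(msg["role"]).write(msg["content"])

if question := st.chat_input("Ask a question about your documents"):
    history.append({"role": "user", "content": question})
    st.chat_message("user").write(question)
    answer = answer_question(question)
    history.append({"role": "assistant", "content": answer})
    st.chat_message("assistant").write(answer)
```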
Under the hood, ingestion works as follows:
- Documents are loaded from the `./data/` directory
- The system splits the documents into meaningful chunks
- OpenAI embeddings are used to embed these chunks
- The embeddings are stored in the vector database for retrieval
Note: the OpenAI embeddings have 3072 dimensions, so the Pinecone index must also be created with 3072 dimensions.
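If the index does not exist yet, it can be created with the Pinecone Python client (v3+); the index name, cloud, and region below are placeholders, and 3072 matches the output size of OpenAI's text-embedding-3-large model:

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key

# Create a 3072-dimensional index to match the OpenAI embedding size.
pc.create_index(
    name="rag-index",  # placeholder index name
    dimension=3072,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # placeholder cloud/region
)
```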
At query time:
- User questions are embedded using OpenAI
- The system retrieves relevant chunks from the Vector Database
- A contextual compression filter improves retrieval relevance
- The LLM (OpenAI) generates responses based on the retrieved content and conversation history (a sketch of this retrieval-and-generation step follows)
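A plausible way to wire this up with LangChain's `ContextualCompressionRetriever` is sketched below; the similarity threshold, chat model, and index name are assumptions rather than this project's actual settings:

```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import EmbeddingsFilter

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")  # 3072-dim embeddings
vectorstore = PineconeVectorStore(index_name="rag-index", embedding=embeddings)  # placeholder index name

# Contextual compression: drop retrieved chunks that are not similar enough
# to the question before they reach the LLM.
compressor = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.6)
retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 8}),
)

question = "What does the report say about Q3 revenue?"  # example question
docs = retriever.invoke(question)
context = "\n\n".join(doc.page_content for doc in docs)

llm = ChatOpenAI(model="gpt-4o-mini")  # placeholder model choice
answer = llm.invoke(f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}")
print(answer.content)
```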