This project provides a simple Retrieval-Augmented Generation (RAG) implementation using LangChain, Pinecone, and OpenAI. It includes a Streamlit chat interface that supports conversation threads.

Features:
- Document ingestion
- Vector storage
- Chat interface with Streamlit
- Conversation thread support
- OpenAI for embeddings and LLM responses
- Contextual compression for improved retrieval
Requirements:
- Python 3.9+
- OpenAI API key
- Pinecone instance (cloud or local)
Setup:
- Clone this repository
- Install dependencies: `pip install -r requirements-lock.txt`
- Copy `.env.example` to `.env` and fill in your API keys: `cp .env.example .env`
- Edit the `.env` file with your API keys and configuration (see the example below)
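The variable names below are illustrative only; check `.env.example` for the names this project actually uses. A typical OpenAI + Pinecone configuration looks roughly like:

```
# Illustrative values -- see .env.example for the real variable names
OPENAI_API_KEY=sk-...
PINECONE_API_KEY=your-pinecone-api-key
PINECONE_INDEX_NAME=rag-index
```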
Place your documents in the `./data/` directory. The following formats are supported:
- Text files (`.txt`)
- PDF files (`.pdf`)
- Markdown files (`.md`)
Create the data directory if it doesn't exist: `mkdir -p data`
Run the ingestion script to process documents and store them in the vector database: `python ingest.py`
This script will:
- Load documents from the `./data/` directory
- Split them into chunks
- Embed the chunks using OpenAI
- Store the embedded chunks in Pinecone (a sketch of this pipeline follows below)
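The exact implementation lives in `ingest.py`, but a minimal sketch of such a pipeline, assuming LangChain's community document loaders and the `langchain-pinecone` integration (the chunk sizes, embedding model, and index name are illustrative), looks like this:

```python
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

# Load .txt, .md, and .pdf files from ./data/
loaders = [
    DirectoryLoader("./data/", glob="**/*.txt", loader_cls=TextLoader),
    DirectoryLoader("./data/", glob="**/*.md", loader_cls=TextLoader),
    DirectoryLoader("./data/", glob="**/*.pdf", loader_cls=PyPDFLoader),
]
documents = [doc for loader in loaders for doc in loader.load()]

# Split into overlapping chunks (sizes are illustrative)
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

# Embed the chunks with OpenAI and upsert them into the Pinecone index
PineconeVectorStore.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-large"),
    index_name="rag-index",  # placeholder index name
)
```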
Start the Streamlit chat application: `streamlit run app.py`
This will open a web interface where you can:
- Ask questions about your documents
- Create new conversation threads
- Switch between different threads (see the sketch below)
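How `app.py` organizes threads is specific to its implementation; a minimal sketch of a Streamlit chat UI that keeps one message history per thread in `st.session_state` might look like this (the `answer_question` stub stands in for the project's actual RAG chain):

```python
import streamlit as st

def answer_question(question: str) -> str:
    # Stub: the real app would run the question through the RAG chain here.
    return f"(placeholder answer to: {question})"

# One message list per thread, kept in session state.
if "threads" not in st.session_state:
    st.session_state.threads = {"default": []}

if st.sidebar.button("New thread"):
    st.session_state.threads[f"thread-{len(st.session_state.threads) + 1}"] = []

thread = st.sidebar.selectbox("Thread", list(st.session_state.threads))
history = st.session_state.threads[thread]

# Replay the selected thread's history, then handle a new question.
for msg in history:
    st.chat_message(msg["role"]).write(msg["content"])

if question := st.chat_input("Ask a question about your documents"):
    history.append({"role": "user", "content": question})
    st.chat_message("user").write(question)
    answer = answer_question(question)
    history.append({"role": "assistant", "content": answer})
    st.chat_message("assistant").write(answer)
```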
Under the hood, ingestion works as follows:
- Documents are loaded from the `./data/` directory
- The system splits the documents into meaningful chunks
- OpenAI embeddings are used to embed these chunks
- The embeddings are stored in the vector database for retrieval
Note: the OpenAI embeddings have 3072 dimensions, so the Pinecone index must also be created with 3072 dimensions.
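If the index does not exist yet, it can be created with the Pinecone Python client (v3+); the index name, cloud, and region below are placeholders, and 3072 matches the output size of OpenAI's text-embedding-3-large model:

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key

# Create a 3072-dimensional index to match the OpenAI embedding size.
pc.create_index(
    name="rag-index",  # placeholder index name
    dimension=3072,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # placeholder cloud/region
)
```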
At query time:
- User questions are embedded using OpenAI
- The system retrieves relevant chunks from the Vector Database
- A contextual compression filter improves retrieval relevance
- The LLM (OpenAI) generates responses based on the retrieved content and conversation history (a sketch of this retrieval-and-generation step follows)
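A plausible way to wire this up with LangChain's `ContextualCompressionRetriever` is sketched below; the similarity threshold, chat model, and index name are assumptions rather than this project's actual settings:

```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import EmbeddingsFilter

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")  # 3072-dim embeddings
vectorstore = PineconeVectorStore(index_name="rag-index", embedding=embeddings)  # placeholder index name

# Contextual compression: drop retrieved chunks that are not similar enough
# to the question before they reach the LLM.
compressor = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.6)
retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 8}),
)

question = "What does the report say about Q3 revenue?"  # example question
docs = retriever.invoke(question)
context = "\n\n".join(doc.page_content for doc in docs)

llm = ChatOpenAI(model="gpt-4o-mini")  # placeholder model choice
answer = llm.invoke(f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}")
print(answer.content)
```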