Rasik Labs
LLM Application

RAG chatbot for company knowledge bases

Upload internal documents and ask questions in plain English. Answers stream back in seconds with expandable source citations pinned to the exact page.

Live demo — RAG Knowledge Base querying Meridian Technologies documents with page-level source citations.

Problem

Internal knowledge is trapped in PDFs: policy handbooks, IT guides, financial reports, onboarding documents. Finding a specific answer means opening five files, scanning headers, and hoping the right section is where you expect it. The system lets teams upload any PDF corpus and query it conversationally. Every answer cites the exact source document and page number it was drawn from. If the retrieved context does not meet a minimum similarity threshold, the system refuses to answer rather than hallucinate. This behaviour is critical for enterprise use, where a wrong answer is worse than no answer.
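The refusal rule can be sketched in a few lines of Python. This is a minimal illustration, not the deployed code: `RetrievedChunk`, `should_answer`, `answer`, and the refusal wording are hypothetical names, the `generate` callable stands in for the chat completion call, and the score is assumed to be a cosine similarity where higher means closer (a store that reports distances would need converting first). The 0.3 default is the similarity threshold listed under Metrics.

```python
from dataclasses import dataclass
from typing import Callable, List

SIMILARITY_THRESHOLD = 0.3  # similarity threshold listed under Metrics
REFUSAL = "I could not find this in the uploaded documents."  # hypothetical wording


@dataclass
class RetrievedChunk:
    text: str
    source: str        # document the chunk came from
    page: int          # page number within that document
    similarity: float  # assumed cosine similarity, higher is closer


def should_answer(chunks: List[RetrievedChunk],
                  threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """True only when the best retrieved chunk meets the threshold."""
    return bool(chunks) and max(c.similarity for c in chunks) >= threshold


def answer(question: str,
           chunks: List[RetrievedChunk],
           generate: Callable[[str, str], str]) -> str:
    """Refuse without calling the model when retrieval is too weak."""
    if not should_answer(chunks):
        return REFUSAL
    context = "\n\n".join(c.text for c in chunks)
    return generate(question, context)
```

Because the check runs before `generate` is invoked, a refused query costs one retrieval and no completion tokens.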

Architecture

System architecture diagram

Metrics

Operational parameters from deployed configuration. Retrieval accuracy benchmarking with Ragas is planned for the production release.

  • Similarity threshold: 0.3
  • Chunks retrieved per query: Top 5
  • Chunk size: 800 chars
  • Embedding model: text-embedding-3-small
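To show what an 800-character chunk size means in practice, here is a rough sketch of sentence-aware chunking. It is an assumption-laden illustration: the project lists NLTK for sentence splitting, whereas this sketch swaps in a simple regular expression so it runs with the standard library only, and `split_sentences` and `chunk_page` are hypothetical names. Chunking one page at a time is also an assumption, chosen because it keeps a page number attached to every chunk.

```python
import re
from typing import List

CHUNK_SIZE = 800  # characters, as listed under Metrics

# Naive sentence boundary: whitespace that follows ., ! or ?
# (a stand-in for a proper sentence tokenizer such as NLTK's).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def chunk_page(text: str, max_chars: int = CHUNK_SIZE) -> List[str]:
    """Pack whole sentences into chunks of at most max_chars characters."""
    chunks: List[str] = []
    current = ""
    for sentence in split_sentences(text):
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate      # sentence still fits in this chunk
        else:
            chunks.append(current)   # close the chunk, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries rather than at a fixed offset means a retrieved chunk rarely starts or ends mid-sentence, which tends to make the cited excerpt easier to read.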

Tech stack

  • Python 3.11
  • FastAPI: async REST API and streaming responses
  • ChromaDB: local persistent vector store with cosine HNSW index
  • OpenAI API: embeddings and GPT-4o chat completions
  • PyMuPDF: PDF text extraction and page parsing
  • NLTK: sentence-aware text chunking
  • Pydantic: request and response schema validation
  • Next.js 14: streaming chat frontend with citation UI
  • Streamlit: adaptable to your existing reporting infrastructure

Production considerations

  • Hallucination guard: The system checks whether the best retrieved chunk meets a minimum similarity threshold before calling the LLM at all. Below threshold it returns a canned refusal rather than generating a plausible but unsupported answer.
  • Citations decoupled from LLM output: Source citations are the actual retrieved chunks from the vector store, not references extracted from the LLM response. This makes citations deterministic and auditable regardless of how the model phrases its answer.
  • Document-scoped search: Queries can be scoped to a single document via an optional doc_id filter on the vector search, letting users interrogate one file without cross-contamination from the rest of the corpus.
  • Stateless queries: Each query is independent with no conversation history sent to the LLM. This keeps costs predictable and latency low, but means follow-up questions cannot reference prior answers without client-side history injection.
  • PDF ingestion only: The current pipeline handles PDF documents. Extending to DOCX, HTML, and plain text is straightforward and available per client requirements.
  • Authentication and access control: All endpoints are currently open for demo purposes. Production deployment adds per-user authentication, document-level RBAC, and audit logging for compliance.
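Document-scoped search and retrieval-based citations can be illustrated with a small in-memory stand-in for the vector store. This is a sketch under stated assumptions rather than the project's code: `Chunk`, `search`, and `citations` are hypothetical names, the toy two-dimensional embeddings replace real model embeddings, and a plain Python filter replaces what would presumably be a metadata filter on the ChromaDB query. The default of five results matches the "Top 5" setting under Metrics.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    page: int
    text: str
    embedding: Tuple[float, ...]


def cosine(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(store: List[Chunk],
           query_embedding: Tuple[float, ...],
           top_k: int = 5,
           doc_id: Optional[str] = None) -> List[Tuple[float, Chunk]]:
    """Return the top_k (similarity, chunk) pairs, optionally from one document."""
    candidates = [c for c in store if doc_id is None or c.doc_id == doc_id]
    scored = [(cosine(query_embedding, c.embedding), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def citations(results: List[Tuple[float, Chunk]]) -> List[Dict[str, object]]:
    """Build citations from the retrieved chunks, never from model output."""
    return [
        {
            "doc_id": chunk.doc_id,
            "page": chunk.page,
            "similarity": round(score, 3),
            "excerpt": chunk.text[:200],
        }
        for score, chunk in results
    ]


store = [
    Chunk("handbook", 3, "Annual leave is accrued monthly.", (1.0, 0.0)),
    Chunk("handbook", 7, "Expenses are reimbursed within 30 days.", (0.0, 1.0)),
    Chunk("it-guide", 2, "Connect to the VPN before accessing shares.", (1.0, 0.1)),
]

# Scoped to one document: chunks from "it-guide" cannot appear in the results.
scoped = search(store, (1.0, 0.0), doc_id="handbook")
print(citations(scoped))
```

Since `citations` only reads the search results, the same query against the same corpus yields the same citation list regardless of how the model words its answer.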

Explore this project

Live demo and source code links will be added as they become available.

View demo: Coming soon
View repository: Coming soon