RAG PDF Chatbot with Phi-2/TinyLlama
A full-stack RAG (Retrieval-Augmented Generation) application for querying unstructured PDF documents. The pipeline splits documents into overlapping text chunks, encodes each chunk as a dense embedding vector, and indexes the vectors in a FAISS vector store. At query time, the system retrieves the chunks most semantically similar to the question and passes them as context to a local Small Language Model (TinyLlama or Phi-2), so answers are grounded in the retrieved passages. Because embedding and generation both run locally, document contents are never sent to an external API.
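The query-time half of this pipeline can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: a toy bag-of-words `embed` function stands in for Sentence-Transformers, and an exhaustive similarity scan in `top_k` plays the role of the FAISS index. With unit-normalized vectors, ranking by dot product is the same as ranking by cosine similarity. `embed`, `cosine`, `top_k`, `build_prompt`, and the prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy stand-in for a Sentence-Transformers model (assumption): an
    L2-normalized bag-of-words vector stored sparsely as {token: weight}."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def top_k(query: str, chunks: list[str], k: int = 3) -> list[tuple[int, float]]:
    """Exhaustive similarity search (the role FAISS plays in the real app):
    score every chunk against the query and keep the k best."""
    q = embed(query)
    # For brevity chunks are re-embedded on every call; a real index
    # embeds them once at ingestion time.
    scored = [(i, cosine(q, embed(chunk))) for i, chunk in enumerate(chunks)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], hits: list[tuple[int, float]]) -> str:
    """Assemble a grounded prompt. Each passage keeps its chunk id so the
    UI can show which sources the answer was based on."""
    context = "\n".join(f"[chunk {i}] {chunks[i]}" for i, _ in hits)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    chunks = [
        "FAISS stores dense vectors and supports fast similarity search.",
        "Streamlit renders the chat interface in the browser.",
        "Phi-2 is a small language model released by Microsoft.",
    ]
    question = "Which model did Microsoft release?"
    hits = top_k(question, chunks, k=1)
    print(hits)
    print(build_prompt(question, chunks, hits))
```

Swapping `embed` for a real embedding model and `top_k` for a FAISS index changes retrieval quality and speed, but not the shape of the flow: retrieve, then prompt with tagged sources.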
Key Features
- End-to-End RAG Pipeline: Integrates PDF loading, text chunking, embedding, and vector indexing into a single workflow.
- Semantic Vector Search: Uses Sentence-Transformers embeddings and FAISS similarity search to retrieve passages by meaning rather than by exact keyword match.
- Local SLM Integration: Runs TinyLlama or Microsoft's Phi-2 locally via Hugging Face Transformers, so documents and questions stay on the user's machine and the small model size keeps inference lightweight.
- Explainable AI (XAI): Tracks sources for every answer, showing which document chunks were retrieved and passed to the model as context.
- Interactive UI: A Streamlit chat interface with session persistence and real-time latency monitoring.
- Optimized Text Processing: Uses recursive character splitting with overlap, so text near a chunk boundary appears in both neighbouring chunks and keeps its surrounding context.
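Recursive character splitting with overlap can be sketched in pure Python as follows. This is a simplified illustration of the idea behind LangChain's `RecursiveCharacterTextSplitter`, not its actual implementation; `split_text` and its default sizes are hypothetical, and sizes are measured in characters (an assumption). The splitter tries the coarsest separator first (paragraph, line, word, character), packs pieces into chunks of at most `chunk_size` characters, and carries up to `chunk_overlap` trailing characters into the next chunk.

```python
def split_text(
    text: str,
    chunk_size: int = 500,      # hypothetical default
    chunk_overlap: int = 50,    # hypothetical default
    separators: tuple[str, ...] = ("\n\n", "\n", " ", ""),
) -> list[str]:
    """Split text into chunks of at most chunk_size characters, trying coarse
    separators first and repeating up to chunk_overlap characters between
    consecutive chunks packed from the same level of pieces."""
    if chunk_size < 1 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require 0 <= chunk_overlap < chunk_size")

    # Use the coarsest separator present in the text ("" matches anything
    # and means "split into single characters").
    sep = next((s for s in separators if s in text), "")
    finer = separators[separators.index(sep) + 1:] if sep in separators else ()
    pieces = list(text) if sep == "" else [p for p in text.split(sep) if p]

    def joined_len(parts: list[str]) -> int:
        # Length of the parts once re-joined with the separator.
        return sum(map(len, parts)) + len(sep) * max(len(parts) - 1, 0)

    chunks: list[str] = []
    window: list[str] = []  # pieces of the chunk currently being built
    for piece in pieces:
        if len(piece) > chunk_size:
            # A single piece is too long: emit what we have, then split the
            # piece recursively with the next, finer separators.
            if window:
                chunks.append(sep.join(window))
                window = []
            chunks.extend(split_text(piece, chunk_size, chunk_overlap, finer))
            continue
        if window and joined_len(window + [piece]) > chunk_size:
            chunks.append(sep.join(window))
            # Keep only a tail of at most chunk_overlap characters as overlap,
            # dropping pieces from the front until the next piece also fits.
            while window and (
                joined_len(window) > chunk_overlap
                or joined_len(window + [piece]) > chunk_size
            ):
                window.pop(0)
        window.append(piece)
    if window:
        chunks.append(sep.join(window))
    return chunks
```

For example, `split_text("alpha beta gamma delta epsilon zeta eta theta", chunk_size=20, chunk_overlap=8)` returns `['alpha beta gamma', 'gamma delta epsilon', 'epsilon zeta eta', 'zeta eta theta']`: each chunk repeats the last word or two of its predecessor, which is the overlap that keeps boundary text in context.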
Tech Stack
- Python
- LangChain
- FAISS (vector database)
- Streamlit
- Hugging Face Transformers
- Phi-2 / TinyLlama (language models)
- Sentence-Transformers (embeddings)
- PyPDF