Retrieval-Augmented Generation (RAG) has emerged as an effective framework for enhancing factual accuracy in Large Language Models (LLMs) by grounding generated responses in retrieved document context. This paper presents the design, implementation, and evaluation of a complete RAG pipeline for Document Question Answering (DocQA) using FAISS-based semantic retrieval and the Llama3 model running locally through Ollama. The system processes PDF and text documents, constructs a vector index, retrieves the top-k relevant chunks using embeddings, and generates grounded answers via LangChain’s RetrievalQA chain. A benchmark of ten document-derived questions was used to evaluate performance. Token-level F1 score, exact-match accuracy, and hallucination rate were computed to quantify system reliability. Experimental results show an exact-match accuracy of 30%, a hallucination rate of 20%, and F1 scores ranging from 0.13 to 1.0. The study highlights strengths in retrieval consistency, identifies challenges in generation alignment, and provides an empirical baseline for future improvements in RAG-based document reasoning.
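As a concrete illustration of the token-level F1 metric referenced above, the following is a minimal sketch; the whitespace tokenization and lowercase normalization are simplifying assumptions, not necessarily the paper's exact implementation:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and a reference answer.

    Assumes simple lowercased whitespace tokenization.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most
    # as many times as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this definition, an exact match yields an F1 of 1.0 and a fully disjoint answer yields 0.0, consistent with the 0.13–1.0 range reported in the evaluation.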