Designing Question-Answer Based Search System in Libraries: Application of Open Source Retrieval Augmented Generation (RAG) Pipeline

88 Citations, 2024
Jhantu Mazumder, Parthasarathi Mukhopadhyay
Journal of Information and Knowledge

Abstract

This study builds a prototype to demonstrate that libraries can develop a low-cost conversational search system using open-source software tools and Large Language Models (LLMs) within a Retrieval-Augmented Generation (RAG) framework. LLMs often hallucinate and give outdated, non-contextualized responses. This experiment shows, however, that LLMs can deliver contextualized, relevant responses when augmented with a set of relevant documents; supplying such documents to the model before it generates an answer is known as retrieval-augmented generation. The methodology involved building a RAG pipeline with tools such as LangChain, vector databases such as ChromaDB, and open-source LLMs such as Llama3 (a 70-billion-parameter model). For the prototype, a dataset of 250+ relevant documents on the Chandrayaan-3 mission was collected, processed, and ingested into the pipeline. Finally, the study compared responses from standard LLMs with those from RAG-augmented LLMs. Standard LLMs (without RAG) produced confidently incorrect, hallucinated responses to queries about Chandrayaan-3, whereas LLMs with RAG consistently provided accurate, informative, and contextualized answers when supplied with relevant documents before generating a response. The study concluded that open-source RAG-based systems offer a cost-effective solution for libraries to enhance information retrieval and transform libraries into dynamic information services.
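
The retrieve-then-augment step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a toy bag-of-words cosine retriever from the Python standard library in place of LangChain and ChromaDB embeddings, it uses a hypothetical three-document corpus with placeholder text, and it stops at building the augmented prompt rather than calling an LLM. The function names (`retrieve`, `build_prompt`) are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus with placeholder text (not the paper's dataset).
DOCUMENTS = [
    "Library opening hours are nine to five on weekdays.",
    "The Chandrayaan-3 mission is documented in the library's space science collection.",
    "Interlibrary loan requests are handled at the circulation desk.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query (retrieval step)."""
    q = vectorize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_docs):
    """Prepend the retrieved documents to the question (augmentation step)."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
    )


query = "Where can I find material on the Chandrayaan-3 mission?"
top_docs = retrieve(query, DOCUMENTS, k=1)
prompt = build_prompt(query, top_docs)
print(prompt)  # This prompt would then be sent to the LLM (generation step).
```

In a real deployment the count vectors would be replaced by dense embeddings stored in a vector database, and the prompt would be passed to the chosen model; the control flow (retrieve, augment, generate) stays the same.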