
Retrieval-Augmented Generation (RAG) with LLMs: Architecture, Methodology, System Design, Limitations, and Outcomes

88 Citations · 2025
Varshini Bhaskar Shetty
International Journal of Scientific Research in Engineering and Management

A RAG framework using Pinecone as a vector database, mixedbread-ai embeddings, and Gemini-1.5-pro for generation is presented, showing improved factual accuracy, reduced hallucinations, and enhanced user trust, making the system suitable for real-world enterprise and academic applications.

Abstract

Large Language Models (LLMs) have shown remarkable progress in natural language understanding and generation. However, they suffer from hallucinations, lack of domain adaptation, and outdated knowledge. Retrieval-Augmented Generation (RAG) addresses these challenges by combining semantic retrieval with generative models, enabling grounded, explainable, and domain-specific responses. This paper presents a RAG framework using Pinecone as a vector database, mixedbread-ai embeddings, and Gemini-1.5-pro for generation. We evaluate multiple chunking strategies, incorporate prompt-tuning techniques, and address security threats such as prompt injection attacks. Results indicate improved factual accuracy, reduced hallucinations, and enhanced user trust, making the system suitable for real-world enterprise and academic applications.

Index Terms: Retrieval-Augmented Generation, Large Language Models, LangChain, Pinecone, Semantic Search, Prompt Injection, Chunking
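The abstract describes a retrieve-then-generate pipeline: embed the user's query, fetch the most similar document chunks from Pinecone, and pass them to Gemini-1.5-pro as grounding context. The sketch below shows a minimal version of that loop using the components the paper names; the index name ("rag-demo"), the "text" metadata field, API-key handling, and the prompt wording are illustrative assumptions, not details taken from the paper.

```python
# Minimal RAG sketch: embed a query, retrieve top-k chunks from Pinecone,
# and ground Gemini-1.5-pro's answer in the retrieved context.
import os

import google.generativeai as genai
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

# mixedbread-ai embedding model, loadable via sentence-transformers.
embedder = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("rag-demo")  # hypothetical index of pre-embedded chunks

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
llm = genai.GenerativeModel("gemini-1.5-pro")


def answer(question: str, top_k: int = 5) -> str:
    # 1. Semantic retrieval: embed the question and query the vector database.
    query_vec = embedder.encode(question).tolist()
    hits = index.query(vector=query_vec, top_k=top_k, include_metadata=True)
    # Assumes each vector was upserted with its chunk text under "text".
    context = "\n\n".join(m.metadata["text"] for m in hits.matches)

    # 2. Grounded generation: instruct the model to answer only from the
    #    retrieved context, the mechanism behind the claimed reduction
    #    in hallucinations.
    prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.generate_content(prompt).text
```

Chunking strategy matters here because the retrieved "text" units are whatever was upserted at indexing time; the paper evaluates several such strategies, which this sketch leaves out.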
