Abstract—Large Language Models (LLMs) have shown remarkable progress in natural language understanding and generation. However, they suffer from hallucinations, lack of domain adaptation, and outdated knowledge. Retrieval-Augmented Generation (RAG) addresses these challenges by combining semantic retrieval with generative models, enabling grounded, explainable, and domain-specific responses. This paper presents a RAG framework using Pinecone as a vector database, mixedbread-ai embeddings, and Gemini-1.5-pro for generation. We evaluate multiple chunking strategies, incorporate prompt-tuning techniques, and address security threats such as prompt injection attacks. Results indicate improved factual accuracy, reduced hallucinations, and enhanced user trust, making the system suitable for real-world enterprise and academic applications.

Index Terms—Retrieval-Augmented Generation, Large Language Models, LangChain, Pinecone, Semantic Search, Prompt Injection, Chunking