A novel knowledge graph-based RAG framework with a refined retrieval pipeline, a robust chunking mechanism, and source traceability is proposed for a diabetes-focused LLM, achieving a ROUGE-1 score of 82.19%.
Large language models (LLMs) have demonstrated exceptional performance in multiple domains. However, their practical deployment in the healthcare sector poses distinctive challenges, including hallucination, inconsistency, limited explainability and reasoning, and doubts about the authenticity and validity of information sources. Hallucinations in LLMs often arise from unstructured and outdated training data and from the inability to update the model's knowledge after training. Integrating retrieval-augmented generation (RAG) with LLM decision-making gives the model access to real-time information from external resources. However, further improvements are needed to generate accurate responses. A knowledge graph is a structured data representation comprising nodes as entities and edges as relationships. When integrated with RAG, knowledge graph-based retrieval offers more contextually relevant responses, and better traceability and explainability of generated responses, than RAG alone. This study proposes a novel knowledge graph-based RAG framework with a refined retrieval pipeline, a robust chunking mechanism, and source traceability for an enhanced diabetes-focused LLM. The retrieval pipeline integrates three retrieval strategies: keyword, graph, and vector. To ensure the authenticity of responses, a diabetes-focused knowledge base is built from validated sources. This verified knowledge base is preprocessed and converted into a knowledge graph, on which the graph-based RAG pipeline is built. The empirical results demonstrate effective performance in the diabetes-focused LLM, with a ROUGE-1 score of 82.19%.
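To make the three-strategy retrieval idea concrete, the following minimal Python sketch combines keyword, graph, and vector retrieval over a toy corpus and returns each chunk together with its source, so that a response can be traced back to where it came from. It is an illustration under stated assumptions, not the framework's implementation: the chunks, source names, and triples are hypothetical; a bag-of-words cosine stands in for a real embedding model; and reciprocal rank fusion is one possible way to merge the three rankings, chosen here only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base; each chunk records its source for traceability.
CHUNKS = {
    "c1": {"source": "guideline_a", "text": "Metformin is a first-line drug for type 2 diabetes."},
    "c2": {"source": "guideline_b", "text": "Insulin therapy is required for type 1 diabetes."},
    "c3": {"source": "guideline_c", "text": "Regular exercise improves insulin sensitivity."},
}

# Hypothetical knowledge-graph triples: (head, relation, tail, originating chunk id).
TRIPLES = [
    ("metformin", "treats", "type 2 diabetes", "c1"),
    ("insulin", "treats", "type 1 diabetes", "c2"),
    ("exercise", "improves", "insulin sensitivity", "c3"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_rank(query):
    """Rank chunks by the number of distinct query tokens they contain."""
    q = set(tokenize(query))
    scored = [(len(q & set(tokenize(c["text"]))), cid) for cid, c in CHUNKS.items()]
    return [cid for score, cid in sorted(scored, key=lambda p: -p[0]) if score > 0]


def vector_rank(query):
    """Rank chunks by bag-of-words cosine similarity (stand-in for embeddings)."""
    qv = Counter(tokenize(query))
    scored = []
    for cid, chunk in CHUNKS.items():
        cv = Counter(tokenize(chunk["text"]))
        dot = sum(qv[t] * cv[t] for t in qv)
        norm = math.sqrt(sum(v * v for v in qv.values())) * math.sqrt(
            sum(v * v for v in cv.values())
        )
        scored.append((dot / norm if norm else 0.0, cid))
    return [cid for score, cid in sorted(scored, key=lambda p: -p[0]) if score > 0]


def graph_rank(query):
    """Return chunks linked to triples whose head or tail entity appears in the query."""
    q = query.lower()
    hits = []
    for head, _relation, tail, cid in TRIPLES:
        if (head in q or tail in q) and cid not in hits:
            hits.append(cid)
    return hits


def fuse(rankings, k=60):
    """Merge rankings with reciprocal rank fusion (an assumed merging rule)."""
    scores = Counter()
    for ranking in rankings:
        for rank, cid in enumerate(ranking, start=1):
            scores[cid] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda p: -p[1])


def retrieve(query, top_k=2):
    """Run all three strategies, fuse them, and attach the source of each chunk."""
    fused = fuse([keyword_rank(query), graph_rank(query), vector_rank(query)])
    return [
        {"id": cid, "source": CHUNKS[cid]["source"], "text": CHUNKS[cid]["text"], "score": score}
        for cid, score in fused[:top_k]
    ]
```

Calling `retrieve("Which drug treats type 2 diabetes?")` on this toy data ranks chunk `c1` first, since all three strategies agree on it, and the returned `source` field is what a generation step could cite alongside its answer.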