
RAGGED: Towards Informed Design of Scalable and Stable RAG Systems

Jennifer Hsia, Afreen Shaikh, Zhiruo Wang · 2024

RAGGED, a framework for analyzing RAG configurations across diverse DBQA tasks, uncovers distinct LM behaviors in response to varying context quantity, context quality, and retriever choice, and provides a deeper analysis of these differences.

Abstract

Retrieval-augmented generation (RAG) enhances language models by integrating external knowledge, but its effectiveness is highly dependent on system configuration. Improper retrieval settings can degrade performance, making RAG less reliable than closed-book generation. In this work, we introduce RAGGED, a framework for systematically evaluating RAG systems across diverse retriever-reader configurations, retrieval depths, and datasets. Our analysis reveals that reader robustness to noise is the key determinant of RAG stability and scalability. Some readers benefit from increased retrieval depth, while others degrade due to their sensitivity to distracting content. Through large-scale experiments on open-domain, multi-hop, and specialized-domain datasets, we show that retrievers, rerankers, and prompts influence performance but do not fundamentally alter these reader-driven trends. By providing a principled framework and new metrics to assess RAG stability and scalability, RAGGED enables systematic evaluation of retrieval-augmented generation systems, guiding future research on optimizing retrieval depth and model robustness.
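The core evaluation loop the abstract describes, sweeping retrieval depth and measuring reader performance at each depth, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the word-overlap retriever, substring-matching reader, and toy dataset are hypothetical stand-ins for real retriever and reader models.

```python
def retrieve(query, corpus, k):
    """Toy retriever: rank passages by word overlap with the query, keep top k."""
    scored = sorted(corpus, key=lambda p: -len(set(query.split()) & set(p.split())))
    return scored[:k]

def read(query, passages, answers):
    """Toy reader: answers correctly iff a gold answer string appears in context."""
    context = " ".join(passages)
    return next((a for a in answers if a in context), "")

def accuracy_at_depths(dataset, corpus, depths):
    """Exact-match accuracy as a function of retrieval depth k."""
    results = {}
    for k in depths:
        correct = sum(
            read(q, retrieve(q, corpus, k), golds) in golds
            for q, golds in dataset
        )
        results[k] = correct / len(dataset)
    return results

corpus = [
    "paris is the capital of france",
    "berlin is the capital of germany",
    "the eiffel tower is in paris",
    "unrelated distractor passage about cooking",
]
dataset = [
    ("what is the capital of france", ["paris"]),
    ("what is the capital of germany", ["berlin"]),
]
print(accuracy_at_depths(dataset, corpus, depths=[1, 2, 4]))
```

With a real reader model, the per-depth accuracy curve reveals the stability behavior the paper studies: a noise-robust reader holds or improves as k grows, while a noise-sensitive one degrades as distracting passages enter the context.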