A systematic review of large language model (LLM) evaluations in clinical medicine

133 citations • 2025

Sina Shool, Sara Adimi, Reza Saboori Amleshi

BMC Medical Informatics and Decision Making

A systematic review of the evaluation parameters and methodologies applied to LLMs in clinical medicine highlights certain limitations and biases across the included studies, emphasizing the need for careful interpretation and robust evaluation frameworks.

Abstract

The exponential growth in LLM research underscores the transformative potential of these models in healthcare. However, it will be essential to address challenges such as ethical risks, evaluation variability, and the underrepresentation of critical specialties. Future efforts should prioritize standardized frameworks to ensure safe, effective, and equitable LLM integration in clinical practice.