The word association game Codenames challenges the AI community with its requirements for multimodal language understanding, theory of mind, and epistemic reasoning. Previous attempts to develop AI agents for the game have focused on word-embedding techniques, which, while effective when both agents use the same embedding model, can suffer from brittle performance when paired with agents built on different models. Recently, Large Language Models (LLMs) have demonstrated enhanced capabilities, excelling at complex cognitive tasks including symbolic and common-sense reasoning. In this paper, we compare a range of recent prompt engineering techniques for GPT-based Codenames agents. While we found no significant improvement in game score over the baseline agent, we did observe qualitative changes in the agents' strategies, suggesting that further refinement has the potential to improve scores. We also propose a revised Codenames AI competition focusing specifically on the use of LLM agents.