Content moderation is critical for maintaining healthy online spaces, yet it remains a predominantly manual task, and moderators are often overwhelmed by the low moderator-to-post ratio. Researchers have therefore been exploring computational tools to assist human moderators. The natural language understanding capabilities of large language models (LLMs) open up the possibility of using LLMs for online moderation. This work explores the feasibility of using LLMs to identify rule violations on Reddit. We examine how an LLM-based moderator (LLM-Mod) reasons about 744 posts across 9 subreddits that violate different types of rules. We find that while LLM-Mod has a good true-negative rate (92.3%), it has a poor true-positive rate (43.1%), performing badly when flagging rule-violating posts. LLM-Mod tends to flag violations that can be detected through keyword matching, but fails to reason about posts of higher complexity. We discuss considerations for integrating LLMs into content moderation workflows and for designing platforms that support both AI-driven and human-in-the-loop moderation.
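To make the evaluation setup concrete, the sketch below illustrates the kind of pipeline this study implies: prompting an LLM to judge whether a post violates a subreddit's rules, then scoring true-positive and true-negative rates against moderator labels. This is a minimal illustration only; the prompt wording, model choice, client library (OpenAI's Python SDK), and data format are assumptions for exposition, not the paper's exact setup.

```python
# Hypothetical sketch of an LLM-Mod-style check: ask an LLM whether a post
# violates subreddit rules, then compute TPR/TNR against moderator labels.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def violates_rules(post_text: str, rules: list[str]) -> bool:
    """Ask the model whether the post violates any of the listed rules."""
    rule_list = "\n".join(f"- {r}" for r in rules)
    prompt = (
        "You are a subreddit moderator. Given the rules below, answer only "
        "'YES' if the post violates any rule, otherwise 'NO'.\n\n"
        f"Rules:\n{rule_list}\n\nPost:\n{post_text}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")


def tpr_tnr(posts: list[dict], rules: list[str]) -> tuple[float, float]:
    """posts: [{'text': ..., 'violating': True/False}, ...] with ground-truth labels."""
    tp = fn = tn = fp = 0
    for post in posts:
        flagged = violates_rules(post["text"], rules)
        if post["violating"]:
            tp += flagged
            fn += not flagged
        else:
            tn += not flagged
            fp += flagged
    return tp / (tp + fn), tn / (tn + fp)
```

In this framing, the paper's reported 43.1% true-positive rate corresponds to the fraction of rule-violating posts that such a check actually flags, and the 92.3% true-negative rate to the fraction of compliant posts it correctly leaves alone.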