Comparative Evaluation of Topic Detection: Humans vs. LLMs

Andriy Kosar; Guy De Pauw; Walter Daelemans

Abstract

This research explores topic detection and naming in news texts through a comparative study involving human participants from Ukraine, Belgium, and the USA, alongside Large Language Models (LLMs). In the first experiment, 109 participants from diverse backgrounds each assigned topics to three news texts. The findings revealed significant variation in topic assignment and naming, emphasizing the need for nuanced evaluation metrics beyond simple binary matches. In the second experiment, eight native speakers and six LLMs determined and named topics for seven news texts. A jury of four experts anonymously assessed these topic names against criteria such as relevance, completeness, clarity, and correctness. The detailed results shed light on the potential of LLMs in topic detection and stress the importance of acknowledging and accommodating the inherent diversity and subjectivity in topic identification. The study also proposes criteria for evaluating LLMs in both detecting and naming topics.
