Comparative Evaluation of Topic Detection: Humans vs. LLMs


  • Andriy Kosar
  • Guy De Pauw
  • Walter Daelemans


This research explores topic detection and naming in news texts through a comparative study involving human participants from Ukraine, Belgium, and the USA, alongside Large Language Models (LLMs). In the first experiment, 109 participants from diverse backgrounds assigned topics to three news texts each. The findings revealed significant variation in topic assignment and naming, emphasizing the need for nuanced evaluation metrics beyond simple binary matches. The second experiment engaged eight native speakers and six LLMs to identify and name topics for seven news texts. A jury of four experts anonymously assessed the resulting topic names against criteria such as relevance, completeness, clarity, and correctness. The detailed results shed light on the potential of LLMs for topic detection, underscore the inherent diversity and subjectivity of topic identification, and propose criteria for evaluating LLMs in both detecting and naming topics.




How to Cite

Kosar, A., De Pauw, G., & Daelemans, W. (2024). Comparative Evaluation of Topic Detection: Humans vs. LLMs. Computational Linguistics in the Netherlands Journal, 13, 91–120. Retrieved from


