Evaluating the Impact of Word Classes on Cross-Domain Age Detection Models' Performance


  • Jens Van Nooten, Universiteit Antwerpen
  • Ilia Markov, Universiteit Antwerpen
  • Walter Daelemans, Universiteit Antwerpen


In this paper, we examine the importance of word category information for the age detection task (identifying the age of a person based on their writing) under both in-domain and cross-domain conditions. We remove entire word classes and study the effect of their removal using both Support Vector Machines (SVM) and pre-trained contextual word embeddings (BERT). By conducting these experiments, we aim to gain insight into how the two approaches handle cross-domain conditions. Our experiments show that SVM relies mainly on content words in the in-domain setting, while function words are the most indicative features in the cross-domain setup. BERT, on the other hand, relies mainly on highly frequent word classes, such as nouns and punctuation, to make predictions under both in-domain and cross-domain conditions.
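
As an illustration of what removing an entire word class involves, the following is a minimal sketch, not the authors' implementation. It assumes the text has already been part-of-speech tagged; the function name `remove_word_classes`, the Universal POS style tag labels, and the example sentence are all hypothetical choices made for this sketch.

```python
from typing import Iterable, List, Set, Tuple

# A tagged token is a (token, part-of-speech tag) pair.
TaggedToken = Tuple[str, str]


def remove_word_classes(tagged: Iterable[TaggedToken],
                        classes_to_remove: Set[str]) -> List[str]:
    """Return the tokens whose POS tag is not in classes_to_remove."""
    return [token for token, pos in tagged if pos not in classes_to_remove]


# Hypothetical pre-tagged sentence (tag labels are an assumption of this sketch).
example = [
    ("I", "PRON"), ("love", "VERB"), ("my", "PRON"),
    ("new", "ADJ"), ("phone", "NOUN"), ("!", "PUNCT"),
]

# Ablate one word class at a time; a classifier would then be trained
# and evaluated on the reduced text to measure the effect of the removal.
no_nouns = remove_word_classes(example, {"NOUN"})
no_punct = remove_word_classes(example, {"PUNCT"})
```

Comparing a model's performance on the full text with its performance on each reduced version gives an estimate of how much that model depends on the removed class.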




How to Cite

Van Nooten, J., Markov, I., & Daelemans, W. (2021). Evaluating the Impact of Word Classes on Cross-Domain Age Detection Models’ Performance. Computational Linguistics in the Netherlands Journal, 11, 71–84. Retrieved from https://www.clinjournal.org/clinj/article/view/122



