TY  - JOUR
AU  - Van Nooten, Jens
AU  - Markov, Ilia
AU  - Daelemans, Walter
PY  - 2021/12/31
Y2  - 2024/03/19
TI  - Evaluating the Impact of Word Classes on Cross-Domain Age Detection Models' Performance
JF  - Computational Linguistics in the Netherlands Journal
JA  - CLIN Journal
VL  - 11
IS  - 
SE  - Articles
DO  - 
UR  - https://www.clinjournal.org/clinj/article/view/122
SP  - 71-84
AB  - In this paper, we examine the importance of word category information for the age detection task – the task of identifying the age of a person based on their writing – both under in-domain and cross-domain conditions. We remove entire word classes and study its effect using both Support Vector Machines (SVM) and pre-trained contextual word embeddings (BERT). By conducting these experiments, we aim to gain insight into how both approaches handle cross-domain conditions. Our experiments show that, on the one hand, SVM mainly relies on content words in the in-domain settings, while function words are the most indicative features in the cross-domain setup. BERT, on the other hand, mainly relies on highly-frequent word classes, such as nouns and punctuation, to make predictions both under in-domain and cross-domain age detection conditions.
ER  - 