Evaluating the Impact of Word Classes on Cross-Domain Age Detection Models' Performance

Jens Van Nooten; Ilia Markov; Walter Daelemans

Authors

Jens Van Nooten, University of Antwerp
Ilia Markov, University of Antwerp
Walter Daelemans, University of Antwerp

Abstract

In this paper, we examine the importance of word category information for the age detection task (identifying a person's age from their writing) under both in-domain and cross-domain conditions. We remove entire word classes and study the effect of their removal using both Support Vector Machines (SVM) and pre-trained contextual word embeddings (BERT). By conducting these experiments, we aim to gain insight into how the two approaches handle cross-domain conditions. Our experiments show that SVM relies mainly on content words in the in-domain setting, whereas function words are its most indicative features in the cross-domain setting. BERT, by contrast, relies mainly on highly frequent word classes, such as nouns and punctuation, to make predictions under both in-domain and cross-domain conditions.
