Automatic animacy classification for Dutch
We present an automatic animacy classifier for Dutch that determines the animacy status of nouns: how alive the noun’s referent is (human, inanimate, etc.). Animacy is a semantic property that has been shown to play a role in human sentence processing, felicity and grammaticality. Although animacy is not marked explicitly in Dutch, we expect knowledge about animacy to be helpful for parsing, translation and other NLP tasks. Only a few animacy classifiers and animacy-annotated corpora exist internationally. For Dutch, animacy information is available only in the Cornetto lexical-semantic database. We augment this lexical information with context information from the Dutch Lassy Large treebank to create training data for an animacy classifier that uses a novel kind of context feature.
We use the k-nearest neighbour algorithm with distributional lexical features, e.g. how frequently the noun occurs as the subject of the verb ‘to think’ in a corpus, to decide on the (predominant) animacy class. The size of the Lassy Large corpus makes this approach feasible, and the high level of detail that these word-association features provide results in accurate animacy classification for Dutch.
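The idea of combining lexicon labels with corpus-derived context features in a k-nearest neighbour classifier can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the dependency triples and the small label dictionary are hypothetical toy data standing in for Lassy Large and Cornetto, the relation labels `su` (subject) and `obj1` (direct object) follow Alpino-style naming, and the choices of relative-frequency features, cosine similarity and unweighted majority voting are illustrative rather than the configuration used in this work.

```python
from collections import Counter, defaultdict
import math

# Hypothetical (relation, verb, noun) triples; toy stand-ins for dependency
# relations extracted from a parsed corpus such as Lassy Large.
TRIPLES = [
    ("su", "denken", "man"), ("su", "denken", "vrouw"), ("su", "denken", "kind"),
    ("su", "zeggen", "man"), ("su", "zeggen", "vrouw"), ("su", "zeggen", "kind"),
    ("obj1", "eten", "appel"), ("obj1", "eten", "brood"), ("obj1", "kopen", "brood"),
    ("obj1", "kopen", "auto"), ("su", "rijden", "auto"),
    ("obj1", "eten", "peer"), ("obj1", "kopen", "peer"),
]

# Hypothetical animacy labels; toy stand-ins for lexicon entries (e.g. Cornetto).
LEXICON = {"man": "human", "vrouw": "human", "appel": "inanimate",
           "brood": "inanimate", "auto": "inanimate"}


def feature_vectors(triples):
    """Map each noun to the relative frequency of each (relation, verb) context."""
    counts = defaultdict(Counter)
    for rel, verb, noun in triples:
        counts[noun][f"{rel}:{verb}"] += 1
    return {noun: {feat: c / sum(ctr.values()) for feat, c in ctr.items()}
            for noun, ctr in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(noun, vectors, labels, k=3):
    """Majority vote over the k labelled nouns whose contexts are most similar."""
    query = vectors[noun]
    neighbours = sorted(
        ((cosine(query, vectors[n]), n) for n in labels
         if n in vectors and n != noun),
        reverse=True,
    )[:k]
    votes = Counter(labels[n] for _, n in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    vecs = feature_vectors(TRIPLES)
    for noun in ("kind", "peer"):  # nouns absent from the toy lexicon
        print(noun, knn_classify(noun, vecs, LEXICON))
```

In this toy setting, `kind` ('child') shares its subject-of-`denken`/`zeggen` contexts with the human nouns and is labelled `human`, while `peer` ('pear') patterns with food and purchasable objects and is labelled `inanimate`. With a corpus the size of Lassy Large, each noun would have many more contexts, which is what makes such fine-grained word-association features informative.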