Parsing the Dutch C-CLAMP: Unlocking 150 years of written Dutch for syntactic analysis
Abstract
We present the parsed Gothenburg edition of the Dutch Corpus of Contemporary and Late Modern Periodicals (Dutch C-CLAMP) and the Dutch Verb Construction database derived from this parsed corpus. The Dutch C-CLAMP is a diachronic corpus of Dutch-language periodicals, with material from the 19th and 20th centuries from Belgium and the Netherlands. Both the parsed corpus and the Dutch Verb Construction database will be made available to other researchers. In the paper we discuss the creation of the parsed corpus and offer a quantitative overview of the result. In the second half of the paper we introduce the Dutch Verb Construction database, a research database that supports large-scale diachronic investigation of verb constructions, discuss its extractions, and evaluate the database against manually annotated data. We end the paper with a small case study on verb order, exemplifying one type of research the database facilitates.