TY  - JOUR
AU  - Bouma, Gosse
PY  - 2015/11/01
Y2  - 2024/03/29
TI  - N-gram Frequencies for Dutch Twitter Data
JF  - Computational Linguistics in the Netherlands Journal
JA  - CLIN Journal
VL  - 5
IS  - 0
SE  - Articles
DO  - 
UR  - https://www.clinjournal.org/clinj/article/view/55
SP  - 25-36
AB  - This paper presents n-gram frequency data obtained from a large sample of Dutch tweets, covering a period of 4 years. After filtering out retweets, (near-)duplicates, and non-Dutch tweets, more than 2.6 billion tweets remained. These were tokenized, and frequencies were collected for n-grams of up to 5 words. A web interface allows users to obtain frequency information for spelling variants, grammatical phenomena (as reflected in n-gram patterns), monthly trends, and word clusters. All the underlying n-gram frequency data as well as the word clusters are available for download.
ER  - 