Linguistic proxies of readability: Comparing easy-to-read and regular newspaper Dutch

Authors

  • Vincent Vandeghinste
  • Bram Bulté

Abstract

The aim of this study is to identify linguistic proxies of readability in Dutch, i.e. the linguistic features that characterize a text as easy to read. To this end, we compare the Wablieft corpus (Vandeghinste et al. 2019), the archives of a Flemish easy-to-read newspaper, to articles that appeared in the regular Flemish newspaper De Standaard, using a wide range of lexical, syntactic and readability metrics. We test which of these metrics has the largest effect size and which combinations of metrics work best in a classification task predicting whether an article belongs to Wablieft or De Standaard. The results indicate that the best linguistic proxy for readability is, not surprisingly, the average number of words per sentence. Traditional readability metrics score well, although in logistic regression the combination of the parameters that constitute these metrics scores better than the original metrics.

Author Biographies

  • Vincent Vandeghinste

    Instituut voor de Nederlandse Taal (Leiden, Netherlands)

  • Bram Bulté

    KU Leuven (Belgium)

Published

2019-12-18

Section

Articles

How to Cite

Vandeghinste, V., & Bulté, B. (2019). Linguistic proxies of readability: Comparing easy-to-read and regular newspaper Dutch. Computational Linguistics in the Netherlands Journal, 9, 81-100. https://www.clinjournal.org/clinj/article/view/97
