Comparing Neural Meaning-to-Text Approaches for Dutch
The neural turn in computational linguistics has made it relatively easy to build systems for natural language generation, as long as suitable annotated corpora are available. But can such systems deliver the goods? Using Dutch data from the Parallel Meaning Bank, a corpus of (mostly short) texts annotated with language-neutral meaning representations, we investigate what challenges arise and what choices can be made when implementing sequence-to-sequence or graph-to-sequence transformer models for generating Dutch texts from formal meaning representations. We compare the performance of linearized input graphs with graphs encoded in various formats and find that stacking encoders yields the best results on the standard metrics used in natural language generation. A key challenge is dealing with unknown tokens that occur in the input meaning representation. We introduce a new method based on WordNet similarity to deal with out-of-vocabulary concepts.
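The abstract does not specify how the WordNet-based method works, so the following is only a minimal sketch of the general idea, under stated assumptions: an out-of-vocabulary concept is replaced by the in-vocabulary concept closest to it in a hypernym taxonomy. The toy taxonomy `HYPERNYM`, the functions `path_similarity` and `replace_oov`, and the choice of a path-based similarity measure are all hypothetical illustrations, not the procedure used in the paper.

```python
from typing import Dict, List, Set

# Hypothetical toy taxonomy (child -> hypernym); a stand-in for WordNet.
HYPERNYM: Dict[str, str] = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "animal",
    "cat": "feline",
    "feline": "animal",
    "animal": "entity",
    "car": "vehicle",
    "vehicle": "entity",
}


def path_to_root(concept: str) -> List[str]:
    """Return the chain of hypernyms from a concept up to the root."""
    path = [concept]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def path_similarity(a: str, b: str) -> float:
    """1 / (1 + number of edges between a and b); 0.0 if unconnected."""
    depth_a = {node: i for i, node in enumerate(path_to_root(a))}
    # Walking up from b, the first node shared with a's chain is the
    # lowest common ancestor, because every node has a single parent here.
    for j, node in enumerate(path_to_root(b)):
        if node in depth_a:
            return 1.0 / (1 + depth_a[node] + j)
    return 0.0


def replace_oov(concept: str, vocab: Set[str]) -> str:
    """Map an unknown concept to its most similar in-vocabulary concept.

    Assumption: if nothing in the vocabulary is related to the concept,
    it is returned unchanged.
    """
    if concept in vocab:
        return concept
    # Sort for a deterministic result when similarities tie.
    scored = [(path_similarity(concept, v), v) for v in sorted(vocab)]
    best_score, best = max(scored, default=(0.0, concept))
    return best if best_score > 0.0 else concept


if __name__ == "__main__":
    training_vocab = {"dog", "cat", "car"}
    print(replace_oov("wolf", training_vocab))  # -> dog
```

In this sketch, "wolf" is two edges away from "dog" (via "canine"), four from "cat", and five from "car", so it is mapped to "dog". A real system would presumably query WordNet itself rather than a hand-built dictionary, and could use a different similarity measure.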