TY - JOUR
AU - Verkijk, Stella
AU - Vossen, Piek
PY - 2021/12/31
Y2 - 2024/03/28
TI - MedRoBERTa.nl: A Language Model for Dutch Electronic Health Records
JF - Computational Linguistics in the Netherlands Journal
JA - CLIN Journal
VL - 11
SE - Articles
UR - https://www.clinjournal.org/clinj/article/view/132
SP - 141-159
AB - This paper presents MedRoBERTa.nl as the first Transformer-based language model for Dutch medical language. We show that using 13GB of text data from Dutch hospital notes, pre-training from scratch results in a better domain-specific language model than further pre-training RobBERT. When extending pre-training on RobBERT, we use a domain-specific vocabulary and re-train the embedding look-up layer. We show that MedRoBERTa.nl, the model that was trained from scratch, outperforms general language models for Dutch on a medical odd-one-out similarity task. MedRoBERTa.nl already reaches higher performance than general language models for Dutch on this task after only 10k pre-training steps. When fine-tuned, MedRoBERTa.nl outperforms general language models for Dutch in a task classifying sentences from Dutch hospital notes that contain information about patients’ mobility levels.
ER - 