Improving Domain-specific Cross-lingual Embeddings with Automatically Generated Bilingual Dictionaries

Authors

  • Pranaydeep Singh, Universiteit Gent
  • Ayla Rigouts Terryn, KU Leuven
  • Els Lefever, Universiteit Gent

Abstract

This paper reports on a set of proof-of-concept experiments performed to evaluate and improve the alignment of monolingual embeddings for a specialised domain, viz. the medical use case of heart failure. The presented approach, which creates domain-specific dictionaries on the fly from cross-lingual Wikipedia links, achieves good results for cross-lingual alignment of this specialised vocabulary in three language pairs: English-Dutch, English-French, and Dutch-French. The experimental results show that the setup using a smaller but dedicated domain-specific seed dictionary outperforms the setup using a larger, general-domain seed dictionary. A detailed error analysis reveals that many potentially useful (near-)equivalents are found beyond those present in the gold standard, and it suggests strategies for further improvement, such as lemmatisation and better tokenisation.
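To make the idea of a seed dictionary built from cross-lingual Wikipedia links more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The title pairs in `LANGLINKS` are hypothetical and illustrative, the toy embeddings are random vectors, and the helper names (`seed_dictionary`, `procrustes`, `translate`) are invented for this example. The abstract does not say which alignment method is used, so the sketch swaps in orthogonal Procrustes, a common supervised mapping technique: linked titles that occur in both embedding vocabularies become dictionary pairs, and an orthogonal matrix `W` mapping source vectors onto their target counterparts is then learned from those pairs.

```python
import numpy as np

# Hypothetical interlanguage-link pairs (English title, Dutch title) for the
# heart-failure domain; illustrative only, not extracted from real Wikipedia data.
LANGLINKS = [
    ("Heart", "Hart"),
    ("Heart failure", "Hartfalen"),
    ("Diuretic", "Diureticum"),
    ("Edema", "Oedeem"),
    ("Cardiology", "Cardiologie"),
    ("Ventricle", "Ventrikel"),
]


def seed_dictionary(langlinks, src_vocab, tgt_vocab):
    """Turn linked title pairs into (src, tgt) word pairs covered by both vocabularies.

    Titles are lowercased; multi-word titles survive only if the embedding
    vocabulary happens to contain them as a single token.
    """
    pairs = []
    for src_title, tgt_title in langlinks:
        s, t = src_title.lower(), tgt_title.lower()
        if s in src_vocab and t in tgt_vocab:
            pairs.append((s, t))
    return pairs


def procrustes(X, Y):
    """Orthogonal W minimising ||XW - Y||_F: W = U V^T, where U S V^T = svd(X^T Y)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def translate(word, src_emb, tgt_emb, W):
    """Map a source vector with W and return the cosine-nearest target word."""
    v = src_emb[word] @ W
    best, best_sim = None, -np.inf
    for t, u in tgt_emb.items():
        sim = v @ u / (np.linalg.norm(v) * np.linalg.norm(u))
        if sim > best_sim:
            best, best_sim = t, sim
    return best


# Toy embeddings (assumption): the Dutch space is an exact rotation Q of the English one.
rng = np.random.default_rng(0)
dim = 4
en_words = ["heart", "diuretic", "edema", "cardiology", "ventricle", "pump"]
nl_words = ["hart", "diureticum", "oedeem", "cardiologie", "ventrikel", "pomp"]
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src_emb = {w: rng.normal(size=dim) for w in en_words}
tgt_emb = {n: src_emb[e] @ Q for e, n in zip(en_words, nl_words)}

# Build the seed dictionary from the links, then learn the mapping from it.
pairs = seed_dictionary(LANGLINKS, src_emb, tgt_emb)
X = np.stack([src_emb[s] for s, _ in pairs])
Y = np.stack([tgt_emb[t] for _, t in pairs])
W = procrustes(X, Y)

print(pairs)
print(translate("pump", src_emb, tgt_emb, W))  # "pump" is not in the seed dictionary
```

In this toy run, the multi-word title "Heart failure" is dropped because neither vocabulary has it as a single token, which loosely mirrors why tokenisation matters for domain terms. The held-out word "pump" should still be mapped to "pomp", since the learned `W` recovers the rotation from the five remaining pairs. Real embedding spaces are only approximately isomorphic, so retrieval there would be noisier.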

Published

2022-12-22

How to Cite

Singh, P., Rigouts Terryn, A., & Lefever, E. (2022). Improving Domain-specific Cross-lingual Embeddings with Automatically Generated Bilingual Dictionaries. Computational Linguistics in the Netherlands Journal, 12, 125–140. Retrieved from https://www.clinjournal.org/clinj/article/view/151

Section

Articles
