BeCoS Corpus: Belgian Covid-19 Sign Language Corpus. A Corpus for Training Sign Language Recognition and Translation

Authors

  • Vincent Vandeghinste, Instituut voor de Nederlandse Taal / KU Leuven
  • Bob Van Dyck, KU Leuven
  • Mathieu De Coster, Universiteit Gent
  • Maud Goddefroy, KU Leuven
  • Joni Dambre, Universiteit Gent

Abstract

We present the Belgian Federal COVID-19 corpus, nicknamed the BeCoS (Belgian Covid Sign language) corpus. It consists of the entire archive of official press conferences held by the Belgian Federal Government on the COVID-19 pandemic. The speakers speak mostly Dutch or French and occasionally German, and nearly all speech is accompanied by a deaf signer who interprets live what is being said. We have preprocessed the corpus with speaker diarisation, Belgian Dutch ASR, post-ASR language identification and punctuation prediction, as well as signer diarisation, sign language identification and sign language keypoint recognition. The corpus is publicly available.

Published

2022-12-22

How to Cite

Vandeghinste, V., Van Dyck, B., De Coster, M., Goddefroy, M., & Dambre, J. (2022). BeCoS Corpus: Belgian Covid-19 Sign Language Corpus. A Corpus for Training Sign Language Recognition and Translation. Computational Linguistics in the Netherlands Journal, 12, 7–17. Retrieved from https://www.clinjournal.org/clinj/article/view/144

Section

Articles