Event

LuxemBERT – A Language Model for the Luxembourgish Language

  • Speaker  Cedric Lothritz (Interdisciplinary Centre for Security, Reliability and Trust)

  • Location

    Fully virtual (contact Dr. Jakub Lengiewicz to register). 30 min presentation + 30 min discussion

    LU

  • Topic(s)
    Engineering

Abstract:

Pre-trained language models such as BERT have become ubiquitous in NLP, where they achieve state-of-the-art performance on most tasks. While these models are readily available for English and other widely spoken languages, they remain scarce for low-resource languages such as Luxembourgish.

In this talk, I will give a quick overview of the language model architectures leading up to BERT. The main topic, however, will be LuxemBERT, a BERT model for the Luxembourgish language that we created using the following approach: we augmented the pre-training dataset with text data from a closely related language, which we partially translated using a simple method. Our LuxemBERT model outperforms the multilingual mBERT model, which, at the time of creation, was the only transformer-based language model available for Luxembourgish. I will conclude with a quick demo of a Luxembourgish chatbot built on LuxemBERT.

About the speaker

Cedric Lothritz received his Master’s degree in Computer Science from the University of Fribourg (Switzerland) in 2017. His research interests are in the fields of machine learning and natural language processing. Cedric joined the Security, Design and Validation research group (SERVAL), headed by Prof. Yves Le Traon, where he is advised by Dr Jacques Klein and Dr Tegawendé Bissyande.

 

The Machine Learning Seminar is a regular weekly seminar series that hosts presentations of fundamental and methodological advances in data science and machine learning, as well as discussions of application areas presented by domain specialists. The uniqueness of the seminar series lies in its attempt to extract common denominators between domain areas and to challenge existing methodologies. The focus is thus on theory and applications to a wide range of domains, including Computational Physics and Engineering, Computational Biology and Life Sciences, and Computational Behavioural and Social Sciences. More information about the ML Seminar, together with video recordings from past meetings, can be found here: https://legato-team.eu/seminars/