Automatic Speech Recognition

Constructing EPTIC: A Modular Pipeline and an Evaluation of ASR for Verbatim Transcription

This paper presents a novel pipeline for constructing multimodal and multilingual parallel corpora, with a focus on evaluating state-of-the-art automatic speech recognition tools for verbatim transcription. Our findings indicate that current technologies can streamline corpus construction, and that fine-tuning shows promising results in transcription quality compared with out-of-the-box Whisper models.

Dec 6, 2024

🏆 Paper accepted at CLiC-it 2024, the Tenth Italian Conference on Computational Linguistics

Constructing EPTIC: A Modular Pipeline and an Evaluation of ASR for Verbatim Transcription

Sep 23, 2024

✅ Started working on EPTIC, the European Parliament Translation and Interpreting Corpus

The project aims to design a pipeline for expanding the existing data and to experiment with speech recognition models.

Oct 1, 2023