Alice Fedotova

Hi, I’m Alice! 👋🏻

I’m an independent researcher specializing in Natural Language Processing. Previously, I was a Research Fellow at the University of Bologna, where I worked on EPTIC, the European Parliament Translation and Interpreting Corpus.

My research interests include Text Mining, Language Resource Creation, Benchmark Design, and Multimodal Content Analysis. I’m also a proud member of AILC, the Italian Association of Computational Linguistics.

Check out my CV for more info about my background and experience. Looking forward to connecting and hearing your (anonymous) feedback 🙂!

Experience

  • Research Fellow, 2023-2024
    University of Bologna
  • Research Intern, 2021-2022
    University of Bologna

Education

  • MA in Translation and Technology
    University of Bologna
  • BA in Languages and Technologies for Intercultural Communication
    University of Bologna
CLiC-it 2024 Updates

Highlights from the CLiC-it 2024 conference, held in Pisa, Italy.

Jun 1, 2030

🗞️ Updates and highlights from the CLiC-it 2024 conference in Pisa, Italy

An overview of the main themes and research directions presented by leading researchers in the field of computational linguistics.

Dec 6, 2024

Constructing EPTIC: A Modular Pipeline and an Evaluation of ASR for Verbatim Transcription

This paper presents a novel pipeline for constructing multimodal and multilingual parallel corpora, with a focus on evaluating state-of-the-art automatic speech recognition tools for verbatim transcription. Our findings indicate that current technologies can streamline corpus construction, with fine-tuning showing promising results in terms of transcription quality compared to out-of-the-box Whisper models.

Dec 6, 2024

🏆 Paper accepted at CLiC-it 2024, the Tenth Italian Conference on Computational Linguistics

Constructing EPTIC: A Modular Pipeline and an Evaluation of ASR for Verbatim Transcription

Sep 23, 2024

Expanding the European Parliament Translation and Interpreting Corpus: A Modular Pipeline for the Construction of Complex Corpora

The present paper introduces an expanded version of the European Parliament Translation and Interpreting Corpus (EPTIC), a multimodal parallel corpus comprising speeches delivered at the European Parliament along with their official interpretations and translations (see Bernardini et al., 2016; Bernardini et al., 2018). Constructing multimodal and parallel corpora for translation and interpreting studies (TIS) has been acknowledged as a “formidable task” (Bernardini et al., 2018), which – if automated, as we propose – involves a number of subtasks such as automatic speech recognition (ASR), multilingual sentence alignment, and forced alignment, each of which poses its own challenges. Yet tackling these subtasks also offers a unique way to evaluate state-of-the-art natural language processing (NLP) tools against a unique, multilingual benchmark. In this paper we discuss the development of a modular pipeline adaptable for each of these subtasks and address the broader implications of this work for the field of corpus construction.

Sep 15, 2024

🎉 Paper accepted at JTDH, the 14th Conference on Language Technologies and Digital Humanities

Expanding the European Parliament Translation and Interpreting Corpus: A Modular Pipeline for the Construction of Complex Corpora

Jul 5, 2024

Decoding Medical Dramas: Identifying Isotopies through Multimodal Classification

Classifying audiovisual content using unimodal and multimodal transformer-based models. The study compares two classification strategies: a single multiclass classifier and a one-vs-the-rest approach, examining their performance in both unimodal and multimodal settings. Results show the multiclass multimodal approach achieves the best performance, with an F1 score of 0.723, outperforming the unimodal text-based one-vs-the-rest method.

Dec 23, 2023

✅ Started working on EPTIC, the European Parliament Translation and Interpreting Corpus

The aim of the project is to design a pipeline to expand the existing data and experiment with speech recognition models.

Oct 1, 2023