Manuel de Prada Corral

Hi! 👋 I am a doctoral researcher at ETH Zürich and the Max Planck Institute for Intelligent Systems in Tübingen, advised by Ryan Cotterell (ETH) and Wieland Brendel (MPI). I am a fellow of the Max Planck–ETH Center for Learning Systems.

My research focuses on the statistical and algorithmic foundations of language models, with current emphasis on sampling theory and particle-based inference. I am interested in the interplay between reinforcement learning and sampling algorithms, and in linguistics research aided by language models. I admire mathematically grounded approaches that improve our understanding of modern NLP systems and their reliability.

Previously, I was an ML intern at Hugging Face, working on the transformers generation stack, and a teaching assistant for ETH's Natural Language Processing and Large Language Models courses. Before that, I completed a double BSc in Mathematics and Computer Science at the University of Santiago de Compostela.


§ i Papers

  1. A Model of Diverse Sampling from Language Models

    Manuel Prada‑Corral, Yahya Emara, Timothy J. O’Donnell, Ryan Cotterell, Tim Vieira

    Preprint · Under review

    TL;DR. We formalize diverse language-model sampling as a global Determinantal Point Process over complete strings and use importance sampling to enable a principled quality–diversity trade-off without retraining the model.

  2. An unsupervised perplexity-based method for boilerplate removal

    Marcos Fernández-Pichel, Manuel Prada-Corral, David E. Losada, Juan C. Pichel, Pablo Gamallo

    Natural Language Engineering · 2023

    TL;DR. A language-model perplexity signal separates web boilerplate from main content without supervision, beating heuristic cleaners on multilingual crawls. Released as pyplexity.

  3. CiTIUS at the TREC 2022 Health Misinformation Track

    Marcos Fernández-Pichel, Manuel Prada-Corral, David E. Losada, Juan C. Pichel

    TREC 2022 · NIST SP 500-338

    TL;DR. A multi-stage retrieval pipeline that combines BM25 with transformer-based stance and credibility classifiers to surface trustworthy health information and demote misinformation.


§ ii Currently thinking about

  • Sampling theory. Distributions over latent reasoning traces; sampling without replacement or non-i.i.d. from LMs; useful estimators.
  • Particle‑based inference. Sequential Monte Carlo for structured generation; principled decoding under uncertainty.
  • Linguistic interpretation of LMs. Evaluating linguistic hypotheses with language models, and assessing when LMs constitute valid evidence for linguistic claims.
  • RL & decoding. Reinforcement‑learning algorithms for generation; minimum‑Bayes‑risk decoding.

§ iii Background

  1. Nov 2025 – present PhD Researcher, Natural Language Processing. ETH Zürich & MPI
  2. Apr – Oct 2025 ML Engineer Intern. Hugging Face, Paris.
  3. 2024 – 2025 Teaching Assistant, NLP & LLMs. ETH Zürich.
  4. 2022 – 2024 MSc, Computer Science. ETH Zürich.
  5. 2021 – 2022 Research Assistant. CiTIUS, Universidade de Santiago de Compostela. Disinformation detection with deep learning for NLP & IR.
  6. 2016 – 2022 Double BSc, Mathematics & Computer Science. Universidade de Santiago de Compostela.

§ iv irl

Off the keyboard, you'll find me playing basketball, hiking, tinkering with bikes or other hardware, or with a Galician bagpipe in hand. I'm a native speaker of Galician and Spanish, I use English and French professionally, and I am currently learning German.

eu (at) manueldeprada (dot) com · Zürich · RSS · 2026