Manuel de Prada Corral
Hi!👋 I am a doctoral researcher at ETH Zürich and the Max Planck Institute for Intelligent Systems in Tübingen, advised by Ryan Cotterell (ETH) and Wieland Brendel (MPI). I am a fellow of the Max Planck–ETH Center for Learning Systems.
My research focuses on the statistical and algorithmic foundations of language models, with a current emphasis on sampling theory and particle-based inference. I am interested in the interplay between reinforcement learning and sampling algorithms, and in linguistics research aided by language models. I admire mathematically grounded approaches that improve our understanding of modern NLP systems and their reliability.
Previously, I was an ML intern at Hugging Face, working on the transformers generation stack, and a teaching assistant for ETH's Natural Language Processing and Large Language Models courses. Before that, I completed a double BSc in Mathematics and Computer Science at the University of Santiago de Compostela.
§ i Papers
-
A Model of Diverse Sampling from Language Models
Preprint · Under review
TL;DR. We formalize diverse language-model sampling as a global Determinantal Point Process over complete strings and use importance sampling to enable a principled quality–diversity trade-off without retraining the model.
-
An unsupervised perplexity-based method for boilerplate removal
Natural Language Engineering · 2023
TL;DR. A language-model perplexity signal separates web boilerplate from main content without supervision, beating heuristic cleaners on multilingual crawls. Released as pyplexity.
-
CiTIUS at the TREC 2022 Health Misinformation Track
TREC · 2022
TL;DR. A multi-stage retrieval pipeline that combines BM25 with transformer-based stance and credibility classifiers to surface trustworthy health information and demote misinformation.
§ ii Currently thinking about
- ▹Sampling theory. Distributions over latent reasoning traces; sampling without replacement or non-i.i.d. from LMs; useful estimators.
- ▹Particle‑based inference. Sequential Monte Carlo for structured generation; principled decoding under uncertainty.
- ▹Linguistic interpretation of LMs. Evaluating linguistic hypotheses with language models, and assessing how valid language models are as evidence for linguistic claims.
- ▹RL & decoding. Reinforcement‑learning algorithms for generation; minimum‑Bayes‑risk decoding.
§ iii Background
- Nov 2025 – present PhD Researcher, Natural Language Processing. ETH Zürich & MPI
- Apr – Oct 2025 ML Engineer Intern. Hugging Face, Paris.
- 2024 – 2025 Teaching Assistant, NLP & LLMs. ETH Zürich.
- 2022 – 2024 MSc, Computer Science. ETH Zürich.
- 2021 – 2022 Research Assistant. CiTIUS, Universidade de Santiago de Compostela. Disinformation detection with deep learning for NLP & IR.
- 2016 – 2022 Double BSc, Mathematics & Computer Science. Universidade de Santiago de Compostela.
§ iv irl
Off the keyboard, you'll find me playing basketball, hiking, tinkering with bikes or other hardware, or with a Galician bagpipe in hand. I'm a native speaker of Galician and Spanish, I use English and French professionally, and I am currently learning German.