A Toy Probabilistic Transformer for Debugging Generation Algorithms in HuggingFace🤗

by Manuel de Prada Corral

3 min read

A few weeks ago, I found myself implementing "Stochastic Beams and Where to Find Them" (sampling without replacement from a Transformer).

Debugging and verifying the correctness of a sampling algorithm in HuggingFace is not straightforward. Thus, I built a skeleton of a Transformer model with a small vocabulary and fixed, controlled probabilities, which makes it easy to keep a close eye on the logits and the generated sequences.

```mermaid
stateDiagram-v2
    state "[0]" as 1
    state "[0,1]" as 01
    state "[0,2]" as 02
    state "[0,1,1]" as 011
    state "[0,1,1,3]" as 0113
    state "[0,1,2]" as 012
    state "[0,1,2,3]" as 0123
    state "[0,2,1]" as 021
    state "[0,2,1,3]" as 0213
    state "[0,2,2]" as 022
    state "[0,2,2,3]" as 0223

    note right of 0113
        prob=0.075
        logp=-2.59
    end note
    note right of 0123
        prob=0.675
        logp=-0.39
    end note
    note right of 0223
        prob=0.225
        logp=-1.49
    end note
    note right of 0213
        prob=0.025
        logp=-3.68
    end note

    [*] --> 1 : 0 (BOS)
    1 --> 01 : 75%
    1 --> 02 : 25%
    01 --> 011 : 10%
    01 --> 012 : 90%
    02 --> 021 : 10%
    02 --> 022 : 90%
    011 --> 0113 : EOS
    012 --> 0123 : EOS
    021 --> 0213 : EOS
    022 --> 0223 : EOS
```
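The diagram's fixed transition probabilities can be sketched as a plain lookup table, which is the essence of such a toy model: every prefix maps to a known next-token distribution, so any generation algorithm's output can be checked by hand. This is an illustrative sketch (the names `TRANSITIONS` and `sequence_logprob` are mine, not from the original code), with vocabulary `{0: BOS, 1, 2, 3: EOS}`:

```python
import math

# Fixed next-token distributions matching the state diagram above.
# Keys are prefixes (as tuples), values map next token -> probability.
TRANSITIONS = {
    (0,):      {1: 0.75, 2: 0.25},
    (0, 1):    {1: 0.10, 2: 0.90},
    (0, 2):    {1: 0.10, 2: 0.90},
    (0, 1, 1): {3: 1.0},
    (0, 1, 2): {3: 1.0},
    (0, 2, 1): {3: 1.0},
    (0, 2, 2): {3: 1.0},
}

def sequence_logprob(seq):
    """Sum the log-probabilities along one path of the diagram."""
    logp = 0.0
    for i in range(1, len(seq)):
        probs = TRANSITIONS[tuple(seq[:i])]
        logp += math.log(probs[seq[i]])
    return logp

print(round(sequence_logprob([0, 1, 2, 3]), 2))  # -0.39, as in the diagram
```

Because the four complete sequences and their exact probabilities are known in advance, the logits and scores reported by a generation algorithm can be verified against this table step by step.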


Continue reading →

Porting Stochastic Beam Search to HuggingFace🤗

by Manuel de Prada Corral

4 min read

Stochastic beam search is a principled way of sampling without replacement from an autoregressive model, obtained just by perturbing the scores of the beam search algorithm. This makes it possible to construct low-variance estimators over the model's distribution, which is useful for estimating the model's properties and for exploring stochastic generation strategies.
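The perturbation at the heart of the method is the Gumbel-top-k trick: adding independent Gumbel(0, 1) noise to each log-probability and keeping the k largest perturbed values yields a sample of k distinct items without replacement. A minimal sketch over a flat categorical distribution (the function name `gumbel_top_k` is mine; the full algorithm applies this idea recursively inside beam search):

```python
import math
import random

def gumbel_top_k(logprobs, k):
    """Sample k distinct indices without replacement by perturbing
    log-probabilities with Gumbel(0, 1) noise and taking the top k."""
    perturbed = [
        lp - math.log(-math.log(random.random())) for lp in logprobs
    ]
    # Indices of the k largest perturbed scores.
    return sorted(range(len(logprobs)), key=lambda i: -perturbed[i])[:k]

# Hypothetical log-probabilities of four sequences; draw two of them
# without replacement.
sample = gumbel_top_k([-0.39, -1.49, -2.59, -3.68], k=2)
```

Each index can appear at most once in the result, unlike ordinary ancestral sampling repeated k times, which may return duplicates.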

Continue reading →

Unofficial documentation for the HuggingFace🤗 generation pipeline

by Manuel de Prada Corral

6 min read

While implementing a new generation strategy for Transformer models, I found myself delving deep into the HuggingFace library. The documentation is clear about usage, but much less so about implementation details.

Here is a collection of notes compiled from my dive into the codebase. They may prove useful to anyone looking to understand or extend HuggingFace's generation pipeline.

Continue reading →

More posts can be found in the archive.