A few weeks ago, I found myself implementing "Stochastic Beams and Where to Find Them" (sampling without replacement from a Transformer).
Debugging and verifying the correctness of a sampling algorithm in HuggingFace is not straightforward. So I built a fake carcass of a Transformer model with a small vocabulary and fixed, controlled probabilities, which lets me keep a close eye on the logits and the generated sequences. The state diagram below shows every sequence this fake model can generate, along with its probability and log-probability.
stateDiagram-v2
state "[0]" as 1
state "[0,1]" as 01
state "[0,2]" as 02
state "[0,1,1]" as 011
state "[0,1,1,3]" as 0113
state "[0,1,2]" as 012
state "[0,1,2,3]" as 0123
state "[0,2,1]" as 021
state "[0,2,1,3]" as 0213
state "[0,2,2]" as 022
state "[0,2,2,3]" as 0223
note right of 0113
prob=0.075
logp=-2.59
end note
note right of 0123
prob=0.675
logp=-0.39
end note
note right of 0223
prob=0.225
logp=-1.49
end note
note right of 0213
prob=0.025
logp=-3.69
end note
[*] --> 1 : 0 (BOS)
1 --> 01 : 75%
1 --> 02 : 25%
01 --> 011 : 10%
01 --> 012 : 90%
02 --> 021 : 10%
02 --> 022 : 90%
011 --> 0113 : EOS
012 --> 0123 : EOS
021 --> 0213 : EOS
022 --> 0223 : EOS
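
To make the diagram concrete, here is a minimal sketch of the same toy distribution in plain Python. It is not the actual fake HuggingFace model: `next_token_probs`, `enumerate_sequences`, and `sample_sequence` are hypothetical names I am using for illustration, and the token ids (0 = BOS, 3 = EOS) and probabilities are copied from the diagram. The idea is to enumerate every sequence exactly, then check that plain ancestral sampling reproduces those probabilities.

```python
import math
import random
from collections import Counter

BOS, EOS = 0, 3  # token ids taken from the diagram


def next_token_probs(prefix):
    """Hypothetical stand-in for the fake model: fixed next-token
    probabilities that depend only on the current sequence length."""
    step = len(prefix)
    if step == 1:  # after BOS
        return {1: 0.75, 2: 0.25}
    if step == 2:  # after the first real token
        return {1: 0.10, 2: 0.90}
    return {EOS: 1.0}  # every sequence then ends with EOS


def enumerate_sequences(prefix=(BOS,), logp=0.0):
    """Yield (sequence, log-probability) for every complete sequence."""
    if prefix[-1] == EOS:
        yield prefix, logp
        return
    for token, p in next_token_probs(prefix).items():
        yield from enumerate_sequences(prefix + (token,), logp + math.log(p))


def sample_sequence(rng):
    """Draw one complete sequence by ancestral sampling."""
    seq = (BOS,)
    while seq[-1] != EOS:
        probs = next_token_probs(seq)
        tokens = list(probs)
        seq += tuple(rng.choices(tokens, weights=[probs[t] for t in tokens]))
    return seq


# Exact probabilities, computed by walking the whole tree.
exact = dict(enumerate_sequences())
for seq, logp in exact.items():
    print(list(seq), f"prob={math.exp(logp):.3f}", f"logp={logp:.2f}")

# Empirical frequencies from repeated (with-replacement) sampling.
n_samples = 20_000
rng = random.Random(0)
counts = Counter(sample_sequence(rng) for _ in range(n_samples))
for seq in exact:
    print(list(seq), f"empirical={counts[seq] / n_samples:.3f}")
```

The exact values should agree with the notes in the diagram up to rounding, and the empirical frequencies should land close to them. Because the whole distribution fits in four sequences, the same comparison can be used to check any other sampler against the known probabilities.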