This post is converted from a Jupyter Notebook. To view the original interactive version, check out the Colab notebook.
Monthly Algorithmic Challenge (November 2024): Trigrams
Last week, I worked through the monthly Mechanistic Interpretability challenge from Callum McDougall’s ARENA course.
(A huge shoutout to Callum and the entire ARENA team for all the work they do!)
The challenge was to interpret how a simple neural net - in this case, a 1-layer, 1-head transformer (with an MLP) - solves a problem. The task was to predict the next token in a sequence of random tokens. Since the model was trained with cross-entropy loss, training on a completely random dataset would lead it to always predict a uniform distribution over the vocabulary.
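To make that last point concrete, here's a minimal sanity-check sketch (the vocabulary size below is made up, not the one from the challenge): with uniformly random targets, cross-entropy is minimized by predicting the uniform distribution, giving an expected loss of log(vocab_size).

```python
import torch
import torch.nn.functional as F

# Hypothetical vocab size, just for illustration.
vocab_size = 1000
targets = torch.randint(0, vocab_size, (10_000,))

# Uniform prediction: equal logits for every token.
uniform_logits = torch.zeros(len(targets), vocab_size)
loss = F.cross_entropy(uniform_logits, targets)

print(loss.item())            # ~6.9078
print(torch.log(torch.tensor(float(vocab_size))).item())  # log(1000) ≈ 6.9078
```

So any structure the model does learn has to come from non-random patterns hidden in the data.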