https://pair.withgoogle.com/explorables/grokking/

Frequency measure

Screen Shot 2023-08-11 at 2.29.09 PM.png

y axis: the activation value. The activation value of a neuron is the result of applying an activation function (e.g., ReLU, sigmoid) to the weighted sum of its inputs plus a bias term. It is the "output" of the neuron that gets passed to the next layer.
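
A minimal sketch of that definition, to make the y axis concrete. The function names (`relu`, `sigmoid`, `neuron_activation`) and the example numbers are made up for illustration; they are not taken from the explorable or its model.

```python
import math

def relu(z):
    # ReLU: pass positive values through, clamp negatives to 0.
    return max(0.0, z)

def sigmoid(z):
    # Sigmoid: squash any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def neuron_activation(inputs, weights, bias, activation=relu):
    """Apply the activation function to the weighted sum of inputs plus bias."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(pre_activation)

# Hypothetical neuron with two inputs.
inputs = [1.0, 2.0]
weights = [0.5, -1.0]

# Weighted sum is 0.5*1.0 + (-1.0)*2.0 = -1.5.
print(neuron_activation(inputs, weights, bias=0.25))  # pre-activation -1.25, ReLU gives 0.0
print(neuron_activation(inputs, weights, bias=2.0))   # pre-activation 0.5, ReLU gives 0.5
```

Swapping `activation=sigmoid` changes only the final squashing step; the weighted sum plus bias is computed the same way.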

x axis: the input value (modulo 67, so 0 to 66)

We are measuring the activation pattern: how a neuron's activation value changes across the input values.

Each graph corresponds to one frequency.
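
To illustrate what it means for an activation pattern to have a frequency, here is a rough sketch. It builds a synthetic neuron whose activation over the inputs 0 to 66 is a ReLU-clipped cosine, then picks the strongest component of a discrete Fourier transform. The synthetic neuron, the chosen frequency of 5, and the helper `dominant_frequency` are all assumptions for illustration; this is not necessarily how the explorable computes its frequency measure.

```python
import cmath
import math

P = 67  # inputs are taken modulo 67, so x ranges over 0..66

def dominant_frequency(activations):
    """Return the frequency k (1..n//2) with the largest DFT magnitude."""
    n = len(activations)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        # DFT coefficient of the activation pattern at frequency k.
        coeff = sum(a * cmath.exp(-2j * math.pi * k * x / n)
                    for x, a in enumerate(activations))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k

# Synthetic activation pattern: a periodic neuron that completes
# 5 cycles as x goes from 0 to 66, clipped at zero like a ReLU.
activations = [max(0.0, math.cos(2 * math.pi * 5 * x / P)) for x in range(P)]

print(dominant_frequency(activations))  # 5
```

Plotting `activations` against `x` would give one graph of the kind described here: input value on the x axis, activation value on the y axis, with a visible repeating pattern.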

TAKEAWAY:

Generalizing 0s and 1s

Start from the known generalizing solution, then try to understand why the model eventually learns it.

Generalization is achieved due to two factors: