https://pair.withgoogle.com/explorables/grokking/
y axis: the activation value. The activation value of a neuron is the result of applying an activation function (e.g., ReLU, sigmoid) to the weighted sum of its inputs plus a bias term. It's the "output" of the neuron that gets passed to the next layer.
x axis: the input value (modulo 67, so 0 to 66)
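To make "activation value" concrete, here is a minimal sketch of a single neuron, assuming a ReLU activation. The function names, weights, and inputs are made up for illustration and are not taken from the explorable.

```python
def relu(z):
    # ReLU passes positive values through and clips negatives to zero
    return max(0.0, z)

def neuron_activation(inputs, weights, bias):
    # weighted sum of the inputs plus the bias term...
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # ...then the activation function gives the value passed to the next layer
    return relu(z)

# hypothetical example values
print(neuron_activation([1.0, 2.0], [0.5, 1.0], 0.25))   # 0.5 + 2.0 + 0.25 = 2.75
print(neuron_activation([1.0, 2.0], [0.5, -1.0], 0.25))  # weighted sum is negative, so ReLU gives 0.0
```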
We are measuring the activation pattern: how a neuron's activation value changes across the input values.
Each graph corresponds to a certain frequency.
If the activation pattern is cyclical and repeats "n" times across the inputs, this frequency shows up as "n" peaks in the graph.
EX: Frequency 4 for a neuron means there are 4 peaks
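The link between frequency and number of peaks can be checked with a small sketch. It assumes a hypothetical neuron whose activation is a ReLU applied to a cosine that repeats a given number of times over the 67 inputs. That shape is an illustration only, not the exact activations plotted in the explorable, and the helper names are invented here.

```python
import math

MODULUS = 67  # inputs run from 0 to 66, as on the x axis

def periodic_activations(frequency, modulus=MODULUS):
    # hypothetical neuron: ReLU of a cosine that repeats `frequency` times over the inputs
    return [max(0.0, math.cos(2 * math.pi * frequency * x / modulus))
            for x in range(modulus)]

def count_peaks(values):
    # count local maxima, treating the inputs as a cycle (66 wraps around to 0)
    n = len(values)
    peaks = 0
    for i in range(n):
        prev, nxt = values[i - 1], values[(i + 1) % n]
        # strictly above the previous point and at least as high as the next,
        # so a two-point flat top is counted once and flat zero stretches are ignored
        if values[i] > prev and values[i] >= nxt:
            peaks += 1
    return peaks

print(count_peaks(periodic_activations(4)))  # frequency 4 gives 4 peaks
```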
TAKEAWAY:
Start with knowing the generalized solution, and try to understand why the model eventually learns it.
The model reaches the generalized solution because of two factors: