https://www.youtube.com/watch?v=MUvFuZpxLU8&list=PLgKuh-lKre12qVTl88k2n2N37tT-BpmHT&index=7

Do the data sampling and the model determine the functional forms? For example, are the exponents for data scaling and model scaling ever the same?

Are the scaling regimes the same?

Is there universal, generalizable behavior, or is the whole problem too dependent on microscopic details?

Approach: develop and test a simple theory

We want to learn a model with parameters θ from a data distribution p(x, y)
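A minimal sketch of this setup, assuming a hypothetical toy distribution p(x, y) with x ~ N(0, 1) and y = 2x + Gaussian noise, and a one-parameter model f(x; θ) = θx fit by least squares. The function names and distribution are illustrative only, not from the lecture.

```python
import random

def sample_pairs(n, rng, true_weight=2.0, noise_std=0.1):
    """Draw n (x, y) pairs from a hypothetical p(x, y): x ~ N(0, 1), y = w*x + noise."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = true_weight * x + rng.gauss(0.0, noise_std)
        pairs.append((x, y))
    return pairs

def fit_theta(pairs):
    """Closed-form least-squares estimate of theta for f(x; theta) = theta * x."""
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    return sxy / sxx

def mse(theta, pairs):
    """Mean squared error of the model on a set of (x, y) pairs."""
    return sum((theta * x - y) ** 2 for x, y in pairs) / len(pairs)

rng = random.Random(0)          # fixed seed so the run is reproducible
train = sample_pairs(1000, rng)  # D = 1000 training samples
test = sample_pairs(1000, rng)   # held-out samples from the same p(x, y)
theta = fit_theta(train)
test_loss = mse(theta, test)     # close to the noise variance 0.01
```

Varying the number of training samples (or the number of parameters in a richer model) and recording `test_loss` is what produces the loss curves that scaling laws describe.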

The loss is a function of the amount of data (D) and the number of model parameters (P)
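To compare exponents for data and model scaling, one common approach is to fit a pure power law, loss ≈ a · N^(−α), by linear regression on log-log axes. This is a sketch under the assumption that there is no irreducible loss offset; `fit_power_law` is a hypothetical helper, not something defined in the lecture.

```python
import math

def fit_power_law(sizes, losses):
    """Fit loss ~ a * size**(-alpha) by least squares on (log size, log loss).

    Assumes a pure power law with no constant offset. Returns (a, alpha).
    """
    if len(sizes) != len(losses) or len(sizes) < 2:
        raise ValueError("need at least two (size, loss) points of equal length")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                 # slope on log-log axes equals -alpha
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic check: an exact power law with a = 3 and alpha = 0.5.
sizes = [10 ** k for k in range(1, 6)]
losses = [3.0 * d ** -0.5 for d in sizes]
a, alpha = fit_power_law(sizes, losses)
```

Running the same fit once on loss versus D (at large fixed P) and once on loss versus P (at large fixed D) gives separate estimates of the data and model exponents, which can then be compared directly.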

Architectures