Do sampling and the model determine the functional forms? For example, are the exponents for data scaling and model scaling ever the same?

A functional form is the type of equation used to approximate the relationship between variables; how well the sampled data fit the model depends on this choice.
Different equations give different functional forms, for example linear or quadratic.
We select a functional form based on our assumptions about the data.
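
As a small illustration of choosing between functional forms, the sketch below fits a linear and a quadratic form to the same sample and compares the mean squared error. The data are synthetic and the helper name `fit_polynomial` is made up for this example; it is only meant to show how the assumed form changes the quality of the fit.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns (coefficients, mean squared error)."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return coeffs, float(np.mean(residuals ** 2))

# Synthetic data (an assumption for illustration): quadratic trend plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)
y = 0.5 * x**2 - x + 2.0 + rng.normal(scale=0.1, size=x.shape)

# Fit both candidate functional forms to the same sample.
_, mse_linear = fit_polynomial(x, y, degree=1)
_, mse_quadratic = fit_polynomial(x, y, degree=2)
print(f"linear MSE: {mse_linear:.4f}, quadratic MSE: {mse_quadratic:.4f}")
```

Because the data here were generated from a quadratic, the quadratic form should fit far better; with real data the true form is unknown, which is why the choice rests on assumptions.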

Are the scaling regimes the same?

Is there a taxonomy of the different kinds of scaling (compute, parameters, etc.) in which regimes with different mechanistic origins fall into different classes?

Is there universal, generalizable behavior, or is the whole problem too dependent on microscopic details?

Approach: propose a simple theory and test it.

We want to learn a model with parameters θ from data drawn from a distribution p(x, y).

The loss is a function of the amount of data and the number of model parameters.
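
To make the dependence of the loss on the amount of data concrete, here is a minimal sketch that assumes a pure power law, loss = B * D^(-beta), at fixed model size. That form is an assumption for illustration, not something these notes establish, and the constants, the synthetic losses, and the helper name `fit_power_law_exponent` are all hypothetical. The exponent is recovered by a straight-line fit in log-log space.

```python
import numpy as np

def fit_power_law_exponent(sizes, losses):
    """Fit log(loss) = log(B) - beta * log(size); returns (B, beta)."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
    return float(np.exp(intercept)), float(-slope)

# Hypothetical values chosen for illustration only.
true_B, true_beta = 5.0, 0.3
dataset_sizes = np.array([1e3, 1e4, 1e5, 1e6, 1e7])
losses = true_B * dataset_sizes ** (-true_beta)  # noiseless synthetic losses

B_hat, beta_hat = fit_power_law_exponent(dataset_sizes, losses)
print(f"B ~ {B_hat:.3f}, beta ~ {beta_hat:.3f}")
```

The same fit applied to losses measured at varying parameter counts would give a model-scaling exponent, which could then be compared with the data-scaling one.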

Architectures