https://arxiv.org/pdf/2310.02557.pdf

Inductive biases help reduce model complexity: they act as a prior over the true distribution, narrowing the space of candidate distributions so the model can learn more quickly.

3.1 DENOISING AS SHRINKAGE IN AN ADAPTIVE BASIS

$$\mathrm{MSE}(f, \sigma^2) = \mathbb{E}\big[\,\|y - f(y)\|^2 + 2\sigma^2 \operatorname{tr} \nabla f(y) - \sigma^2 d\,\big]$$

Invariances are naturally enforced by this MSE expression: the term 2σ² tr∇f(y) penalizes the sum of the Jacobian's eigenvalues, so keeping many directions alive is costly. A larger number of small eigenvalues pushes the effective cutoff threshold up, removing many of those eigenvalues from the next denoising step.

<aside> 💡 Think of invariants as hiding behind large eigenvalues (high signal), hoping to survive there. The MSE naturally roots them out: by penalizing a large eigenvalue sum, it effectively sets a higher cutoff threshold, so the useless eigenvalues get removed along with anything hiding behind them.

</aside>
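As a sanity check on the MSE formula above, here is a minimal sketch using a toy linear denoiser f(y) = Wy (my own example, not from the paper): for a linear map the Jacobian ∇f is just W, so the trace-penalized expression can be compared against the true MSE by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma, n = 8, 0.5, 200_000

# Hypothetical clean signal and a simple linear "denoiser" f(y) = W y.
x = rng.standard_normal(d)
W = 0.7 * np.eye(d)            # shrink every coordinate by 0.7; the Jacobian of f is W

z = rng.standard_normal((n, d))
y = x + sigma * z              # noisy observations
fy = y @ W.T

true_mse = np.mean(np.sum((fy - x) ** 2, axis=1))
# Stein-style estimate: ||y - f(y)||^2 + 2 sigma^2 tr(∇f) - sigma^2 d
sure = np.mean(np.sum((y - fy) ** 2, axis=1)) + 2 * sigma**2 * np.trace(W) - sigma**2 * d

print(true_mse, sure)          # the two agree up to Monte Carlo error
```

The key point: the estimate never touches the clean signal x — it only uses the noisy y, the denoiser's output, and the trace of its Jacobian, which is why the trace term can act as a training-time penalty.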

Jacobian Matrix

We want to build an estimator of the denoising transformation: the input is a noisy image and the output is a cleaner image. To understand this transformation, we need to decompose it into individual changes — perturb one input variable slightly and track the resulting change in each output.

A matrix of partial derivatives does exactly this, pairing every input dimension with every output dimension.

Carries all the partial-derivative information (a small change in an input coordinate ⇒ what change in each output coordinate of f)

Gives you a grid of all the partial derivatives

Local linearity - the Jacobian represents the transformation you see when you zoom into a specific point: tweak one input dimension by a small amount and observe the output change, then tweak another input dimension and observe the change in that direction.

Rows - number of output dimensions; each row is the gradient of one output coordinate, a distinct transformation function in a single vector.

Cols - number of input dimensions, which here equals the number of outputs because the denoiser maps images to images, so the Jacobian is square.
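The rows/cols picture above can be made concrete with finite differences: tweak one input coordinate at a time and record every output change. The soft-shrinkage "denoiser" below is a hypothetical toy, not the paper's network.

```python
import numpy as np

# Hypothetical toy "denoiser": coordinate-wise soft shrinkage with threshold 0.1.
def f(y):
    return np.sign(y) * np.maximum(np.abs(y) - 0.1, 0.0)

def jacobian(f, y, eps=1e-6):
    """Finite-difference Jacobian: rows = output dims, cols = input dims."""
    d = y.size
    J = np.zeros((d, d))
    for j in range(d):                  # tweak one input dimension at a time...
        e = np.zeros(d)
        e[j] = eps
        J[:, j] = (f(y + e) - f(y - e)) / (2 * eps)   # ...and record every output change
    return J

y = np.array([0.5, -0.02, 1.3, 0.05])
J = jacobian(f, y)
print(J.shape)       # (4, 4): square because the denoiser maps R^d -> R^d
print(np.diag(J))    # ~[1, 0, 1, 0]: coordinates below the threshold are zeroed out
```

Because this toy denoiser acts coordinate-wise, its Jacobian is diagonal: coordinates above the threshold pass through (derivative 1) while those below it are killed (derivative 0) — a hard version of the shrinkage discussed next.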


The denoiser can thus be interpreted as performing shrinkage with factors λ_k(y) along the axes of a basis specified by e_k(y).

A higher rank means the Jacobian preserves more dimensions; it can be approximated by the sum of the eigenvalues (a larger sum means more directions in which the input influences the output).
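A small sketch of this interpretation, under the assumption (my own simplification) of a denoiser whose Jacobian is symmetric: build it as shrinkage by factors λ_k along a fixed orthonormal basis, then recover those factors by eigendecomposition and note that the trace equals their sum.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6

# Hypothetical denoiser that shrinks in a fixed orthonormal basis E:
# f(y) = E diag(lam) E^T y, so its Jacobian has eigenvectors e_k and eigenvalues lam_k.
E, _ = np.linalg.qr(rng.standard_normal((d, d)))
lam = np.array([0.95, 0.9, 0.8, 0.1, 0.05, 0.0])   # strong signal axes kept, weak axes cut
J = E @ np.diag(lam) @ E.T

eigvals = np.sort(np.linalg.eigvalsh(J))[::-1]
print(eigvals.round(2))            # recovers the shrinkage factors lam
print(np.trace(J), lam.sum())      # trace = sum of eigenvalues, a proxy for effective rank
```

Here three eigenvalues near 1 and three near 0 give a trace of about 2.8: the denoiser effectively preserves about three directions and discards the rest, which is the "effective rank" reading of the eigenvalue sum.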