Empirical distribution $\hat p$ of the training data.

Statisticians say sampling from the empirical distribution is fine; for diffusion it is bad, since it just copies the data points.

There is some diffusion theory that defeats classical sampling, but it does so by memorizing the data points.

Diffusion cares about generating novel samples.

Suppose we use a VP (variance-preserving) process: $x_t = u_t x_0 + \sigma_t W_t$, where $x_0$ is a data point.
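
A minimal sketch of one forward VP sample, reading the right-hand side as a scaled data point $x_0$ plus Gaussian noise. The cosine-style schedule in `vp_coefficients` and the function names are assumptions made for illustration (the notes do not fix a schedule); the only property used is $u_t^2 + \sigma_t^2 = 1$, and $W_t$ is treated here as a standard normal draw.

```python
import numpy as np

def vp_coefficients(t):
    """Hypothetical schedule with u_t^2 + sigma_t^2 = 1 for t in [0, 1]."""
    u_t = np.cos(0.5 * np.pi * t)
    sigma_t = np.sin(0.5 * np.pi * t)
    return u_t, sigma_t

def vp_forward(x0, t, rng):
    """Draw x_t = u_t * x0 + sigma_t * W, with W a standard normal sample."""
    u_t, sigma_t = vp_coefficients(t)
    noise = rng.standard_normal(x0.shape)
    return u_t * x0 + sigma_t * noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((5, 2))    # toy "data points" in 2D
x_half = vp_forward(x0, 0.5, rng)   # noised version at t = 0.5
```

At $t = 0$ this returns the data unchanged, and at $t = 1$ it returns pure noise.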

classical theory: if sum over T and expectation over x

synthetic data

Real diffusion models generate real data.

The theory cannot account for this and says the model memorizes.
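
A hedged illustration of why the idealized theory predicts memorization. Assume $x_0$ is uniform on the training points and the noise is standard Gaussian. The exact posterior mean $E[x_0 \mid x_t]$ is then a softmax-weighted average of the training points, and at low noise it collapses onto a single training point. The function names and the toy data are made up for this sketch.

```python
import numpy as np

def empirical_denoiser(x_t, train, u_t, sigma_t):
    """Posterior mean E[x_0 | x_t] when x_0 is uniform on the rows of `train`."""
    # Squared distance from x_t to each scaled training point u_t * x_i.
    d2 = np.sum((x_t[None, :] - u_t * train) ** 2, axis=1)
    logits = -d2 / (2.0 * sigma_t ** 2)
    logits -= logits.max()          # subtract the max for numerical stability
    w = np.exp(logits)
    w /= w.sum()                    # posterior weights over training points
    return w @ train

def empirical_score(x_t, train, u_t, sigma_t):
    """Score of the noised empirical distribution at x_t."""
    x0_hat = empirical_denoiser(x_t, train, u_t, sigma_t)
    return (u_t * x0_hat - x_t) / sigma_t ** 2

train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
x_t = np.array([0.9, 0.1])

# Low noise: the denoiser snaps to the training point closest to x_t.
sigma_lo = 0.01
low_noise = empirical_denoiser(x_t, train, np.sqrt(1 - sigma_lo ** 2), sigma_lo)

# High noise: the denoiser is close to the mean of the training set.
u_hi = 0.01
high_noise = empirical_denoiser(x_t, train, u_hi, np.sqrt(1 - u_hi ** 2))
```

A sampler that follows this exact score down to zero noise can only land on training points, which is the memorization the theory predicts. A trained network only approximates this score.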

Non-memorization comes from the interplay between optimization and the implicit bias of the network.