empirical distribution p_hat
statisticians say it's fine; for diffusion it's bad: sampling from it just copies the datapoints
there is some diffusion theory in which classical sampling breaks down: it ends up memorizing the datapoints.
diffusion cares about generating novel samples.
Suppose we use a VP (variance-preserving) process: $x_t = \mu_t x_0 + \sigma_t W_t$
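A minimal sketch of drawing from the VP marginal, reading the coefficient as a mean scale $\mu_t$ multiplying a clean sample $x_0$. The schedule ($\mu_t = e^{-t/2}$, $\sigma_t^2 = 1 - e^{-t}$) and the names `vp_coeffs`, `vp_forward` are assumptions for illustration, not from the notes.

```python
import numpy as np

def vp_coeffs(t):
    """Assumed schedule (OU process dx = -x/2 dt + dW): mu_t^2 + sigma_t^2 = 1."""
    mu = np.exp(-0.5 * t)
    sigma = np.sqrt(1.0 - np.exp(-t))
    return mu, sigma

def vp_forward(x0, t, rng):
    """Draw x_t = mu_t * x0 + sigma_t * W with W ~ N(0, I)."""
    mu, sigma = vp_coeffs(t)
    return mu * x0 + sigma * rng.standard_normal(x0.shape)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((100_000, 2))  # toy data with unit variance
xt = vp_forward(x0, t=1.0, rng=rng)     # variance stays close to 1
```

"variance preserving" here just means: unit-variance data stays unit-variance at every $t$.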
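classical theory: if the loss (sum over time steps up to $T$, expectation over $x \sim \hat{p}$) is minimized exactly, the learned score is the score of the noised $\hat{p}$, so sampling gives back the training points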
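A sketch of why the exact minimizer memorizes, assuming the VP marginal $x_t = \mu_t x_0 + \sigma_t W$ with $x_0$ uniform over the training set. Then $p_t = \frac{1}{n}\sum_i N(\mu_t x_i, \sigma_t^2 I)$, and the optimal denoiser is a softmax-weighted average of training points. The names `empirical_denoiser`, `empirical_score` and the toy numbers are hypothetical.

```python
import numpy as np

def empirical_denoiser(x, train, mu, sigma):
    """E[x_0 | x_t = x] when x_0 is uniform over the rows of `train`."""
    # log-weight of training point i: -||x - mu * x_i||^2 / (2 sigma^2)
    logw = -np.sum((x - mu * train) ** 2, axis=1) / (2.0 * sigma ** 2)
    w = np.exp(logw - logw.max())  # numerically stable softmax
    w /= w.sum()
    return w @ train

def empirical_score(x, train, mu, sigma):
    """Score of p_t = (1/n) sum_i N(mu * x_i, sigma^2 I) at x (Tweedie's formula)."""
    return (mu * empirical_denoiser(x, train, mu, sigma) - x) / sigma ** 2

train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
mu, sigma = 0.999, 0.05        # small noise level (illustrative values)
x = np.array([0.9, 0.1])       # a point near the training point (1, 0)
x0_hat = empirical_denoiser(x, train, mu, sigma)
score = empirical_score(x, train, mu, sigma)
```

At small noise the weights collapse onto the nearest training point, so following this score lands on a datapoint rather than anything new.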
synthetic data
real diffusion models generate real data
the theory doesn't account for this and says they memorize
Non-memorization comes from the interplay between optimization and the implicit bias of the network.