Problem: images generated from different viewpoints are inconsistent with each other, e.g. an eye may look different when the subject is rendered from the other side.
- improvements: https://arxiv.org/pdf/2403.01807
- cross-frame attention: features from the other frames are injected into each frame as it is generated (used in video diffusion for temporal consistency from frame to frame)
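
A minimal NumPy sketch of the cross-frame attention idea, to make the mechanism concrete. It assumes one common variant in which every frame's queries attend to the keys and values of all frames at once. The function name `cross_frame_attention`, the tensor shapes, and the absence of learned projections and multiple heads are illustrative assumptions, not the method of the linked paper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_frame_attention(q, k, v):
    """Hypothetical helper: q, k, v have shape (frames, tokens, dim).

    Each frame's queries attend to the tokens of ALL frames, so
    information is shared across views/frames in a single attention step.
    """
    f, n, d = q.shape
    # Pool keys and values from every frame into one set of f*n tokens.
    k_all = k.reshape(f * n, d)
    v_all = v.reshape(f * n, v.shape[-1])
    # Scaled dot-product scores, shape (frames, tokens, f*n).
    scores = q @ k_all.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    # Weighted sum over the pooled values, shape (frames, tokens, dim_v).
    return weights @ v_all, weights


rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4, 8))  # 3 frames, 4 tokens each, 8 channels
k = rng.normal(size=(3, 4, 8))
v = rng.normal(size=(3, 4, 8))
out, weights = cross_frame_attention(q, k, v)
print(out.shape, weights.shape)
```

Compared with ordinary per-frame self-attention, the only change in this sketch is that keys and values are pooled over frames, which is what lets one frame's features influence another.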
Different depth generates inconsistencies as well.
https://arxiv.org/html/2403.05102v1