This representation is used to transfer the material onto the object in the input image with a pre-trained inpainting diffusion model, using depth estimates as a geometry cue and grayscale object shading as an illumination cue (an assembled sketch appears at the end of this section).
Previously:
- Prior work conditioned a diffusion model on 3-5 example images to capture the texture/material in latent space.
- ZeST requires only a single material exemplar image and a single input image; it is zero-shot and training-free.
Problem:
- Given an input image and an exemplar image, we want to transfer the exemplar's material onto the object in the input image.
- The output should take the material from the exemplar while keeping the geometry and illumination of the input image.
- Other object and scene properties (background, lighting, etc.) should be preserved.
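One way to formalize the goal (the symbols below are my shorthand, not notation from the paper): with input image $I$ and exemplar $E$, the generator $G$ should produce

$$ O = G\big(\,\underbrace{M(E)}_{\text{material}},\ \underbrace{D_I}_{\text{geometry}},\ \underbrace{S_I}_{\text{illumination}}\,\big) $$

where $M(E)$ is the material representation (the 𝑧𝑀 below), $D_I$ the depth map of $I$, and $S_I$ its grayscale shading.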
Method:
- Exemplar image: extract a latent representation 𝑧𝑀 of the exemplar that captures the material texture.
- Q: how do we know the material latent representation 𝑧𝑀 will contain the material information?
- <u>The problem of geometry and material entanglement within the material embedding 𝑧𝑀 remains unsolved.</u>
- Input image: pass it through geometry guidance and illumination guidance.
Geometry Guidance:
- Compute a depth map 𝐷𝐼 of the input image and feed it into a depth-conditioned ControlNet (a minimal depth-estimation sketch follows this list).
- A depth map 𝐷𝐼 provides detailed information about the relative distances of objects from the viewpoint of the camera.
- The ControlNet preserves the geometry of the given input during generation.
- However, with depth conditioning alone, <u>we observe that the results suffer from inconsistency in preserving the illumination and background from the input image</u>, which motivates the illumination guidance below.
- Uses the IP-Adapter to encode the exemplar image and extract material features for exemplar-guided generation. The IP-Adapter uses a CLIP image encoder to extract image features that are injected into the diffusion model via its cross-attention layers; these features act as an additional condition alongside text prompts or other conditioning signals.
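A minimal sketch of computing the depth cue 𝐷𝐼 with an off-the-shelf monocular depth estimator; the model choice (Intel/dpt-large) and file names are assumptions, not necessarily what ZeST uses:

```python
# Sketch: estimate a depth map D_I for the input image.
# Assumption: Intel/dpt-large as the estimator; the paper may use a different one.
from transformers import pipeline
from PIL import Image

depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")

input_image = Image.open("input.png").convert("RGB")
depth_map = depth_estimator(input_image)["depth"]   # PIL image of relative depth
depth_map = depth_map.convert("RGB")                # 3-channel control image for ControlNet
depth_map.save("depth_DI.png")
```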
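And a minimal sketch of IP-Adapter conditioning using the diffusers API; checkpoint names and the adapter scale are illustrative assumptions:

```python
# Sketch: inject CLIP image features of the exemplar via IP-Adapter.
import torch
from diffusers import AutoPipelineForText2Image
from PIL import Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.8)   # strength of the exemplar condition (assumed value)

exemplar = Image.open("material_exemplar.png").convert("RGB")
# The exemplar's CLIP features are cross-attended during denoising.
image = pipe(prompt="", ip_adapter_image=exemplar).images[0]
```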
Latent Illumination Guidance:
- Illumination refers to the lighting conditions affecting an object or scene in an image. ZeST preserves the input's illumination by converting the object region to grayscale, stripping the original color/material while keeping the shading as a cue for the inpainting model (see the assembled sketch below).
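A hedged end-to-end sketch wiring the three cues together in diffusers (depth ControlNet for geometry, IP-Adapter for material, grayscale-darkened object region for illumination). Checkpoint names, file paths, and the darkening factor are assumptions; the paper's exact recipe may differ:

```python
# Sketch: material transfer via ControlNet-depth inpainting + IP-Adapter.
import torch
from PIL import Image, ImageOps
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")

input_image = Image.open("input.png").convert("RGB")
mask = Image.open("object_mask.png").convert("L")       # white = object to retexture
depth_map = Image.open("depth_DI.png").convert("RGB")   # geometry cue from earlier sketch
exemplar = Image.open("material_exemplar.png").convert("RGB")

# Illumination cue: decolorize (and slightly darken) the object region so the
# original material is removed but the shading survives in grayscale.
gray = ImageOps.grayscale(input_image).convert("RGB")
gray = gray.point(lambda p: int(p * 0.7))               # darkening factor is assumed
init = Image.composite(gray, input_image, mask)         # grayscale inside mask only

result = pipe(
    prompt="",                    # no text: the material comes from the exemplar
    image=init,
    mask_image=mask,
    control_image=depth_map,      # geometry via depth ControlNet
    ip_adapter_image=exemplar,    # material via IP-Adapter cross-attention
).images[0]
result.save("material_transfer.png")
```

The empty prompt is a deliberate choice in this sketch: conditioning comes entirely from the exemplar image and the two guidance cues rather than from text.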