This representation is used to transfer the material onto the object in the input image with a pre-trained inpainting diffusion model, using depth estimates as a geometry cue and grayscale object shading as an illumination cue (an assembled sketch appears at the end of this section).
Previously:
- Prior work conditioned a diffusion model on 3-5 example images to capture the texture/material in latent space.
- ZeST requires only a single material exemplar image and a single input image; it is zero-shot and training-free.
Problem:
- Given an input image and an exemplar image, we want to transfer the exemplar's material onto the object in the input image.
- The output should take the material from the exemplar while keeping the geometry and illumination of the input image.
- Other object and scene properties (background, lighting, etc.) should be preserved.
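One way to formalize the goal (the symbols below are my shorthand, not notation from the paper): with input image $I$ and exemplar $E$, the generator $G$ should produce

$$ O = G\big(\,\underbrace{M(E)}_{\text{material}},\ \underbrace{D_I}_{\text{geometry}},\ \underbrace{S_I}_{\text{illumination}}\,\big) $$

where $M(E)$ is the material representation (the 𝑧𝑀 below), $D_I$ the depth map of $I$, and $S_I$ its grayscale shading.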
Method:
- Exemplar image: extract a latent representation 𝑧𝑀 of the exemplar that captures the material texture.
- Q: how do we know the material latent representation 𝑧𝑀 will contain the material information?
- <u>The problem of geometry and material entanglement within the material embedding 𝑧𝑀 remains unsolved.</u>
- Input image: pass it through geometry guidance and illumination guidance.
Geometry Guidance:
- Compute a depth map 𝐷𝐼 of the input image and feed it into a depth-conditioned ControlNet (a minimal depth-estimation sketch follows this list).
- A depth map 𝐷𝐼 provides detailed information about the relative distances of objects from the viewpoint of the camera.
- The ControlNet preserves the geometry of the given input during generation.
- However, with depth conditioning alone, <u>we observe that the results suffer from inconsistency in preserving the illumination and background from the input image</u>, which motivates the illumination guidance below.
- Uses the IP-Adapter to encode the exemplar image and extract material features for exemplar-guided generation. The IP-Adapter uses a CLIP image encoder to extract image features that are injected into the diffusion model via its cross-attention layers; these features act as an additional condition alongside text prompts or other conditioning signals.
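A minimal sketch of computing the depth cue 𝐷𝐼 with an off-the-shelf monocular depth estimator; the model choice (Intel/dpt-large) and file names are assumptions, not necessarily what ZeST uses:

```python
# Sketch: estimate a depth map D_I for the input image.
# Assumption: Intel/dpt-large as the estimator; the paper may use a different one.
from transformers import pipeline
from PIL import Image

depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")

input_image = Image.open("input.png").convert("RGB")
depth_map = depth_estimator(input_image)["depth"]   # PIL image of relative depth
depth_map = depth_map.convert("RGB")                # 3-channel control image for ControlNet
depth_map.save("depth_DI.png")
```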
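And a minimal sketch of IP-Adapter conditioning using the diffusers API; checkpoint names and the adapter scale are illustrative assumptions:

```python
# Sketch: inject CLIP image features of the exemplar via IP-Adapter.
import torch
from diffusers import AutoPipelineForText2Image
from PIL import Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.8)   # strength of the exemplar condition (assumed value)

exemplar = Image.open("material_exemplar.png").convert("RGB")
# The exemplar's CLIP features are cross-attended during denoising.
image = pipe(prompt="", ip_adapter_image=exemplar).images[0]
```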
Latent Illumination Guidance:
- Illumination refers to the lighting conditions affecting an object or scene in an image. ZeST preserves the input's illumination by converting the object region to grayscale, stripping the original color/material while keeping the shading as a cue for the inpainting model (see the assembled sketch below).
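A hedged end-to-end sketch wiring the three cues together in diffusers (depth ControlNet for geometry, IP-Adapter for material, grayscale-darkened object region for illumination). Checkpoint names, file paths, and the darkening factor are assumptions; the paper's exact recipe may differ:

```python
# Sketch: material transfer via ControlNet-depth inpainting + IP-Adapter.
import torch
from PIL import Image, ImageOps
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")

input_image = Image.open("input.png").convert("RGB")
mask = Image.open("object_mask.png").convert("L")       # white = object to retexture
depth_map = Image.open("depth_DI.png").convert("RGB")   # geometry cue from earlier sketch
exemplar = Image.open("material_exemplar.png").convert("RGB")

# Illumination cue: decolorize (and slightly darken) the object region so the
# original material is removed but the shading survives in grayscale.
gray = ImageOps.grayscale(input_image).convert("RGB")
gray = gray.point(lambda p: int(p * 0.7))               # darkening factor is assumed
init = Image.composite(gray, input_image, mask)         # grayscale inside mask only

result = pipe(
    prompt="",                    # no text: the material comes from the exemplar
    image=init,
    mask_image=mask,
    control_image=depth_map,      # geometry via depth ControlNet
    ip_adapter_image=exemplar,    # material via IP-Adapter cross-attention
).images[0]
result.save("material_transfer.png")
```

The empty prompt is a deliberate choice in this sketch: conditioning comes entirely from the exemplar image and the two guidance cues rather than from text.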