FaceStudio: Put Your Face Everywhere in Seconds: Implementation Details.

The vision encoder in the image-conditioned branch of our model combines three variants of the CLIP model [40] with different backbones: CLIP-ViT-L/14, CLIP-RN101, and CLIP-ViT-B/32. The outputs of these individual models are concatenated to form the final output of the vision encoder. For training, we primarily follow the DDPM configuration [20] described in StableDiffusion [42], with a total of 1,000 denoising steps. For inference, we use the EulerA sampler [2] with 25 timesteps. To support classifier-free guidance [19], we randomly drop the conditional embeddings of both style images and face images during training, with drop probabilities of 0.64 for style images and 0.1 for face images.
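The two mechanisms described above, concatenating several CLIP image embeddings and randomly dropping conditions for classifier-free guidance, can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' code. The embedding widths (768, 512, 512) are those of the public CLIP ViT-L/14, RN101 and ViT-B/32 checkpoints, but the encoders themselves are replaced by hypothetical random projections (`make_stub_encoder`). The text does not say how a dropped embedding is represented, so zeroing it is an assumption, as is drawing the style and face drops independently. The face embedding is a placeholder array of hypothetical width `FACE_DIM`.

```python
import numpy as np

# Image-embedding widths of the public CLIP checkpoints named in the text.
CLIP_DIMS = {"ViT-L/14": 768, "RN101": 512, "ViT-B/32": 512}

# Drop probabilities stated in the text.
P_DROP_STYLE = 0.64
P_DROP_FACE = 0.1


def make_stub_encoder(in_dim, out_dim, rng):
    """Hypothetical stand-in for a frozen CLIP image tower: a fixed random projection."""
    weight = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)
    return lambda x: x @ weight


def vision_encoder(images, encoders):
    """Concatenate the per-backbone embeddings along the feature axis."""
    return np.concatenate([encode(images) for encode in encoders], axis=-1)


def drop_conditions(style_emb, face_emb, rng):
    """Independently zero each sample's style and face embeddings (assumed zero null embedding)."""
    batch = style_emb.shape[0]
    keep_style = rng.random(batch) >= P_DROP_STYLE
    keep_face = rng.random(batch) >= P_DROP_FACE
    return style_emb * keep_style[:, None], face_emb * keep_face[:, None], keep_style, keep_face


rng = np.random.default_rng(0)
IN_DIM = 64      # hypothetical flattened-image size for the stub encoders
FACE_DIM = 512   # hypothetical face-embedding width
BATCH = 10_000

encoders = [make_stub_encoder(IN_DIM, d, rng) for d in CLIP_DIMS.values()]
style_images = rng.standard_normal((BATCH, IN_DIM))
style_emb = vision_encoder(style_images, encoders)   # shape (BATCH, 768 + 512 + 512)
face_emb = rng.standard_normal((BATCH, FACE_DIM))    # placeholder face embeddings
style_in, face_in, keep_style, keep_face = drop_conditions(style_emb, face_emb, rng)
```

Over a large batch, the fraction of dropped style and face embeddings approaches 0.64 and 0.1, respectively. Dropping the two conditions independently also yields samples where both, one, or neither condition is present, which is what lets guidance be applied to each condition at inference time.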

Figure 6. Hybrid-guidance experiments. In this experiment, we combine a text prompt with reference images for image synthesis; the text prompt used here specifies a cartoon style.

Figure 7. Identity mixing experiment. We generate facial images that combine multiple identities using a mixing ratio to control the influence of different IDs.
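The caption does not state how the mixing ratio is applied. One simple reading, shown here purely as a hypothetical sketch, is a convex combination of identity embeddings: the ratios are normalised to weights that sum to 1 and used to average the embeddings. The function `mix_identities` and the toy embeddings are illustrative assumptions, not the paper's method.

```python
import numpy as np


def mix_identities(id_embs, ratios):
    """Hypothetical identity mixing: a convex combination of K identity embeddings."""
    id_embs = np.asarray(id_embs, dtype=float)   # shape (K, D)
    ratios = np.asarray(ratios, dtype=float)     # shape (K,)
    if id_embs.shape[0] != ratios.shape[0]:
        raise ValueError("need exactly one mixing ratio per identity")
    if np.any(ratios < 0) or ratios.sum() == 0:
        raise ValueError("ratios must be non-negative and not all zero")
    weights = ratios / ratios.sum()              # normalise so the weights sum to 1
    return weights @ id_embs                     # shape (D,)


id_a = np.array([1.0, 0.0, 2.0])  # toy identity embeddings
id_b = np.array([0.0, 1.0, 2.0])
mixed = mix_identities([id_a, id_b], [3, 1])   # 75% identity A, 25% identity B
```

A ratio of `[1, 0]` returns identity A unchanged, and moving the ratio toward `[0, 1]` shifts the result smoothly toward identity B, which matches the kind of control the caption describes.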

The primary training dataset was FFHQ [25], a dataset of 70,000 face images. To complement it, we also included a subset of the LAION dataset [46] in training so that the model retains its ability to generate generic, non-human images during fine-tuning. When non-human images are sampled for training, the face embedding in the conditional branch is set to zero. We used a learning rate of 1e-6 and trained the model on 8 A100 GPUs with a batch size of 256 for 100,000 steps.
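The training setup above implies some simple arithmetic, and the zero-face-embedding rule can be sketched as part of batch sampling. The derived numbers assume plain data parallelism without gradient accumulation, which the text does not state. The sampler `sample_mixed_batch`, the LAION share `p_laion=0.2`, and the 512-wide placeholder face embeddings are all hypothetical; the text does not give the FFHQ/LAION mixing ratio.

```python
import numpy as np

# Hyperparameters stated in the text (the learning rate is listed for completeness only).
LEARNING_RATE = 1e-6
GLOBAL_BATCH = 256
NUM_GPUS = 8
TRAIN_STEPS = 100_000
FFHQ_SIZE = 70_000

# Derived quantities, assuming plain data parallelism with no gradient accumulation.
PER_GPU_BATCH = GLOBAL_BATCH // NUM_GPUS          # 32 images per GPU per step
SAMPLES_SEEN = GLOBAL_BATCH * TRAIN_STEPS         # 25.6 million samples in total
MAX_FFHQ_EPOCHS = SAMPLES_SEEN / FFHQ_SIZE        # upper bound, reached only if every sample came from FFHQ


def sample_mixed_batch(rng, batch_size, p_laion, face_dim):
    """Hypothetical batch sampler mixing FFHQ faces with non-human LAION images.

    p_laion (the LAION share) is not given in the text and is an assumption.
    Returns a boolean is_face mask and face embeddings that are zero for LAION rows.
    """
    is_face = rng.random(batch_size) >= p_laion
    face_emb = rng.standard_normal((batch_size, face_dim))  # placeholder embeddings
    face_emb[~is_face] = 0.0  # as in the text: zero face embedding for non-human samples
    return is_face, face_emb


rng = np.random.default_rng(0)
is_face, face_emb = sample_mixed_batch(rng, PER_GPU_BATCH, p_laion=0.2, face_dim=512)
```

Because LAION samples take up part of every batch, the actual number of passes over FFHQ is lower than the roughly 366-epoch upper bound computed here.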

This paper is available on arXiv under the CC0 1.0 DEED license.

This content originally appeared on HackerNoon and was authored by Dilution

APA

Dilution | Sciencx (2024-08-14T15:00:23+00:00) FaceStudio: Put Your Face Everywhere in Seconds: Implementation Details.. Retrieved from https://www.scien.cx/2024/08/14/facestudio-put-your-face-everywhere-in-seconds-implementation-details/