Modular Enhancements for Stable Diffusion Architecture

This section outlines modular improvements to the Stable Diffusion architecture, applicable individually or collectively to enhance model performance. The strategies presented extend the capabilities of latent diffusion models and can also be adapted for pixel-space models.


This content originally appeared on HackerNoon and was authored by Synthesizing

:::info Authors:

(1) Dustin Podell, Stability AI, Applied Research;

(2) Zion English, Stability AI, Applied Research;

(3) Kyle Lacey, Stability AI, Applied Research;

(4) Andreas Blattmann, Stability AI, Applied Research;

(5) Tim Dockhorn, Stability AI, Applied Research;

(6) Jonas Müller, Stability AI, Applied Research;

(7) Joe Penna, Stability AI, Applied Research;

(8) Robin Rombach, Stability AI, Applied Research.

:::

Abstract and 1 Introduction

2 Improving Stable Diffusion

2.1 Architecture & Scale

2.2 Micro-Conditioning

2.3 Multi-Aspect Training

2.4 Improved Autoencoder and 2.5 Putting Everything Together

3 Future Work

Appendix

A Acknowledgements

B Limitations

C Diffusion Models

D Comparison to the State of the Art

E Comparison to Midjourney v5.1

F On FID Assessment of Generative Text-Image Foundation Models

G Additional Comparison between Single- and Two-Stage SDXL pipeline

References

2 Improving Stable Diffusion

In this section we present our improvements for the Stable Diffusion architecture. These are modular, and can be used individually or together to extend any model. Although the following strategies are implemented as extensions to latent diffusion models (LDMs) [38], most of them are also applicable to their pixel-space counterparts.

Figure 1: Left: Comparing user preferences between SDXL and Stable Diffusion 1.5 & 2.1. While SDXL already clearly outperforms Stable Diffusion 1.5 & 2.1, adding the refinement stage boosts performance further. Right: Visualization of the two-stage pipeline: We generate initial latents of size 128 × 128 using SDXL. Afterwards, we utilize a specialized high-resolution refinement model and apply SDEdit [28] to the latents generated in the first step, using the same prompt. SDXL and the refinement model use the same autoencoder.
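The two-stage flow in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the linear beta schedule, the `strength` value, and the names `linear_alpha_bar`, `sdedit_noise`, `two_stage`, and `refiner_denoise` are all assumptions introduced here. The idea behind SDEdit is to partially noise an existing sample up to an intermediate timestep and then let a denoiser (here, the refinement model) run from that point, instead of starting from pure noise.

```python
import math
import random


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def sdedit_noise(latents, strength, alpha_bar, rng):
    """Partially noise flat latents up to an intermediate timestep t.

    `strength` in (0, 1] is a hypothetical knob: the fraction of the
    schedule that the refiner is asked to re-denoise.
    """
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength must be in (0, 1]")
    t = round(strength * len(alpha_bar)) - 1
    t = max(0, min(len(alpha_bar) - 1, t))
    signal = math.sqrt(alpha_bar[t])        # weight on the base latents
    sigma = math.sqrt(1.0 - alpha_bar[t])   # weight on fresh Gaussian noise
    noised = [signal * z + sigma * rng.gauss(0.0, 1.0) for z in latents]
    return noised, t


def two_stage(base_latents, refiner_denoise, strength=0.3, seed=0):
    """Stage 2 of the pipeline: noise the base latents, then hand them over.

    `refiner_denoise(noised, t)` is a placeholder callable standing in for
    the refinement model conditioned on the same prompt.
    """
    rng = random.Random(seed)
    noised, t = sdedit_noise(base_latents, strength, linear_alpha_bar(), rng)
    return refiner_denoise(noised, t)
```

In this sketch, `base_latents` stands for the flattened 128 × 128 latents produced by the first stage, and the value returned by `refiner_denoise` would then be decoded by the shared autoencoder. A lower `strength` keeps more of the base model's composition, while a higher one gives the refiner more freedom.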


:::info This paper is available on arXiv under a CC BY 4.0 DEED license.

:::

Synthesizing | Sciencx (2024-10-03T19:03:34+00:00) Modular Enhancements for Stable Diffusion Architecture. Retrieved from https://www.scien.cx/2024/10/03/modular-enhancements-for-stable-diffusion-architecture/
