What’s Next for SDXL Development?

Future work for SDXL includes exploring a single-stage model to improve accessibility and sampling speed, enhancing text synthesis with byte-level tokenizers, scaling transformer-based architectures, reducing inference costs through distillation, and adopting continuous-time training for more flexible sampling.


This content originally appeared on HackerNoon and was authored by Synthesizing

:::info Authors:

(1) Dustin Podell, Stability AI, Applied Research;

(2) Zion English, Stability AI, Applied Research;

(3) Kyle Lacey, Stability AI, Applied Research;

(4) Andreas Blattmann, Stability AI, Applied Research;

(5) Tim Dockhorn, Stability AI, Applied Research;

(6) Jonas Müller, Stability AI, Applied Research;

(7) Joe Penna, Stability AI, Applied Research;

(8) Robin Rombach, Stability AI, Applied Research.

:::

Abstract and 1 Introduction

2 Improving Stable Diffusion

2.1 Architecture & Scale

2.2 Micro-Conditioning

2.3 Multi-Aspect Training

2.4 Improved Autoencoder and 2.5 Putting Everything Together

3 Future Work

Appendix

A Acknowledgements

B Limitations

C Diffusion Models

D Comparison to the State of the Art

E Comparison to Midjourney v5.1

F On FID Assessment of Generative Text-Image Foundation Models

G Additional Comparison between Single- and Two-Stage SDXL pipeline

References

3 Future Work

This report presents a preliminary analysis of improvements to the foundation model Stable Diffusion for text-to-image synthesis. While we achieve significant improvements in synthesized image quality, prompt adherence, and composition, below we discuss a few aspects in which we believe the model can be improved further:

• Single stage: Currently, we generate the best samples from SDXL using a two-stage approach with an additional refinement model. This results in having to load two large models into memory, hampering accessibility and sampling speed. Future work should investigate ways to provide a single stage of equal or better quality.
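The memory cost described above follows directly from the control flow of a two-stage sampler: both networks must be resident because the refiner consumes the base model's latents. A minimal schematic, with all function names and the split point being illustrative assumptions rather than the actual API:

```python
# Schematic of a two-stage base+refiner sampling flow (names and the
# high_noise_frac split point are illustrative, not SDXL's actual API).
# The base model covers the high-noise portion of the denoising
# trajectory and hands its latents to the refiner, so BOTH networks
# must be loaded at once -- the cost a single-stage model would remove.
def two_stage_sample(prompt, base, refiner, high_noise_frac=0.8):
    latents = base(prompt, denoise_until=high_noise_frac)
    return refiner(prompt, latents, denoise_from=high_noise_frac)

# Stub callables standing in for the two large networks:
base = lambda p, denoise_until: {"prompt": p, "stage": "base"}
refiner = lambda p, latents, denoise_from: {**latents, "stage": "refined"}

out = two_stage_sample("a photo of a cat", base, refiner)
assert out["stage"] == "refined"
```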

• Text synthesis: While the scale and the larger text encoder (OpenCLIP ViT-bigG [19]) help to improve the text rendering capabilities over previous versions of Stable Diffusion, incorporating byte-level tokenizers [52, 27] or simply scaling the model to larger sizes [53, 40] may further improve text synthesis.
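The appeal of byte-level tokenization for text rendering is that every string maps losslessly onto a fixed vocabulary of 256 byte values, so rare words and glyphs that a subword tokenizer would fragment or map to unknown tokens remain fully representable. A minimal sketch of that idea (not the tokenizers of [52, 27] themselves):

```python
# Byte-level tokenization sketch: text round-trips exactly through a
# 256-symbol vocabulary, with no out-of-vocabulary tokens possible.
def byte_tokenize(text: str) -> list[int]:
    return list(text.encode("utf-8"))

def byte_detokenize(tokens: list[int]) -> str:
    return bytes(tokens).decode("utf-8")

tokens = byte_tokenize("SDXL renders “text”!")
assert all(0 <= t < 256 for t in tokens)           # fixed, tiny vocab
assert byte_detokenize(tokens) == "SDXL renders “text”!"  # lossless
```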

• Architecture: During the exploration stage of this work, we briefly experimented with transformer-based architectures such as UViT [16] and DiT [33], but found no immediate benefit. We remain, however, optimistic that a careful hyperparameter study will eventually enable scaling to much larger transformer-dominated architectures.

• Distillation: While our improvements over the original Stable Diffusion model are significant, they come at the price of increased inference cost (both in VRAM and sampling speed). Future work will thus focus on decreasing the compute needed for inference and increasing sampling speed, for example through guidance [29], knowledge [6, 22, 24], and progressive distillation [41, 2, 29].
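The core trick of progressive distillation [41] is to train a student whose single sampling step matches two steps of the teacher, then repeat, halving the step count each round. A toy illustration on a linear denoiser, where the two-step composition can be written down exactly (this is a sketch of the principle, not SDXL training code):

```python
# Toy progressive distillation: for a linear "denoiser" whose noise
# prediction is slope * x, two teacher steps compose into a single
# closed-form student step, so the student matches exactly.
def teacher_step(x, dt, slope=0.5):
    # One deterministic update: x <- x - eps_hat * dt, eps_hat = slope * x.
    return x - slope * x * dt

def student_step(x, dt, slope=0.5):
    # Trained target: ONE student step equals TWO teacher steps of size dt.
    # For the linear teacher: x * (1 - slope*dt)^2.
    return x * (1 - slope * dt) ** 2

two_teacher = teacher_step(teacher_step(1.0, 0.1), 0.1)
one_student = student_step(1.0, 0.1)
assert abs(two_teacher - one_student) < 1e-12
```

For a real diffusion model the composition has no closed form, so the student is fit by regression onto the two-step teacher target; each distillation round halves the number of sampling steps needed at inference.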

• Our model is trained in the discrete-time formulation of [14], and requires offset-noise [11, 25] for aesthetically pleasing results. The EDM-framework of Karras et al. [21] is a promising candidate for future model training, as its formulation in continuous time allows for increased sampling flexibility and does not require noise-schedule corrections.
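Offset-noise, the correction mentioned above, augments the standard per-pixel Gaussian training noise with a per-channel constant offset so the model can learn global brightness shifts (important for very dark or very bright images). A minimal sketch, with the strength value being an illustrative assumption:

```python
import numpy as np

# Offset-noise sketch: per-pixel Gaussian noise plus a per-(sample,
# channel) constant offset. The offset lets the denoiser move an
# image's overall brightness, which plain zero-mean-per-image noise
# makes hard to learn. The strength of 0.1 here is illustrative.
def offset_noise(shape, strength=0.1, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    b, c, h, w = shape
    base = rng.standard_normal(shape)
    offset = rng.standard_normal((b, c, 1, 1))  # one scalar per channel
    return base + strength * offset             # broadcast over h, w

noise = offset_noise((2, 4, 8, 8))
assert noise.shape == (2, 4, 8, 8)
```

Under the EDM framework [21] such a correction is unnecessary, since the continuous-time formulation reaches high enough noise levels that the signal is fully destroyed.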


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::
