Multi-Aspect Training: Adapting SDXL for Real-World Image Diversity

This section discusses multi-aspect training in SDXL, emphasizing the importance of accommodating diverse image sizes and aspect ratios. It highlights the implementation of conditioning techniques and the complementary nature of crop-conditioning.


This content originally appeared on HackerNoon and was authored by Synthesizing

:::info Authors:

(1) Dustin Podell, Stability AI, Applied Research;

(2) Zion English, Stability AI, Applied Research;

(3) Kyle Lacey, Stability AI, Applied Research;

(4) Andreas Blattmann, Stability AI, Applied Research;

(5) Tim Dockhorn, Stability AI, Applied Research;

(6) Jonas Müller, Stability AI, Applied Research;

(7) Joe Penna, Stability AI, Applied Research;

(8) Robin Rombach, Stability AI, Applied Research.

:::

Abstract and 1 Introduction

2 Improving Stable Diffusion

2.1 Architecture & Scale

2.2 Micro-Conditioning

2.3 Multi-Aspect Training

2.4 Improved Autoencoder and 2.5 Putting Everything Together

3 Future Work

Appendix

A Acknowledgements

B Limitations

C Diffusion Models

D Comparison to the State of the Art

E Comparison to Midjourney v5.1

F On FID Assessment of Generative Text-Image Foundation Models

G Additional Comparison between Single- and Two-Stage SDXL pipeline

References

2.3 Multi-Aspect Training

Real-world datasets include images of widely varying sizes and aspect ratios (cf. Fig. 2). While the common output resolutions for text-to-image models are square images of 512 × 512 or 1024 × 1024 pixels, we argue that this is a rather unnatural choice, given the widespread distribution and use of landscape (e.g., 16:9) or portrait format screens.


In practice, we apply multi-aspect training as a fine-tuning stage after pretraining the model at a fixed aspect ratio and resolution, and we combine it with the conditioning techniques introduced in Sec. 2.2 via concatenation along the channel axis. Fig. 16 in App. J provides Python code for this operation. Note that crop-conditioning and multi-aspect training are complementary operations, and crop-conditioning then only works within the bucket boundaries (usually 64 pixels). For ease of implementation, however, we opt to keep this control parameter for multi-aspect models.
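
The concatenation along the channel axis mentioned above might look roughly like the following. This is a minimal NumPy sketch and not the code from Fig. 16: the function names, the variable names `c_size`, `c_crop`, and `c_ar`, the embedding width of 256, and the sinusoidal (Fourier) embedding itself are all assumptions made for illustration.

```python
import numpy as np


def fourier_embedding(values, out_dim=256, max_period=10000.0):
    """Map a 1-D array of scalars to sinusoidal features of shape (batch, out_dim).

    Assumption: a standard cos/sin embedding with geometrically spaced
    frequencies; the exact embedding used for SDXL may differ.
    """
    values = np.asarray(values, dtype=np.float64)
    half = out_dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = values[:, None] * freqs[None, :]
    return np.concatenate([np.cos(args), np.sin(args)], axis=-1)


def embed_pair(pairs, out_dim=256):
    """Embed both integers of each (a, b) tuple and join them along the channel axis."""
    pairs = np.asarray(pairs)
    first = fourier_embedding(pairs[:, 0], out_dim)
    second = fourier_embedding(pairs[:, 1], out_dim)
    return np.concatenate([first, second], axis=1)


def concat_conditionings(c_size, c_crop, c_ar, out_dim=256):
    """Concatenate size, crop, and target-size (bucket) embeddings channel-wise.

    All three inputs are hypothetical (batch, 2) integer arrays:
      c_size: original image (height, width)
      c_crop: crop offsets (top, left)
      c_ar:   bucket / target (height, width)
    """
    parts = [embed_pair(c, out_dim) for c in (c_size, c_crop, c_ar)]
    return np.concatenate(parts, axis=1)


# Two illustrative samples that share one bucket within a batch.
c_size = [(768, 1024), (512, 512)]
c_crop = [(0, 0), (0, 32)]
c_ar = [(896, 1152), (896, 1152)]
cond = concat_conditionings(c_size, c_crop, c_ar)
print(cond.shape)  # (2, 1536): 3 conditionings x 2 integers x 256 features
```

Each conditioning contributes a fixed-width slice of the output vector, so adding the target-size conditioning to the size- and crop-conditionings only widens the vector and leaves the existing slices unchanged.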


:::info This paper is available on arXiv under the CC BY 4.0 DEED license.

:::

