:::info Authors:
(1) Dustin Podell, Stability AI, Applied Research;
(2) Zion English, Stability AI, Applied Research;
(3) Kyle Lacey, Stability AI, Applied Research;
(4) Andreas Blattmann, Stability AI, Applied Research;
(5) Tim Dockhorn, Stability AI, Applied Research;
(6) Jonas Müller, Stability AI, Applied Research;
(7) Joe Penna, Stability AI, Applied Research;
(8) Robin Rombach, Stability AI, Applied Research.
:::
Table of Links
2.4 Improved Autoencoder and 2.5 Putting Everything Together
Appendix
D Comparison to the State of the Art
E Comparison to Midjourney v5.1
F On FID Assessment of Generative Text-Image Foundation Models
G Additional Comparison between Single- and Two-Stage SDXL pipeline
F On FID Assessment of Generative Text-Image Foundation Models
In recent years it has been common practice to assess generative text-to-image models via zero-shot FID [12] and CLIP scores [34, 36] on complex, small-scale text-image datasets of natural images such as COCO [26]. However, with the advent of foundational text-to-image models [40, 37, 38, 1], which target not only visual compositionality but also other difficult tasks such as deep text understanding, fine-grained distinction between unique artistic styles, and especially a pronounced sense of visual aesthetics, this particular form of model evaluation has become increasingly questionable. Kirstain et al. [23] demonstrate that COCO zero-shot FID is negatively correlated with visual aesthetics, and that the generative performance of such models should rather be judged by human evaluators. We investigate this for SDXL and visualize FID-vs-CLIP curves in Fig. 12 for 10k text-image pairs from COCO [26]. Despite its drastically improved performance as measured quantitatively by asking human assessors (see Fig. 1) as well as qualitatively (see Fig. 4 and Fig. 14), SDXL does not achieve better FID scores than previous SD versions. On the contrary, FID for SDXL is the worst of all three compared models, while its CLIP scores (measured with OpenCLIP ViT-g/14) are only slightly improved. Thus, our results back the findings of Kirstain et al. [23] and further emphasize the need for additional quantitative performance scores specifically for text-to-image foundation models. All scores have been evaluated on 10k generated examples.
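To make the evaluation protocol concrete, below is a minimal sketch of how FID and CLIP scores over a set of generated images and their prompts could be computed with the torchmetrics library. The paper does not describe its exact implementation; the metric classes, the batch handling, and the CLIP backbone used here (OpenAI CLIP ViT-B/16 as a stand-in for OpenCLIP ViT-g/14) are illustrative assumptions, not the authors' pipeline.

```python
# Sketch of a zero-shot FID / CLIP-score evaluation, assuming torchmetrics.
# Real COCO images, generated images, and the prompts used for generation
# are expected as inputs; everything else is an illustrative assumption.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.multimodal.clip_score import CLIPScore


def evaluate(real_images, generated_images, captions, device="cuda"):
    """
    real_images, generated_images: uint8 tensors of shape (N, 3, H, W), values in [0, 255]
    captions: list of N prompt strings (e.g. the COCO captions used for generation)
    Returns (fid, clip_score) as floats.
    """
    # FID compares Inception features of real vs. generated images.
    fid = FrechetInceptionDistance(feature=2048).to(device)
    # CLIP score measures image-text alignment; ViT-B/16 here is a stand-in
    # backbone, not the OpenCLIP ViT-g/14 model reported in the paper.
    clip = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16").to(device)

    fid.update(real_images.to(device), real=True)
    fid.update(generated_images.to(device), real=False)
    clip.update(generated_images.to(device), captions)

    return fid.compute().item(), clip.compute().item()
```

An FID-vs-CLIP curve such as the one in Fig. 12 is typically obtained by repeating this evaluation at several classifier-free guidance scales, since guidance trades sample fidelity (FID) against prompt alignment (CLIP score).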
:::info This paper is available on arxiv under CC BY 4.0 DEED license.
:::