Stability AI is out today with a major update for its text-to-image generative AI technology with the debut of Stable Diffusion 3.5. A key goal for the new update is to raise the bar and improve upon Stability AI's last major update, which the company admitted didn't live up to its own standards. Stable Diffusion 3 was first previewed back in February, and the first open model version became generally available in June with the debut of Stable Diffusion 3 Medium. While Stability AI was an early pioneer in the text-to-image generative AI space, it has increasingly faced stiff competition from numerous rivals including Black Forest Labs' Flux Pro, OpenAI's DALL-E, Ideogram and Midjourney.
With Stable Diffusion 3.5, Stability AI is looking to reclaim its leadership position. The new models are highly customizable and can generate a wide range of different styles. The update introduces multiple model variants, each designed to cater to different user needs:

- Stable Diffusion 3.5 Large is an 8-billion-parameter model that offers the highest quality and prompt adherence in the series.
- Stable Diffusion 3.5 Large Turbo is a distilled version of the large model that provides faster image generation.
- Stable Diffusion 3.5 Medium, which rounds out the lineup, has 2.6 billion parameters and is optimized for edge computing deployments.
All three of the new Stable Diffusion 3.5 models are available under the Stability AI Community License, which is an open license that enables free non-commercial usage and free commercial usage for entities with annual revenue under $1 million. Stability AI has an enterprise license for larger deployments. The models are available via Stability AI’s API as well as Hugging Face.
The original release of Stable Diffusion 3 Medium in June was less than ideal. The lessons learned from that experience have helped to inform and improve the new Stable Diffusion 3.5 updates.
“We identified that several model and dataset choices that we made for the Stable Diffusion Large 8B model were not optimal for the smaller-sized Medium model,” Hanno Basse, CTO of Stability AI told VentureBeat. “We did thorough analysis of these bottlenecks and innovated further on our architecture and training protocols on the Medium model to provide a better balance between the model size and the output quality.”
How Stability AI is improving text to image generative AI with Stable Diffusion 3.5
As part of building out Stable Diffusion 3.5, Stability AI took advantage of a number of novel techniques to improve quality and performance.
A notable addition to Stable Diffusion 3.5 is the integration of Query-Key Normalization into the transformer blocks. This technique facilitates easier fine-tuning and further development of the models by end-users. Query-Key Normalization makes the model more stable for training and fine-tuning.
“While we have experimented with QK-normalization in the past, this is our first model release with this normalization,” Basse explained. “It made sense to use it for this new model as we prioritized customization.”
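The core idea behind Query-Key Normalization can be sketched in a few lines: before computing attention scores, the query and key vectors are L2-normalized, which bounds the attention logits and keeps training and fine-tuning numerically stable. The snippet below is an illustrative NumPy toy, not Stability AI's implementation; the tensor shapes and values are arbitrary placeholders.

```python
import numpy as np

def qk_normalized_attention(q, k, v, eps=1e-6):
    """Toy scaled-dot-product attention with QK-normalization.

    q and k rows are L2-normalized first, so the logits are cosine
    similarities bounded in [-1, 1] -- this is what makes training
    and fine-tuning more stable than with raw dot products.
    """
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + eps)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + eps)
    logits = q @ k.T                                   # bounded logits
    logits -= logits.max(axis=-1, keepdims=True)       # softmax stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)     # rows sum to 1
    return weights @ v

# Toy example: 4 tokens with 8-dimensional head size.
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 4, 8))
out = qk_normalized_attention(q, k, v)
print(out.shape)  # (4, 8)
```

Because the logits are bounded, extreme attention spikes are far less likely during aggressive fine-tuning, which is why the technique pairs naturally with the customization focus Basse describes.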
Stability AI has also enhanced its Multimodal Diffusion Transformer (MMDiT-X) architecture, specifically for the Medium model. Stability AI first highlighted the MMDiT architecture approach in April, when the Stable Diffusion 3 API became available. MMDiT is noteworthy as it blends diffusion model techniques with transformer model techniques. With the updates in Stable Diffusion 3.5, MMDiT-X now helps improve image quality as well as enhance multi-resolution generation capabilities.
Prompt adherence makes Stable Diffusion 3.5 even more powerful
Stability AI reports that Stable Diffusion 3.5 Large demonstrates superior prompt adherence compared to other models in the market.
The promise of better prompt adherence is all about the model's ability to accurately interpret and render user prompts.
“This is achieved with a combination of different things – better dataset curation, captioning and additional innovation in training protocols,” Basse said.
Customization will get even better with ControlNets
Looking forward, Stability AI is planning on releasing a ControlNets capability for Stable Diffusion 3.5.
The promise of ControlNets is more control for various professional use cases. Stability AI first introduced ControlNet technology as part of its SDXL 1.0 release in July 2023.
“ControlNets give spatial control over different professional applications where users, for example, may want to upscale an image while maintaining the overall colors or create an image that follows a specific depth pattern,” Basse said.
Author: Sean Michael Kerner
Source: Venturebeat
Reviewed By: Editorial Team