Stability AI Brings Out Image-to-3D and a New Membership Model
A return to the company's image roots is in order
Stability AI is continuing the journey back to its roots in visual imagery. After several adventures with large language models (LLMs) throughout the spring and summer of 2023, Stability introduced the generative video model Stable Video Diffusion in November. This past week, the company debuted Stable Zero123, a new generative AI model for 3D imagery.
Stable Zero123 is a model that transforms still images into 3D renderings and is based on Zero123-XL, which was developed by researchers at Columbia University and Toyota Research Institute. The research paper introducing Zero123-XL commented:
From just a single camera view, humans are often able to imagine an object’s 3D shape and appearance. This ability is important for everyday tasks, such as object manipulation and navigation in complex environments, but is also key for visual creativity, such as painting. While this ability can be partially explained by reliance on geometric priors like symmetry, we seem to be able to generalize to much more challenging objects that break physical and geometric constraints with ease. In fact, we can predict the 3D shape of objects that do not (or even cannot) exist in the physical world (see third column in Figure 1). To achieve this degree of generalization, humans rely on prior knowledge accumulated through a lifetime of visual exploration.
In contrast, most existing approaches for 3D image reconstruction operate in a closed-world setting due to their reliance on expensive 3D annotations (e.g. CAD models) or category-specific priors…The primary contribution of this paper is to demonstrate that large diffusion models have learned rich 3D priors about the visual world, even though they are only trained on 2D images.
Stability AI researchers added in the blog post introducing the model:
Today we’re releasing Stable Zero123, our in-house trained model for view-conditioned image generation. Stable Zero123 produces notably improved results compared to the previous state-of-the-art, Zero123-XL. This is achieved through 3 key innovations:
An improved training dataset heavily filtered from Objaverse, to only preserve high quality 3D objects, that we rendered much more realistically than previous methods
During training and inference, we provide the model with an estimated camera angle. This elevation conditioning allows it to make more informed, higher quality predictions.
A pre-computed dataset (pre-computed latents) and improved dataloader supporting higher batch size, that, combined with the 1st innovation, yielded a 40X speed-up in training efficiency compared to Zero123-XL.
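The third point, pre-computed latents, can be illustrated with a minimal Python sketch. It is not Stability AI's code: the encoder here is a toy stand-in, and every name in it (CountingEncoder, precompute_latents, batches) is hypothetical. The only idea it shows is that a frozen encoder runs once per image up front, so each training epoch reads cached latents instead of re-encoding.

```python
from typing import Iterator, List

Image = List[float]
Latent = List[float]


class CountingEncoder:
    """Toy stand-in for a frozen image encoder (hypothetical).

    It averages neighbouring pairs of values and counts how often it runs.
    """

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, image: Image) -> Latent:
        self.calls += 1
        return [(image[i] + image[i + 1]) / 2 for i in range(0, len(image) - 1, 2)]


def precompute_latents(images: List[Image], encoder: CountingEncoder) -> List[Latent]:
    """Encode every image exactly once, before training starts."""
    return [encoder(image) for image in images]


def batches(items: List[Latent], batch_size: int) -> Iterator[List[Latent]]:
    """Yield consecutive batches of cached latents."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Six tiny fake "images" of four values each.
images = [[float(i + j) for j in range(4)] for i in range(6)]

encoder = CountingEncoder()
latents = precompute_latents(images, encoder)

steps = 0
for epoch in range(3):
    for batch in batches(latents, batch_size=4):
        steps += 1  # a real training step would consume `batch` here

print(encoder.calls, steps)
```

Because the encoder is skipped inside the loop, the per-step cost drops and the freed memory can go toward a larger batch size, which is the trade-off the quoted passage describes.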
The new model is available for non-commercial and research use through Hugging Face, where Stability AI has also released the model weights.
Stability Upgrades its Revenue Model
Stability AI has also introduced a new membership model designed to encourage professional users to pay for the service hosted by the company. Non-commercial use is free for users. The professional edition is $20 per month and has three eligibility requirements. The user and their organization must have:
Less than $1 million in annual revenue
Less than $1 million in institutional funding
Fewer than 1 million monthly active users (MAU)
Any company that does not meet these criteria will need to contact Stability AI directly for the terms of an enterprise license. The memberships represent Stability AI’s most substantial business model update to date.
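The tier rules above can be written as a short check. This is a sketch under stated assumptions, not anything published by Stability AI: the Organization class, its field names, and the membership_tier function are hypothetical, and the thresholds are taken directly from the criteria listed in this article.

```python
from dataclasses import dataclass

# Thresholds as described in the article; all names here are hypothetical.
REVENUE_CAP_USD = 1_000_000
FUNDING_CAP_USD = 1_000_000
MAU_CAP = 1_000_000
PROFESSIONAL_PRICE_USD_PER_MONTH = 20


@dataclass
class Organization:
    annual_revenue_usd: float
    institutional_funding_usd: float
    monthly_active_users: int
    commercial_use: bool


def membership_tier(org: Organization) -> str:
    """Return the tier an organization would fall under per the stated rules."""
    if not org.commercial_use:
        return "non-commercial"  # free
    meets_all_limits = (
        org.annual_revenue_usd < REVENUE_CAP_USD
        and org.institutional_funding_usd < FUNDING_CAP_USD
        and org.monthly_active_users < MAU_CAP
    )
    # Commercial users under every limit qualify for the $20/month tier;
    # everyone else has to negotiate an enterprise license.
    return "professional" if meets_all_limits else "enterprise"


startup = Organization(250_000, 0, 12_000, commercial_use=True)
scaleup = Organization(250_000, 5_000_000, 12_000, commercial_use=True)
hobbyist = Organization(0, 0, 0, commercial_use=False)

print(membership_tier(startup), membership_tier(scaleup), membership_tier(hobbyist))
```

Note that all three limits must hold at once: in this sketch, a company with modest revenue but more than $1 million in funding lands in the enterprise tier.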
The ability to generate revenue and show investors the results is surely intended to make the company look more attractive during its next funding round. Stability AI generally builds on top of a research organization’s generative AI work. Anyone can access the open-source software, but Stability AI wants companies to pay for value-added services if they are using the APIs for commercial purposes. That seems reasonable.
I suspect the vast majority of companies using Stable Diffusion do not pay any fees. Stability AI’s new membership platform will at least provide an easy way for organizations to voluntarily put money back into the ecosystem.