Stability AI Debuts DeepFloyd IF to Generate Text in Images and Teases a New Chatbot
How to try out DeepFloyd IF. It is by no means flawless, but it is interesting.
Stability AI made two announcements Friday. The first was about DeepFloyd IF, a new text-to-image AI model trained to render text more accurately. The second was for StableVicuna, a new “large-scale open source chatbot” trained via reinforcement learning from human feedback (RLHF).
Text-to-Image with Text
DeepFloyd IF has the benefit of a demonstration application available today on Hugging Face. It has a simple interface with one text box for your prompt and another for a negative prompt (nice touch!).
It can be pretty hard to see what you created, but it is currently free to upscale any of the four images to get a closer look. In the image at the top of the post, the text was supposed to read: Voicebot was Here! The result is not 100% accurate: there was supposed to be a lowercase “b,” and the quote mark is extraneous. However, this is far better than we typically see from text-to-image models.
The example at the top is also much better than I was able to accomplish in other tries. You can see some examples below where the letters mostly look like letters but some are missing or in the wrong place.
DeepFloyd IF had particular trouble with Synthedia, which makes me think the model may be relying not just on individual letters but also on whole words it has seen before. In any event, you can try it out here. Let me know what you think.
It is also worth noting that DeepFloyd IF renders text with far greater coherence than the previous Stable Diffusion models and other text-to-image models. It has flaws, but it shows tangible progress.
A New Open Source Chatbot
Stability AI says that “StableVicuna is a further instruction fine tuned and RLHF trained version of Vicuna v0 13b, which is an instruction fine tuned LLaMA 13b model.” In addition to chatting, the company says it can do basic math, code, and help with grammar.
Vicuna, the model StableVicuna builds on, was created by a group of researchers from UC Berkeley, CMU, Stanford, MBZUAI, and UC San Diego. It is a 13B parameter chatbot that its creators say “achieves more than 90% quality of OpenAI ChatGPT and Google Bard while outperforming other models like LLaMA and Stanford Alpaca in more than 90% of the cases.”
Stability AI provided a chart showing StableVicuna outperforming several other open source chatbots on a variety of tests. Unfortunately, the chart does not include data from HuggingChat or ChatGPT. That would be a helpful comparison.
There is no demonstration version of this model yet. However, Stability AI says this is coming. The company shared a screenshot of the application that is currently in testing. It will probably look familiar.
You will notice that the name in the image suggests it is StableLM-Tuned-Alpha. If Stability maintains its current brand naming architecture, the name StableChat seems like a good bet for the product at launch.
Developers can also download StableVicuna 13B from Hugging Face. However, they will need access to the original LLaMA model weights before deploying it.
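For readers wondering why the original LLaMA weights would be needed, one common arrangement for LLaMA-derived models is to publish only "delta" weights, the difference between the fine-tuned model and the base model, and have developers add them back to their own copy of the base. Whether this release works exactly that way is an assumption here, not something stated above. The toy sketch below shows the idea with plain Python lists standing in for tensors; the `apply_delta` helper and the parameter names are hypothetical, not part of any released tooling.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def apply_delta(base: Weights, delta: Weights) -> Weights:
    """Return base + delta, parameter by parameter (hypothetical helper)."""
    if base.keys() != delta.keys():
        raise KeyError("base and delta must contain the same parameter names")
    merged: Weights = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # The fine-tuned value is the base value plus the published difference.
        merged[name] = [b + d for b, d in zip(base_values, delta_values)]
    return merged


# Made-up numbers, purely for illustration.
base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
delta = {"layer.weight": [0.25, -0.5], "layer.bias": [0.25]}
merged = apply_delta(base, delta)
print(merged)
```

In a real deployment the same addition would run over full tensors loaded from the base and delta checkpoints, typically with whatever conversion script the model's publishers provide.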
ChatGPT will forever be known as the chatbot that launched a thousand million chatbots. I will let you know when I have a chance to try this out. In the meantime, you can review my evaluation of HuggingChat below.