DALL-E is an artificial intelligence (AI) system trained to generate exceptionally detailed images from descriptive text. It's already showing promising results, but its behavioral lapses suggest that putting its algorithm to more practical use may take some time. The text-to-image software is the brainchild of non-profit AI research group OpenAI.

The company was founded by numerous tech visionaries, including Tesla and SpaceX CEO Elon Musk, and is responsible for developing various deep-learning AI tools. One of these is the Generative Pre-trained Transformer 3 (GPT-3), an AI capable of generating news stories or essays of a quality that's often difficult to distinguish from pieces written by actual people. GPT-3 also performs well in other applications, such as answering questions, writing fiction, and coding, and has been adopted by other companies as an interactive AI chatbot.

Related: AI Brains Might Need Human-Like Sleep Cycles To Be Reliable

Now, OpenAI is working on another GPT-3 variant called DALL-E, this time focused on rendering pictures entirely from scratch, out of lines of text. According to its blog post, the name was derived from combining Disney Pixar's WALL-E and famous painter Salvador Dalí, referencing its intended ability to transform words into images with uncanny machine-like precision. The AI is capable of translating intricate sentences into pictures in “plausible ways.” DALL-E processes text and images as a single stream of data, having been trained on a dataset of text-image pairs. OpenAI claims that DALL-E can understand what a text is implying even when certain details aren't mentioned, and that it is able to generate plausible images by “filling in the blanks” of the missing details.
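The single-stream idea described above can be illustrated with a toy sketch. This is not OpenAI's actual code; the tokenizers below are hypothetical stand-ins, and the point is only the format: caption tokens and image tokens share one ID space and are concatenated into one sequence, which is what an autoregressive model like DALL-E is trained to continue.

```python
# Conceptual sketch only (assumed, simplified encoders - not OpenAI's implementation).
# DALL-E-style models treat a caption and an image as one token stream, so the
# model can learn to continue text tokens with image tokens.

TEXT_VOCAB_SIZE = 1000  # assumed toy vocabulary size


def encode_text(caption):
    # Toy text "tokenizer": map each word to an ID in [0, TEXT_VOCAB_SIZE).
    return [hash(word) % TEXT_VOCAB_SIZE for word in caption.lower().split()]


def encode_image(pixels):
    # Toy image "tokenizer": offset pixel values past the text vocabulary
    # so text tokens and image tokens never collide in the shared ID space.
    return [TEXT_VOCAB_SIZE + p for p in pixels]


def build_stream(caption, pixels):
    # One sequence: caption tokens first, then image tokens. During training,
    # the model sees such pairs; at generation time it is given only the
    # caption tokens and predicts the image tokens that follow.
    return encode_text(caption) + encode_image(pixels)


stream = build_stream("a capybara in a field", [12, 200, 45])
```

In a real system, the image encoder would be a learned discrete autoencoder rather than raw pixels, but the concatenation of the two token types into one stream is the core of the text-to-image setup the article describes.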

DALL-E: Promising AI Applications, But Still With Limitations


AI algorithms tend to falter when generating images due to lapses in the datasets used in their training. DALL-E, however, has produced sensible renditions of not just concrete objects but abstract concepts as well. For example, given a text describing a capybara in a field at sunrise, the AI displayed surprisingly logical reasoning by rendering pictures of the subject casting a shadow, even though that detail was never mentioned in the text. It also showed good judgment in bringing abstract, imaginary concepts to life, such as creating a harp-textured snail by relating the arched portion of the harp to the curve of the snail's shell and creatively combining both elements into a single concept.

DALL-E does tend to get overwhelmed by longer strings of text, though, becoming less accurate as more description is added. The AI also falls victim to cultural stereotypes, such as generalizing Chinese food as simply dumplings. Of course, once it's perfected, there is a wealth of applications for such a tool, from marketing and design concepts to visualizing storyboards from plot summaries. Perhaps AI algorithms like DALL-E might soon be even better than humans at drawing images, the same way they bested us in aerial dogfights.

More: How Light Could Help AI Radically Improve Learning Speed & Efficiency

Source: OpenAI