
October 5, 2022

Cornell Professors Explain AI-generated Art as Generative Technology Expands


The most popular systems for artificial intelligence-generated art include Midjourney, Stable Diffusion and, most notably, DALL-E 2: OpenAI’s program named after Salvador Dalí and Disney’s WALL-E. The system opened to the general public on Sept. 28.

Using these generative artificial intelligence tools involves entering brief text prompts that describe a desired image, including the style of the output. For example, “Cornell University, digital art” indicates that a user is searching for digital art of the University. Results appear in mere seconds, paired with opportunities to edit and fine-tune the original written inputs.

Whether it be celebrity Pokémon or Van Gogh-style bears atop McGraw Tower, text-to-image generative AI technology allows users to create digital renderings of short, descriptive text prompts. Generative AI is a rapidly advancing form of AI, distinct in its ability to create realistic new content like images, text or code. 

This technology employs a subset of machine learning called deep learning, a biologically-inspired method of learning from large sets of raw data. AI art uses generative models that take in training data in the form of images and work to produce information similar to the initial dataset. This results in a new image resembling the original selection of media.

DALL-E 2 employs diffusion, one of the most popular and efficient types of generative models. It destroys and recovers existing training data as a means of synthesizing new images.

“You take real images and add a little noise to them, then learn to remove that noise to get back the original image,” Prof. Noah Snavely, computer science, said. 

Adding noise means deliberately altering pixels to random colors. With the system now having learned how to “de-noise,” it can generate new images by passing random noise patterns through that reversal process. 
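The “destroy and recover” idea can be sketched in a few lines. The following toy example (using numpy, with a made-up blending formula standing in for a real diffusion noise schedule) shows the forward half of the process: corrupting an image by mixing in random Gaussian noise. A real denoising network would then be trained to undo exactly this corruption.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(image, noise_level):
    """Corrupt an image by blending in Gaussian noise.

    noise_level = 0 returns the image unchanged; noise_level near 1
    returns almost pure static. This mirrors the 'destroy' half of
    diffusion training described above.
    """
    noise = rng.standard_normal(image.shape)
    return np.sqrt(1 - noise_level) * image + np.sqrt(noise_level) * noise

# A tiny stand-in for a training image: an 8x8 grid of grayscale pixels.
image = rng.random((8, 8))

slightly_noisy = add_noise(image, 0.1)  # still mostly recognizable
mostly_noise = add_noise(image, 0.9)    # nearly pure random static

# A denoising network is trained to recover `image` from corrupted
# inputs like these; at generation time, it is fed pure noise instead.
```

Once a model has learned to reverse this corruption, feeding it a fresh random noise pattern instead of a corrupted training image yields a brand-new picture.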

“If you also have text, like captions, accompanying the training images, these methods can also be adapted to generate images that match a given input text string,” Prof. Snavely said. 

Text encoders interpret the words users input. The methods used to link prompts to captioned visuals vary from company to company.
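While each company's linking method is proprietary, a common building block is a shared embedding space in which text and images can be compared directly, as in CLIP-style encoders. The sketch below illustrates the idea with cosine similarity; the 4-dimensional vectors are invented for illustration, not real encoder outputs.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity: near 1.0 for aligned vectors, near 0 for unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings. In a real system, a text encoder maps the
# prompt, and an image encoder maps candidate pictures, into the same
# vector space, so that matching pairs land close together.
prompt_vec = np.array([0.9, 0.1, 0.0, 0.4])
matching_image_vec = np.array([0.8, 0.2, 0.1, 0.5])
unrelated_image_vec = np.array([0.0, 0.9, 0.8, 0.0])

match_score = cosine_similarity(prompt_vec, matching_image_vec)
mismatch_score = cosine_similarity(prompt_vec, unrelated_image_vec)
# The matching image scores far higher than the unrelated one.
```

During training, scoring like this can tie captions to images; during generation, the text embedding conditions the denoising process so the output matches the prompt.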

Unlike past models in the AI art sphere, diffusion models draw “from the idea of diffusion in physics to do this noise removal one tiny step at a time [as opposed to in one go],” Prof. Bharath Hariharan, computer science, said.

Removing noise one small step at a time, rather than all at once, yields cutting-edge image quality because each successive step can tackle finer details.
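That stepwise reversal can be sketched with a toy stand-in for the learned denoiser. Here the "denoiser" simply nudges the current sample a small fraction of the way toward a fixed clean target; in a real diffusion model, a trained neural network predicts each step, conditioned on the current noise level.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the clean image the process should converge to.
# In a real model there is no fixed target; the network's predictions
# steer each step instead.
target = rng.random((8, 8))

def denoise_step(sample, step_size=0.1):
    """One tiny refinement step toward the target (toy denoiser)."""
    return sample + step_size * (target - sample)

# Start from pure random noise and remove it one small step at a time,
# echoing the physics-inspired diffusion process described above.
sample = rng.standard_normal((8, 8))
for _ in range(200):
    sample = denoise_step(sample)

# After many small steps, the sample has converged close to the target,
# illustrating why gradual refinement can recover fine detail.
```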

Training data contain millions, and sometimes billions, of uncurated images from the web. As a result, some media-making AI programs have displayed explicit results reflecting the dangerous and prejudiced sides of the internet. 

Pornographic images, depictions of violence and racist or homophobic content have surfaced. 

This has prompted conversations on ethics, algorithmic bias and digital safety as tech companies work to develop stronger filtration systems. 

AI-generated images remain controversial as artists debate their legal, social and creative merit. Getty Images and Shutterstock have completely banned AI-created works, just a few weeks after a state fair-winning AI art piece fueled outrage across social media.

However, AI art has continued to thrive in other mediums. Cosmopolitan featured the world’s first AI-generated magazine cover in June 2022. In September, Kris Kashtanova became the first known artist to receive U.S. copyright registration for an AI-generated art-based graphic novel. 

Some artists argue that because AI art systems are trained with existing images, generative systems do not produce purely distinct, original works.

It is difficult to say exactly how generated images differ from the originals they were trained on. 

“In practice, because it is trained on billions of images, and because the model itself is relatively much smaller, it cannot really memorize the images,” Hariharan said. 

Even so, the results cannot stray too far from the training data, and Hariharan said it is difficult to succinctly articulate those limits.

Despite these limits, the intensifying market has sparked new rivals in what some media have deemed the “AI space race,” such as Google’s text-to-image system Imagen. Meta unveiled Make-A-Video last month, which converts text to videos. 
The public can explore AI-generated works for free at DALL-E 2 and NightCafe, both of which now operate on limited, credit-based systems.