
A closer look at what makes the AI tool Dall-E powerful

The language processing tool differs from most chatbots because it has access to specialized data sets. This makes it powerful, but also potentially dangerous.

For the past year, OpenAI's Dall-E and the AI system's most recent version, Dall-E 2, have enticed consumers and enterprises alike.

The tool creates images from natural language descriptions. The system's popularity led OpenAI to expand the waiting list to one million people during the Dall-E 2 beta, which was introduced July 20.

Dall-E's seemingly magical ability to create any image users ask of it will shift expectations for what natural language systems can do, according to Forrester analyst Will McKeon-White.

In this Q&A, McKeon-White discusses how he thinks Dall-E will change the conversational AI field as well as some of the safety measures OpenAI has put in place to keep people from misusing the system.

Dall-E is a tool that will make graphics and design better, not replace humans in those fields, McKeon-White says. However, the technology can also be dangerous in the wrong hands.

How does Dall-E differ from other natural language tools that are out there?

Will McKeon-White: Dall-E is remarkable in its ability to properly interpret the utterance the user has typed in. The difference between one of these systems and a chatbot is that Dall-E is basically pattern matching, trying to match tags it has from a significant library of images.

[Other] basic bots can parse maybe 10 different intents that they were built for -- whether it's reserving a table, checking a menu or checking dietary options at a restaurant.

An enterprise or business chatbot won't necessarily be able to construct that degree of understanding of a complicated topic, like [the painting style] Impressionism. Dall-E can understand this because the library of images it's using has been tagged effectively in order to train these models to understand what these specific words mean.

It is a very specialized but extraordinarily capable model with a powerful data set. Most businesses do not have access to that [data set]. The expectation for business [chatbots] is going to be significantly higher now.

How will Dall-E work with deepfake technology?

McKeon-White: What is starting to be possible is -- somebody else coined this and I'm kind of mad -- 'natural language Photoshop.' Being able to theoretically adjust an existing photo using nothing but your words would allow even the most tech-illiterate person to take this piece of software and create whatever image they want to see.

As it gets more 'photo real' … people will use this to make lewd images and to make really unflattering images of people they don't like because humans are awful. We've seen this explicitly already.

But the Dall-E terms of service have done a remarkable job of thinking about all the potential points that can go wrong and avoiding those before people try them.

When there's a new AI system, a cool piece of technology, people will kick the tires and see if they can misappropriate it for their own amusement, which has already happened.

If you have people with an agenda and one of these solutions without the protections the Dall-E team has put in place, it becomes significantly easier to fabricate a photo of a politician in a compromising position, for example.

How effective are the protections that the Dall-E team has put on the technology?


McKeon-White: They're robust. They have word filters on it, and they've done a fairly good job of doing a lot of spot reviews of these solutions, bringing human reviewers into the loop and reviewing potential false positives to ensure that people really aren't misusing the tool.

I think things like the blanket ban on public figures are smart. But I don't necessarily know if other systems, like Midjourney, have totally caught up yet. I know that Google has its own version, [Imagen], and the way they do it is they have you submit the request to a Google Sheets document, and then that is turned into a photo.

What we do see is that -- and this is a problem people have thought about, with a lot of work going on to avoid some of these issues -- it's not going to catch everything. But if you're a bad actor, you will get caught and you will be kicked off the platform.

What we are most worried about is, after these platforms are released into the wild and somebody gets the source code and replicates it, what happens if a nation or state releases an unrestricted copy and then intentionally allows everybody to go hog wild on it?

Editor's note: This interview was edited for clarity and conciseness.
