What is image-to-image translation?

Image-to-image translation is a generative artificial intelligence (AI) technique that translates a source image into a target image while preserving certain visual properties of the original. It uses machine learning and deep learning techniques -- such as generative adversarial networks (GANs), conditional generative adversarial networks (cGANs) and convolutional neural networks (CNNs) -- to learn complex mapping functions between input and output images.

Image-to-image translation allows images to be converted from one form to another while retaining essential features. The goal is to learn a mapping between the two domains and then generate realistic images in whatever target style is required. This approach enables tasks such as style transfer, colorization and super-resolution, which increases an image's resolution.

Image-to-image translation has a diverse set of applications in art, image enhancement, data augmentation and computer vision, also known as machine vision. For instance, image-to-image translation lets photographers change a daytime photo to a nighttime one, convert a satellite image into a map and enhance medical images to enable more accurate diagnoses.

[Image: Example of image-to-image translation changing the background of a stock photo. The technique alters an image while maintaining its original properties, though the results aren't always perfect.]

How does image-to-image translation work?

Image processing systems using image-to-image translation require the following basic steps:

  • Define image domains. The process begins by defining the image domains -- the types of input and output images the system will handle. These domains depend on the task at hand, such as style transfer, super-resolution or semantic segmentation.
  • Train the system. A data set containing paired examples of input and target images -- sometimes called ground truth target images -- is used to train the system so that it can learn the mapping that's required between the two domains.
  • Combine the generator and discriminator. A GAN pairs a generator network with a discriminator network. The generator takes an input image from the source domain and produces an output image that belongs to the target domain, while the discriminator learns to distinguish real images in the target domain from the synthesized images the generator produces. A loss function measures the difference between the generated output and the ground truth target image, as the training-step sketch after this list shows.
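
To make the generator-discriminator interplay concrete, here is a minimal PyTorch sketch of one training step in a paired, pix2pix-style setup. The gen and disc networks, their optimizers and the lambda_l1 weight are illustrative assumptions rather than a prescribed implementation:

```python
# Minimal sketch of one cGAN training step for paired image-to-image
# translation. `gen` and `disc` are hypothetical stand-ins for any
# suitable conv nets; the L1 weight follows common practice but is an
# assumption here, not a fixed requirement.
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()   # adversarial loss
l1 = nn.L1Loss()               # pixel-wise reconstruction loss
lambda_l1 = 100.0              # assumed weighting between the two losses

def train_step(gen, disc, opt_g, opt_d, src, tgt):
    # --- Discriminator: tell real (src, tgt) pairs from generated pairs ---
    fake = gen(src)
    d_real = disc(torch.cat([src, tgt], dim=1))
    d_fake = disc(torch.cat([src, fake.detach()], dim=1))
    loss_d = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # --- Generator: fool the discriminator and stay close to ground truth ---
    d_fake = disc(torch.cat([src, fake], dim=1))
    loss_g = bce(d_fake, torch.ones_like(d_fake)) + lambda_l1 * l1(fake, tgt)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()
```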

A critical aspect of image-to-image translation is ensuring the model generalizes well to previously unseen inputs. Cycle consistency, often paired with unsupervised learning, helps ensure that if an image is translated from one domain to another and then back, it returns to its original form. Deep learning architectures such as U-Net and CNNs are commonly used because they can capture complex spatial relationships in images. During training, batch normalization and optimization algorithms are used to stabilize and expedite convergence.
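
As an illustration of the U-Net idea mentioned above, the following pared-down PyTorch generator uses a skip connection to carry fine spatial detail from encoder to decoder and batch normalization to stabilize training. The channel counts, layer depth and 3-channel input are arbitrary choices for the sketch:

```python
# A pared-down U-Net-style generator, assuming 3-channel input/output
# images. The skip connection concatenates an encoder feature map onto
# the decoder path so fine detail survives the downsampling bottleneck.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, ch, 4, 2, 1), nn.LeakyReLU(0.2))
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 4, 2, 1),
                                  nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2))
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1),
                                  nn.BatchNorm2d(ch), nn.ReLU())
        # Decoder input channels double because of the skip connection.
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(ch * 2, 3, 4, 2, 1),
                                  nn.Tanh())

    def forward(self, x):
        e1 = self.enc1(x)        # H/2 resolution
        e2 = self.enc2(e1)       # H/4 resolution
        d1 = self.dec1(e2)       # back up to H/2
        return self.dec2(torch.cat([d1, e1], dim=1))  # skip connection

out = TinyUNet()(torch.randn(1, 3, 64, 64))  # -> shape (1, 3, 64, 64)
```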

Supervised vs. unsupervised image-to-image translation

The two main approaches to image-to-image translation are supervised and unsupervised learning.

Supervised learning

Supervised methods rely on paired training data, where each input image has a corresponding target image. With this approach, the system learns the direct mapping between the two domains. However, obtaining paired data can be challenging and time-consuming, especially for complex image transformations.
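
As a rough illustration of what paired data looks like in practice, this PyTorch Dataset sketch assumes each training file stores the input and target images side by side in a single image -- a layout some public paired data sets use. The class name and file layout are assumptions for the example:

```python
# Hypothetical paired data set: each file holds the input image on the
# left half and the ground truth target image on the right half.
from PIL import Image
from torch.utils.data import Dataset
import torchvision.transforms.functional as TF

class PairedImageDataset(Dataset):
    def __init__(self, paths):
        self.paths = paths  # list of file paths to side-by-side images

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        img = Image.open(self.paths[i]).convert("RGB")
        w, h = img.size
        src = TF.to_tensor(img.crop((0, 0, w // 2, h)))   # left half: input
        tgt = TF.to_tensor(img.crop((w // 2, 0, w, h)))   # right half: target
        return src, tgt
```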

Unsupervised learning

Unsupervised methods tackle the image-to-image translation problem without paired training examples. One prominent unsupervised approach is CycleGAN, which introduces the concept of cycle consistency. This involves two mappings: one from the source domain to the target domain and one in the reverse direction. CycleGAN ensures that an image translated into the target domain and back again closely matches the original source image.
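
The cycle-consistency idea can be expressed compactly in code. In this sketch, G and F stand in for the two generator networks (source-to-target and target-to-source), and lambda_cyc is an assumed loss weight:

```python
# Cycle-consistency sketch: G maps domain X -> Y and F maps Y -> X.
# Translating an image to the other domain and back should reproduce
# the original, which the L1 terms below penalize deviations from.
import torch.nn as nn

l1 = nn.L1Loss()
lambda_cyc = 10.0  # assumed weighting, not a fixed requirement

def cycle_loss(G, F, x, y):
    # forward cycle:  x -> G(x) -> F(G(x)) should return to x
    # backward cycle: y -> F(y) -> G(F(y)) should return to y
    return lambda_cyc * (l1(F(G(x)), x) + l1(G(F(y)), y))
```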

AI models for image translation

Image-to-image translation and generative AI in general are touted as cost-effective, but they're also criticized for lacking creativity. It's essential to research the various AI models developed for image-to-image translation tasks, as each comes with its own benefits and drawbacks. Analyst firms such as Gartner also urge users and generative AI developers to look for trust and transparency when choosing and designing models.

Some of the most popular models include the following:

  • StarGAN. This is a scalable, single-model image translation approach designed to perform translation across multiple domains. Unlike traditional methods that require a separate model for each pair of image domains, StarGAN consolidates the translation process into a unified framework. Its architecture learns mappings between many image domains at once, enabling versatile and efficient image translation; see the conditioning sketch after this list.
  • CycleGAN. This is an unsupervised image-to-image translation model that has gained significant attention in the research community. It addresses the challenge of training with unpaired images by using the concept of cycle consistency. By incorporating a cycle consistency loss, which ensures the translated image can be mapped back to the original source image, CycleGAN achieves remarkable results in various image transformations without the need for paired examples.
  • Pix2Pix GAN. This is a conditional generative model that learns a mapping from an input image (plus a noise vector) to the output image, rather than generating from random noise alone. The conditional approach enables more controlled and precise translations. The model uses a U-Net architecture, in which an encoder and a decoder are linked by skip connections to preserve detailed pixel-level features and enable high-quality image generation.
  • Unsupervised image-to-image translation (UNIT). The UNIT model focuses on unsupervised image translation and aims to learn mappings between different image domains without a paired training data set. UNIT couples autoencoder-based generators under a shared latent space assumption and introduces a loss function that encourages the preservation of content representations during translation. This approach enables the model to generate visually appealing and semantically consistent images across different domains.
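
To illustrate how a single StarGAN-style model can target multiple domains, the following sketch broadcasts a target-domain label across the image and concatenates it to the input channels -- the general conditioning idea behind multi-domain translation. The function name and tensor shapes are illustrative assumptions:

```python
# StarGAN-style conditioning sketch: the target-domain label is spatially
# broadcast and stacked onto the image channels, so one generator can be
# steered toward any of `num_domains` domains at inference time.
import torch

def condition_on_domain(img, label, num_domains):
    # img: (N, 3, H, W); label: (N,) integer domain IDs
    onehot = torch.nn.functional.one_hot(label, num_domains).float()
    maps = onehot[:, :, None, None].expand(-1, -1, img.size(2), img.size(3))
    return torch.cat([img, maps], dim=1)  # (N, 3 + num_domains, H, W)

x = torch.randn(4, 3, 128, 128)
lbl = torch.tensor([0, 2, 1, 2])
g_input = condition_on_domain(x, lbl, num_domains=3)  # fed to the generator
```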

