How to tackle an email archive migration for Exchange Online

A move from on-premises Exchange to Office 365 also entails determining the best way to transfer legacy archives. This tutorial can help ease migration complications.

A move to Office 365 seems straightforward enough until project planners broach the topic of the email archive migration.

Not all organizations keep all their email inside their messaging platform. Many organizations that archive messages also keep a copy in a journal that is archived away from user reach for legal reasons.

The vast majority of legacy archive migrations to Office 365 require third-party tools and follow a fairly standardized process to complete the job quickly and with minimal expense. Administrators should migrate the mailboxes to Office 365 first and the archive second. This is the fastest way to gain the benefits of Office 365, because users can start working in it before the archive reingestion completes.

An archive product typically scans mailboxes for older items and moves them to longer-term, cheaper storage that is indexed and deduplicated. It then replaces the original item with a small part of the message, known as a stub or shortcut. The user can still find the email in their inbox and, when they open the message, an add-in retrieves the full content from the archive.
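To make the stubbing cycle concrete, here is a minimal Python sketch of what an archive product does to one mailbox. It is an illustration under stated assumptions, not any vendor's implementation: the `Item` and `ArchiveStore` classes, the age threshold, the 100-byte stub preview and the `archive_ref` link are all hypothetical, and deduplication is modeled as storing each distinct body once under its SHA-256 digest.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Item:
    item_id: str
    received: datetime
    body: bytes
    is_stub: bool = False
    archive_ref: Optional[str] = None  # hypothetical link back to the archived copy


class ArchiveStore:
    """Hypothetical deduplicated store: identical bodies are stored only once."""

    def __init__(self, base_url: str):
        self.base_url = base_url
        self.blobs: dict[str, bytes] = {}  # SHA-256 hex digest -> full body

    def put(self, body: bytes) -> str:
        digest = hashlib.sha256(body).hexdigest()
        self.blobs.setdefault(digest, body)  # a duplicate body is not stored twice
        return f"{self.base_url}/{digest}"


def archive_mailbox(items, store, max_age_days, now, stub_bytes=100):
    """Move items older than max_age_days to the store and leave a stub behind."""
    cutoff = now - timedelta(days=max_age_days)
    archived = 0
    for item in items:
        if item.is_stub or item.received >= cutoff:
            continue  # already stubbed, or still young enough to stay in the mailbox
        item.archive_ref = store.put(item.body)
        item.body = item.body[:stub_bytes]  # the stub keeps only a short preview
        item.is_stub = True
        archived += 1
    return archived
```

Two old copies of the same message end up as two stubs pointing at a single stored body, which is where the deduplication savings come from.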

Options for archived email migration to Office 365

The native tools to migrate mailboxes to Office 365 cannot handle an email archive migration. When admins transfer legacy archive data for mailboxes, they usually consider the following three approaches:

  1. Export the data to PST archives and import it into user mailboxes in Office 365.
  2. Reingest the archive data into the on-premises Exchange mailbox and then migrate the mailbox to Office 365.
  3. Migrate the Exchange mailbox to Office 365 first, then perform the email archive migration to put the data into the Office 365 mailbox.

Option 1 is not usually practical because it takes a lot of manual effort to export data to PST files. The stubs remain in the user's mailbox and add clutter.

Option 2 also requires a lot of labor-intensive work and uses a lot of space on the Exchange Server infrastructure to support reingestion.

That leaves the third option as the most practical approach, which we'll explore in a little more detail.

Migrate the mailbox to Exchange Online

When you move a mailbox to Office 365, the stubs that point to data in the legacy archive migrate along with it. The legacy archive no longer archives the mailbox, but users can still access their archived items. Because the stubs usually contain a URL path to the legacy archive item, viewing an archived message does not depend on Exchange.

Buttons that some products add to restore an individual message into the mailbox will not work, because the legacy archive product won't know where Office 365 is without further configuration. That configuration is not usually necessary, because the next stage is to migrate that data into Office 365.

How Office 365 handles archiving and journaling

Microsoft designed Exchange Online with the same functionality as legacy archive products, but it works in a different way.

The Office 365 online archive is not like an Enterprise Vault archive. Rather than a seamless extension of the user's primary mailbox, the Office 365 online archive is a second mailbox that users access via Outlook on the web or the full Outlook app. Retention policies automatically move items from the primary mailbox to the online archive mailbox, and the moved items keep the same folder structure.
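The age-based move from the primary mailbox into the online archive can be modeled in a few lines. The sketch below is a simplified assumption of the behavior described above, not Microsoft's implementation: `Mailbox`, `Message` and `apply_move_to_archive` are hypothetical names, and an item's folder path travels with it so the archive mirrors the primary mailbox layout.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Message:
    subject: str
    folder: str  # folder path, e.g. "Inbox/Projects"
    received: datetime


@dataclass
class Mailbox:
    primary: list = field(default_factory=list)
    archive: list = field(default_factory=list)  # the second, online archive mailbox


def apply_move_to_archive(mailbox: Mailbox, age_limit_days: int, now: datetime) -> int:
    """Move messages older than the age limit from the primary to the archive mailbox."""
    cutoff = now - timedelta(days=age_limit_days)
    keep = [m for m in mailbox.primary if m.received >= cutoff]
    move = [m for m in mailbox.primary if m.received < cutoff]
    mailbox.primary = keep
    # Each message keeps its folder path, so the archive mirrors the primary layout.
    mailbox.archive.extend(move)
    return len(move)
```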

Microsoft offers unlimited archives that automatically expand as needed. Because the primary mailbox size is 100 GB and the admin can configure the Outlook client to cache just a small amount offline, many users will not need an archive.

Exchange Online does not support journal mailboxes. If you want to use a third-party journaling service, you can set an external address as the journal recipient. Hold functionality, available as Litigation Hold and In-Place Hold policies in Exchange Online or as retention policies in the Security and Compliance Center in Office 365, lets you configure policies that keep data as long as the organization requires.

When a hold policy applies to a mailbox, Office 365 keeps the original item even if the user changes or deletes the message, storing it in the Recoverable Items folder within the user's mailbox. Office 365 keeps the message even after you delete the user's account and repurpose the license; the mailbox data remains in an inactive mailbox for eDiscovery purposes.
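The following toy model shows why a hold keeps originals discoverable. It is a deliberate simplification with hypothetical names (`HoldMailbox`, `discoverable`): when the mailbox is on hold, every edit or deletion first preserves the prior version in a Recoverable Items list. Without a hold, this model simply drops the item; real Exchange Online also keeps deleted items in Recoverable Items for a retention window before purging them, which the sketch does not attempt to reproduce.

```python
class HoldMailbox:
    """Toy model of hold behavior: on hold, originals survive edits and deletions."""

    def __init__(self, on_hold: bool):
        self.on_hold = on_hold
        self.folders = {"Inbox": {}}  # folder -> {message_id: body}
        self.recoverable_items = []   # (message_id, preserved body) pairs

    def _preserve(self, message_id, body):
        # Only a mailbox on hold keeps the prior version (simplification, see above).
        if self.on_hold:
            self.recoverable_items.append((message_id, body))

    def edit(self, folder, message_id, new_body):
        self._preserve(message_id, self.folders[folder][message_id])
        self.folders[folder][message_id] = new_body

    def delete(self, folder, message_id):
        self._preserve(message_id, self.folders[folder].pop(message_id))

    def discoverable(self):
        """What an eDiscovery search could still find in this mailbox."""
        live = [(m, b) for f in self.folders.values() for m, b in f.items()]
        return live + self.recoverable_items
```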

You should migrate legacy archive data extracted from user mailboxes back to the corresponding mailbox in Exchange Online, and migrate journal data to each respective user's Recoverable Items folder. This way, users keep access to all the data they currently have, and Office 365 tools can discover journal data from both before and after the migration.

Transfer archived data

Legacy archive solutions usually have a variety of policies for what happens with the archived data. You might configure the system to keep the stubs for a year but make archive data accessible via a web portal for much longer.

There are instances when you might want to replace the stub with the real message. There might also be data that is no longer in the user's mailbox as a stub but that users still want on occasion.

We need tools that not only automate the data migration, but also understand these differences and can act accordingly. The legacy archive migration software should examine the data within the archive and then run batch jobs to replace stubs with the full messages. For archived data that no longer has a stub, you can use the Exchange Online archive as the destination.
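The routing decision described above can be expressed as a small planning step. This is a sketch under assumptions, not any product's logic: `ArchivedItem`, `plan_item` and `plan_batches` are hypothetical names, and the rule is simply that an item whose stub still exists replaces that stub in the primary mailbox, while an item with no stub is imported into the online archive under its original folder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArchivedItem:
    archive_id: str
    original_folder: str        # folder the item was archived from
    stub_folder: Optional[str]  # folder still holding its stub, or None if the stub expired


def plan_item(item: ArchivedItem) -> tuple:
    """Decide where one archived item should land in Exchange Online."""
    if item.stub_folder is not None:
        # The user still sees a stub: swap it for the full message in place.
        return (item.archive_id, "replace_stub", "primary", item.stub_folder)
    # No stub left: import the item into the online archive instead.
    return (item.archive_id, "import", "online_archive", item.original_folder)


def plan_batches(items: list, batch_size: int) -> list:
    """Split the per-item plans into fixed-size batch jobs."""
    plans = [plan_item(i) for i in items]
    return [plans[n:n + batch_size] for n in range(0, len(plans), batch_size)]
```

Keeping the plan separate from execution lets administrators review or pilot a batch before any mailbox is touched.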

Email archive migration software connects to the legacy archive via the vendor's API. The software assesses the items and exports them into a common temporary format, such as EML files, on a staging server. It then connects to Office 365 over a protocol such as Exchange Web Services, examines the mailbox and replaces each stub with the full message.
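The staging step can be illustrated with Python's standard `email` package. In this sketch the item fields (`archive_id`, sender, recipients, subject, body) stand in for whatever the vendor API returns, which is not shown and is an assumption here; the upload to Office 365 over Exchange Web Services is also out of scope. The sketch only writes each item as an `.eml` file and parses it back, as a later upload step would.

```python
import tempfile
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from pathlib import Path


def export_to_eml(staging_dir: Path, archive_id: str, sender: str,
                  recipients: list, subject: str, body: str) -> Path:
    """Write one archived item to the staging area as an RFC 5322 .eml file."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    path = staging_dir / f"{archive_id}.eml"
    path.write_bytes(msg.as_bytes())
    return path


def read_eml(path: Path) -> EmailMessage:
    """Parse a staged .eml file back into a message, as the upload step would."""
    return BytesParser(policy=policy.default).parsebytes(path.read_bytes())
```

A plain file format on a staging server decouples the source archive from the destination: the export and the upload can run at different speeds, and a failed upload can be retried without going back to the legacy archive.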

Figure: An example of a third-party product's dashboard detailing the migration progress of a legacy archive into Office 365.

Migrate journal data

With journal data, the most accepted approach is to migrate the data into the hidden Recoverable Items folder of each mailbox related to the journaled item. The end result is similar to having used Office 365 since the day the journal began, and eDiscovery works as expected when you follow Microsoft guidance.

For this migration, the software scans the journal and builds a database of the journal messages. The application then maps each journal message to every mailbox it belongs to. This process can be quite extensive; for example, an email sent to 1,000 people maps to 1,000 mailboxes.
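The mapping step amounts to a fan-out from messages to mailboxes. The following sketch assumes each journal entry has already been reduced to a `(message_id, sender, recipients)` tuple and that the tenant's mailbox addresses are known; both are simplifications, since real journal reports also carry BCC and expanded distribution-list recipients. Whether a given product also maps a message to an internal sender's mailbox is an assumption here; this sketch does. External addresses are skipped because there is no mailbox to fill.

```python
from collections import defaultdict


def map_journal(journal, mailboxes):
    """Map each journaled message to every tenant mailbox it belongs to.

    journal:   iterable of (message_id, sender, recipients) tuples
    mailboxes: set of lowercase addresses that exist in the tenant
    returns:   dict of mailbox address -> list of message ids
    """
    mapping = defaultdict(list)
    for message_id, sender, recipients in journal:
        # Normalize case and drop duplicates so one mailbox gets one copy per message.
        addresses = {a.lower() for a in (sender, *recipients)}
        for address in addresses:
            if address in mailboxes:  # external parties have no mailbox to fill
                mapping[address].append(message_id)
    return dict(mapping)
```

With an external sender and 1,000 internal recipients, one journal entry yields 1,000 mailbox entries, which is why this stage can dominate the run time of a journal migration.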

After this stage, the software copies each message to the Recoverable Items folder of every mapped mailbox. The procedure is complicated, but migration software that automates the job takes most of the burden off administrators.
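Because a journal copy job touches so many mailbox and message pairs, it needs to survive interruptions without creating duplicates. The sketch below shows one way to do that, under assumptions: `copy_fn` is a hypothetical callback standing in for the real write into a mailbox's Recoverable Items folder, and `done` is a checkpoint set that would be persisted between runs. A pair is recorded only after its copy succeeds, so a restarted job resumes where it stopped.

```python
from typing import Callable, Dict, List, Set, Tuple


def copy_to_recoverable_items(
    mapping: Dict[str, List[str]],
    copy_fn: Callable[[str, str], None],
    done: Set[Tuple[str, str]],
) -> int:
    """Copy every (mailbox, message) pair once, skipping pairs already in `done`.

    copy_fn is a hypothetical callback that writes one message into one mailbox's
    Recoverable Items folder. `done` acts as a checkpoint: persist it between runs
    and a restarted job resumes where it stopped instead of creating duplicates.
    """
    copied = 0
    for mailbox, message_ids in mapping.items():
        for message_id in message_ids:
            key = (mailbox, message_id)
            if key in done:
                continue
            copy_fn(mailbox, message_id)
            done.add(key)  # record only after the copy succeeded
            copied += 1
    return copied
```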

Legacy archive migration offerings

There are many products tailored for an email archive migration, each with its own benefits and drawbacks. I won't recommend a specific offering, but I will mention two that can migrate more than 1 TB a day, which is a good benchmark for large-scale migrations. Both also support chain of custody, an audit trail that records the transfer of all data.
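Chain of custody boils down to proving that what arrived is what left, and that the record of it hasn't been altered. The following sketch is one illustrative way to do that, not how TransVault or Archive Shuttle implement it: it hashes each item as exported and as read back from the destination, and chains every log entry to the previous one so that editing any entry is detectable. `CustodyLog` and its fields are hypothetical. Real products may hash a normalized form of the message, since Exchange can re-encode MIME on import, so a raw byte comparison is an assumption here.

```python
import hashlib
import json
from datetime import datetime, timezone


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class CustodyLog:
    """Append-only, hash-chained audit log of item transfers (illustrative only)."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []
        self._last_hash = self.GENESIS

    @staticmethod
    def _entry_hash(entry: dict) -> str:
        return _sha256(json.dumps(entry, sort_keys=True).encode())

    def record(self, item_id: str, exported: bytes, imported: bytes, when: datetime) -> bool:
        """Log one transfer; return True if source and destination hashes match."""
        src, dst = _sha256(exported), _sha256(imported)
        entry = {
            "item_id": item_id,
            "source_sha256": src,
            "dest_sha256": dst,
            "verified": src == dst,
            "timestamp": when.isoformat(),
            "prev": self._last_hash,  # links this entry to the one before it
        }
        self._last_hash = self._entry_hash(entry)
        self.entries.append(entry)
        return entry["verified"]

    def intact(self) -> bool:
        """Recompute the chain; editing any entry breaks a link or the final hash."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev:
                return False
            prev = self._entry_hash(entry)
        return prev == self._last_hash
```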

TransVault has the most connectors to legacy archive products. Almost all the migration offerings support Enterprise Vault, but if you use a less common product, TransVault can likely move it. The TransVault product accesses source data either via an archive product's APIs or by reading the stored data directly. TransVault's service installs within Azure or on premises.

Quadrotech Archive Shuttle fits in alongside a number of other products suited to Office 365 migrations and management. Its workflow-based process automates the migration. Archive Shuttle handles fewer archive sources, but it does support Enterprise Vault. Archive Shuttle accesses source data through APIs and agent machines, which are controlled either from an on-premises Archive Shuttle instance or, more typically, from the cloud version of the product.
