How to create an Azure Data Lake Storage Gen2 account
This tutorial details two different ways admins can set up an Azure Data Lake Storage Gen2 account, which will be a necessity when Gen1 isn't an option anymore.
Microsoft has announced it will retire Azure Data Lake Analytics and Azure Data Lake Storage Gen1 on Feb. 29, 2024. Anyone who uses Gen1 must migrate to Gen2 before the cutoff date.
It's important you know the necessary actions and potential challenges to create an Azure Data Lake Storage Gen2 account.
Create an Azure Data Lake Storage Gen2 account: Step by step
Admins can use two methods to migrate from Gen1 to Gen2.
Regardless of which method you use, you will need to create a new storage account before the migration process. The identity that performs the migration also needs two role assignments: the Storage Blob Data Owner role on the Gen2 account and the Owner role on the Gen1 account.
Microsoft's preferred method involves simply copying data from Gen1 to Gen2. To perform this type of migration, begin by going to the overview screen for your Gen1 data lake storage account. Click on the Migrate Data button on this screen, and you are prompted to choose your preferred migration mode. Choose Copy Data to a New Gen2 Account, and then select the checkbox confirming that you accept Microsoft's service agreement. Click the Apply button and the data is copied.
The migration process causes the Gen1 account to become read-only, at least for the duration of the migration. Similarly, the Gen2 account is disabled until the migration completes.
The other option to create an Azure Data Lake Storage Gen2 account is to perform what Microsoft calls a "complete migration." The initial steps of the migration process are identical to a copy migration. Begin by going to the overview page for your Gen1 account and clicking the Migrate Data button. Rather than choosing the option to copy data to a new Gen2 account, however, you must select the Complete Migration to a New Gen2 Account option. Once again, you must select the confirmation checkbox and click the Apply button.
Microsoft recommends a copy data migration over a complete migration because the copy option restricts the Gen1 account only for the duration of the migration. Conversely, a complete migration permanently disables the Gen1 account and deletes it after 30 days.
You will probably need to perform some cleanup tasks afterward. This may include pointing services to the Gen2 endpoint, updating your applications to use Gen2 APIs, reformatting URIs to use Gen2 naming conventions and updating any applicable scripts.
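As an illustration of the URI cleanup, the following minimal Python sketch rewrites a Gen1 URI of the form adl://<account>.azuredatalakestore.net/<path> into the Gen2 form abfss://<container>@<account>.dfs.core.windows.net/<path>. The function name, the example account names and the default container name gen1 are assumptions made for illustration; adjust them to match your own environment.

```python
from urllib.parse import urlsplit

# Host suffix used by Gen1 accounts.
GEN1_SUFFIX = ".azuredatalakestore.net"


def gen1_to_gen2_uri(gen1_uri: str, gen2_account: str, container: str = "gen1") -> str:
    """Rewrite an adl:// Gen1 URI as an abfss:// Gen2 URI.

    gen2_account and container are supplied by the caller; the default
    container name "gen1" is an assumption about where migrated data lands.
    """
    parts = urlsplit(gen1_uri)
    if parts.scheme != "adl":
        raise ValueError(f"not a Gen1 URI: {gen1_uri!r}")
    host = parts.hostname or ""
    if not host.endswith(GEN1_SUFFIX):
        raise ValueError(f"unexpected Gen1 host: {host!r}")
    # Keep the path as-is, minus the leading slash that the template adds back.
    path = parts.path.lstrip("/")
    return f"abfss://{container}@{gen2_account}.dfs.core.windows.net/{path}"


if __name__ == "__main__":
    # Hypothetical account names, for demonstration only.
    print(gen1_to_gen2_uri(
        "adl://contosogen1.azuredatalakestore.net/raw/2023/events.csv",
        "contosogen2",
    ))
```

A helper like this can be run over configuration files and scripts to find and rewrite Gen1 references, though each rewritten URI should still be checked against the actual container layout.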
Challenges to setup
Three main challenges tend to derail the process for users to create Azure Data Lake Storage Gen2 accounts.
First, according to Microsoft, you must migrate all your Azure Data Lake Analytics accounts to Azure Synapse Analytics, or to another supported platform if available. Otherwise, you will receive an error message that Azure was unable to initialize the migration process.
Second, you must use a new storage account for Gen2. An existing account won't work, even if it's empty. The account you create must have the hierarchical namespace feature enabled.
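For admins who script account creation instead of using the portal, the sketch below builds the kind of request body the Azure Resource Manager storage account API accepts, with the hierarchical namespace turned on through the isHnsEnabled property. It only constructs and validates data and doesn't call Azure. The function names, the region and the default SKU are assumptions chosen for illustration.

```python
import json
import re


def is_valid_account_name(name: str) -> bool:
    """Storage account names are 3-24 characters, lowercase letters and digits only."""
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None


def gen2_account_request_body(location: str, sku: str = "Standard_LRS") -> dict:
    """Build a request body for a new Gen2-capable storage account.

    The default SKU is an assumption; pick the redundancy option you need.
    """
    return {
        "location": location,
        "kind": "StorageV2",
        "sku": {"name": sku},
        # Hierarchical namespace is what makes the account a Gen2 data lake.
        "properties": {"isHnsEnabled": True},
    }


if __name__ == "__main__":
    name = "contosogen2"  # hypothetical account name
    if not is_valid_account_name(name):
        raise SystemExit(f"invalid storage account name: {name}")
    print(json.dumps(gen2_account_request_body("eastus"), indent=2))
```

The hierarchical namespace setting is chosen when the account is created, which is why the sketch sets it in the initial request body.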
In addition, certain file and directory names that were valid in Gen1 are unsupported in Gen2. This includes names made up of all spaces or tabs, names with multiple forward slashes and names that end with a period or contain a colon. You must rename any file or folder with one of these unsupported names before you can perform a migration.
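The naming rules above lend themselves to a quick pre-migration check. The following minimal Python sketch flags paths that break any of them. It assumes you already have a list of Gen1 paths as strings, and it interprets "multiple forward slashes" as consecutive slashes within a path; the function name is hypothetical.

```python
from typing import List


def unsupported_name_reasons(path: str) -> List[str]:
    """Return the reasons a Gen1 path would be rejected, or an empty list."""
    reasons = []
    # Assumption: "multiple forward slashes" means consecutive slashes.
    if "//" in path:
        reasons.append("path contains consecutive forward slashes")
    for segment in path.split("/"):
        if segment == "":
            continue  # leading, trailing or doubled slash; handled above
        if segment.strip(" \t") == "":
            reasons.append(f"{segment!r}: name is only spaces or tabs")
        if segment.endswith("."):
            reasons.append(f"{segment!r}: name ends with a period")
        if ":" in segment:
            reasons.append(f"{segment!r}: name contains a colon")
    return reasons


if __name__ == "__main__":
    # Hypothetical paths, for demonstration only.
    for candidate in ["/raw/2023/events.csv", "/raw/report.", "/logs/12:30/x"]:
        print(candidate, unsupported_name_reasons(candidate))
```

Any path that returns a non-empty list needs to be renamed in Gen1 before you start the migration.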
How setup and administration compare to Gen1
A Gen2 data lake will use containers, even if your Gen1 environment did not, because the container structure is used in the migration process. Azure will automatically create a container called gen1, which acts as a repository for the migrated data. You can't rename the gen1 container, but once the migration process is over, you can create a new container and move data as needed.
Once you finish with the migration process, you can enable geographically redundant storage, but only if you don't plan to use the application compatibility layer. Geographically redundant storage causes the application compatibility layer to fail.
Finally, the migration process only migrates data, not settings. As such, you will need to manually reconfigure any applicable settings, such as encryption and firewall use.