canary testing
What is canary testing?
In software testing, canary testing refers to testing a new software version or a new feature with real users in a live (production) environment. It is done by pushing some code changes live to a small group of end users who are usually unaware that they are receiving new code.
Since the new code or "canary" is only distributed to a few users, its eventual impact is relatively small. Moreover, the changes can be quickly reversed should the new code prove to be buggy or cause problems for users. Canary testing is also known as "canary deployment" or "canary release."
Understanding canary testing
The word canary describes the rollout of software code to a subset of real end users. The term originated from coal mining and the phrase "canary in the coal mine." Canary birds have a lower tolerance to toxic gases than humans, so they were used to alert miners when these gases reached dangerous levels inside the mine.
In software canary testing, a small group of end users serves as the test group that receives new code. Like the canary in a coal mine, these users unknowingly provide an early warning of problems in the application. If a code change causes problems, monitoring software alerts the dev team so they can fix the code before it is released to a larger group of users, thus mitigating the risk of degrading the experience for all users.
Why canary testing is effective
A canary release is a good way to roll out incremental changes to code related to the addition of new features or the creation of a new version of the software. Because the code is released in production to actual users, it enables the development team to quickly evaluate whether the changes provide the desired or expected outcomes.
Canary deployment also allows developers to migrate a small subset of users to new functionality in a new release. Exposing only a fraction of the overall user base to the new code minimizes the effects of potential issues and makes it easier for developers to roll back a buggy release before it affects the entire user base.
How canary testing works
Like other types of software testing, canary testing follows a systematic, step-by-step process. The steps are as follows:
Step 1. The development team selects the users who will be testers. This group is a small subset of the user base but a large enough group to produce results that will allow a meaningful statistical analysis. The users are unaware that they are part of the testing group.
Step 2. The team sets up a testing environment that operates in parallel with the existing live environment. They also configure the system load balancer to route user requests from designated canary testers to the new environment.
Step 3. Developers initiate the canary test by routing test user requests to the new environment. They also monitor testers for as long as it takes to ensure that the new version is operating as expected.
Step 4. If the new version meets predefined deployment criteria, the new software feature or version can be released to all users. However, if the new version is found to contain a lot of bugs, make the application perform poorly or create some other issue for users, the testers are rerouted to the original version of the software.
Step 5. The team fixes the discovered bugs and then releases the software to the wider audience.
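The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production router: the function names (bucket, in_canary, route_request, next_action), the 5% cohort size and the environment labels "canary" and "baseline" are all invented for the example. It hashes each user ID so that the same user lands in the same environment for the duration of the test.

```python
import hashlib

CANARY_PERCENT = 5  # assumed cohort size for this example


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_canary(user_id: str, percent: int = CANARY_PERCENT) -> bool:
    """Step 1: a user is a canary tester if their bucket falls below the cutoff."""
    return bucket(user_id) < percent


def route_request(user_id: str, canary_enabled: bool = True) -> str:
    """Steps 2-3: send canary testers to the new environment, everyone else to baseline."""
    if canary_enabled and in_canary(user_id):
        return "canary"
    return "baseline"


def next_action(meets_criteria: bool) -> str:
    """Step 4: promote if the predefined criteria are met, otherwise reroute testers."""
    return "promote" if meets_criteria else "rollback"


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(10_000)]
    canary_users = [u for u in users if route_request(u) == "canary"]
    print(f"{len(canary_users)} of {len(users)} users routed to canary")
    print(next_action(meets_criteria=False))
```

Hashing rather than random selection is a design choice for the sketch: it keeps each tester's experience consistent across requests, and passing canary_enabled=False sends every user back to the baseline at once.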
Planning a canary test deployment
Planning a canary deployment involves considering multiple parameters, including the following:
Number of users and stages. Two important factors to consider when planning a canary test are the total number of users who will receive the canary deployment and the number of stages in which it will be rolled out. It is common to route the canary test code to 5% or 10% of the total user base. In some cases, dev teams also select the test user group based on a particular geographic region.
Timing/duration. Canary deployments can run for a few minutes or several hours, depending on the application and what is being tested. It can also take time to update the application for the testers and to analyze and publish the test results. It's important to consider the time required for all aspects of the tests when planning a canary deployment.
Evaluation criteria. Like any other type of software testing, the success or failure of a canary test can only be determined if the evaluation criteria have been previously defined.
Performance metrics. Metrics must be collected to analyze test progress, assess application performance (e.g., latency), determine CPU and memory utilization, and track errors. This information directly impacts the evaluation and success of a canary deployment.
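One way to make these planning parameters concrete is to write them down as data before the test starts. The sketch below is only an illustration: the class names (EvaluationCriteria, CanaryPlan), the field names, the stage percentages and every threshold value are assumptions chosen for the example, not recommended settings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class EvaluationCriteria:
    """Thresholds agreed on before the test starts (values are illustrative)."""
    max_error_rate: float = 0.01          # fraction of failed requests
    max_p95_latency_ms: float = 300.0     # 95th-percentile latency
    max_cpu_utilization: float = 0.80     # fraction of available CPU
    max_memory_utilization: float = 0.80  # fraction of available memory


@dataclass(frozen=True)
class CanaryPlan:
    """Number of users and stages, duration and evaluation criteria in one place."""
    stage_percentages: Tuple[int, ...] = (5, 10, 50, 100)
    stage_duration_minutes: int = 30
    region: Optional[str] = None  # optional geographic targeting
    criteria: EvaluationCriteria = field(default_factory=EvaluationCriteria)


def violations(metrics: Dict[str, float], criteria: EvaluationCriteria) -> List[str]:
    """Return the names of all collected metrics that exceed their threshold."""
    limits = {
        "error_rate": criteria.max_error_rate,
        "p95_latency_ms": criteria.max_p95_latency_ms,
        "cpu_utilization": criteria.max_cpu_utilization,
        "memory_utilization": criteria.max_memory_utilization,
    }
    return [name for name, limit in limits.items() if metrics[name] > limit]


if __name__ == "__main__":
    plan = CanaryPlan()
    observed = {
        "error_rate": 0.004,
        "p95_latency_ms": 340.0,
        "cpu_utilization": 0.55,
        "memory_utilization": 0.61,
    }
    print(violations(observed, plan.criteria))  # ['p95_latency_ms']
```

An empty list from violations means the stage met every criterion. Because the thresholds live in a frozen dataclass, they cannot be changed while the test is running.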
Implementing a canary test deployment
Once the planning is complete, the development team does the actual deployment of the canary test by routing the new code to the selected test group. The dev team will prepare deployment manifests and configuration files, build artifacts and construct testing scripts.
Next, the team will create a canary node by cloning the actual production environment and will use load balancing to direct the selected users' traffic to it. At least two production environments are required for canary testing, one of which will be the original application without the code changes (baseline).
Analyzing a canary test deployment
Routing the canary code to the selected user base will result in traffic to both the baseline and the test nodes. This will enable the development team to run a comparison and determine if the test version is performing as expected, depending on the evaluation criteria identified during the planning stage. Log analytics will reveal if there are bottlenecks or bugs that must be fixed before a wider release.
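The comparison between baseline and canary can be as simple as checking whether the canary's error rate is higher than the baseline's by more than chance would explain. The sketch below uses a standard two-proportion z-test for that purpose. This is one possible choice rather than a method the article prescribes, and the function names, the request counts and the one-sided threshold of 1.645 (roughly a 5% significance level) are assumptions for the example.

```python
import math


def two_proportion_z(errors_a: int, total_a: int, errors_b: int, total_b: int) -> float:
    """z statistic for the difference in error rate between group b and group a."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both groups need at least one request")
    rate_a = errors_a / total_a
    rate_b = errors_b / total_b
    pooled = (errors_a + errors_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 0.0  # identical all-success or all-failure groups: no measurable difference
    return (rate_b - rate_a) / std_err


def canary_is_worse(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    z_threshold: float = 1.645,
) -> bool:
    """True if the canary's error rate is significantly higher than the baseline's."""
    z = two_proportion_z(baseline_errors, baseline_requests, canary_errors, canary_requests)
    return z > z_threshold


if __name__ == "__main__":
    # Hypothetical counts: 95% of traffic on baseline, 5% on canary.
    print(canary_is_worse(190, 95_000, 30, 5_000))
    print(canary_is_worse(190, 95_000, 10, 5_000))
```

The same pattern applies to other metrics, although latency and resource usage are continuous values and would need a different test. The test only gives a meaningful answer if the canary group is large enough, which is why the test group must support statistical analysis.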
Benefits of canary testing
Canary testing makes it easy to validate new software or a new feature in an existing application. The code performance can be monitored closely before it is released to a larger user base. Since the canary is only deployed to a small number of users, it significantly reduces the risk of widespread poor performance or poor user experiences. Further, changes or features can be quickly revoked if it is discovered that they degrade the application's performance, contain bugs or generate negative user feedback.
A canary test is a good way to set up a beta program with selected real users to collect their valuable feedback before a major code release. In some cases, the QA team may be the first group to test the new features. They can do the testing in the same environment as end users to find bugs occurring in production that may not have been encountered or identified in staging.
Challenges with canary testing
While canary testing offers multiple benefits for development teams, it also comes with certain challenges. For one, it's difficult to perform canary testing for mobile apps because the app runs in only one environment, the user's personal computing device, so there is no parallel environment to which the team can route traffic. One way to counter this limitation is to use feature flags to enable a feature remotely for only a small group of users. It's also useful to encourage end users to turn on application auto-update so they get regular updates that allow developers to easily roll out a canary test or an updated release.
Canary testing can also be challenging when multiple new features are released rapidly. To test all these features, multiple environments would be required, which can become hard to manage. Feature flags are again a good way to overcome this problem.
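A feature flag with a percentage rollout can be sketched as follows. This is a minimal in-memory illustration and not the API of any particular feature-flag product: the FeatureFlags class, its method names and the flag name "new-checkout" are hypothetical. A real system would typically fetch the rollout configuration from a remote service so that it can be changed without shipping new code.

```python
import hashlib
from typing import Dict


class FeatureFlags:
    """In-memory flag store (an assumption; real flags are usually served remotely)."""

    def __init__(self) -> None:
        self._rollout: Dict[str, int] = {}

    def set_rollout(self, feature: str, percent: int) -> None:
        """Enable a feature for roughly `percent` percent of users."""
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[feature] = percent

    def is_enabled(self, feature: str, user_id: str) -> bool:
        """Check whether this user falls inside the feature's rollout percentage."""
        percent = self._rollout.get(feature, 0)  # unknown flags default to off
        # Including the feature name in the hash gives each feature its own cohort.
        key = f"{feature}:{user_id}".encode("utf-8")
        bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
        return bucket < percent


if __name__ == "__main__":
    flags = FeatureFlags()
    flags.set_rollout("new-checkout", 10)
    enabled = sum(flags.is_enabled("new-checkout", f"user-{n}") for n in range(1_000))
    print(f"new-checkout enabled for {enabled} of 1000 users")
    flags.set_rollout("new-checkout", 0)  # revoke the feature without a new release
```

Because each feature is hashed separately, several features can be tested at the same time in a single environment, each with its own group of users.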
A canary deployment often involves multiple production machines, which adds to the complexity. Migrating users and monitoring the new system can also be complicated and time-consuming tasks if done manually. Automation can reduce the complexity and time required for analysis and bug fixes.
Automated testing tools also make it easier to create new tests, identify test users and analyze test results. Development teams can also create predefined test cases to perform targeted testing and back up the results of the test.
Other methods to test software changes
Canary deployment is just one technique to update software that's currently in use in a production environment. Some other methods for deploying software updates include the following:
Basic deployment strategy. Also known as the "re-create" approach, this strategy updates all systems with the new software at the same time. It is the simplest deployment strategy for software development teams but also the riskiest. If the upgrade is flawed, the entire user base will be affected. Remediating a buggy release can be disruptive, especially if the bugs are not discovered until after all users have been updated.
Incremental or rolling update strategy. This is similar to the basic deployment strategy, except it partitions the user base into smaller groups that receive the update in phases. It enables the deployment team to halt a release before it affects the entire user base in case of issues.
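Blue/green deployment strategy. Blue/green deployment involves deploying the new version of the software in a second environment alongside the old version and then switching traffic from one to the other. Because two full environments run in parallel, it requires roughly double the resources of the incremental deployment strategy. Unlike canary testing, the switch exposes all users to the new version at once.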
However, this method also reduces the effects of a buggy release because users can be switched back to the unaltered release fairly quickly. It is also a good choice when the code has been tested thoroughly and the expected risk of failure is considered low.
A/B testing deployment. This method is used to test specific features by rolling them out to specific users based on particular criteria. These criteria may reflect specific types of use, localization, browser, screen size or operating system.
Organizations can mix and match these and other approaches to optimize their continuous integration and continuous delivery (CI/CD) pipeline.
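To contrast with a canary release, the incremental or rolling update strategy described above can be sketched as a loop that updates one group at a time and halts at the first sign of trouble. The sketch is a simplified illustration: the function names, the target names and the deploy and healthy callbacks are hypothetical stand-ins for whatever deployment and monitoring tooling a team actually uses.

```python
from typing import Callable, Iterator, List, Sequence


def batches(targets: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Partition the targets into consecutive groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(targets), batch_size):
        yield targets[start:start + batch_size]


def rolling_update(
    targets: Sequence[str],
    batch_size: int,
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Update targets group by group; halt after the first unhealthy group.

    Returns the targets that received the update before the rollout stopped.
    """
    updated: List[str] = []
    for batch in batches(targets, batch_size):
        for target in batch:
            deploy(target)
            updated.append(target)
        if not all(healthy(target) for target in batch):
            break  # stop before the release reaches the remaining groups
    return updated


if __name__ == "__main__":
    groups = [f"group-{n}" for n in range(6)]
    done = rolling_update(
        groups,
        batch_size=2,
        deploy=lambda target: None,                # placeholder deployment step
        healthy=lambda target: target != "group-3",  # simulate one failing group
    )
    print(done)  # ['group-0', 'group-1', 'group-2', 'group-3']
```

In this simulated run the rollout stops after the second batch, so the last two groups never receive the flawed release.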