How a rules engine can drive -- or derail -- an IT automation strategy
To benefit fully from a rules engine -- and minimize the risk of chaos -- IT organizations need to carefully define system metrics and be diligent about documentation.
Rules engines are not new to IT operations -- but they've evolved to automate large infrastructure-as-code deployments, and create cloud resources, without human involvement.
Despite its benefits, a rules engine can also pose a risk to an organization's IT automation strategy if proper management practices are not in place.
What is a rules engine?
The term rules engine is somewhat broad; it refers to a system that initiates specific actions based on a certain set of circumstances or conditions. Rules engines can apply to business policies and business process management, but also play a role in IT.
Traditionally, the engineering team would create playbooks to provide IT operations personnel with instructions to address an array of performance issues or events. However, as infrastructure became increasingly software-based, playbooks evolved into automated rules engines that can now interact with multiple systems. This enables operations admins to respond more quickly to events or orchestrate domino-effect changes without human involvement. A key benefit of rules engines is the ability to create triggers -- or rules -- that implement changes more quickly and accurately than a human administrator could.
An admin can configure a rules engine with thresholds for any desired metrics and, when those thresholds are met, have the system perform an action. Autoscaling in the cloud is a clear example of how a rules engine works: Workload demand increases and an application or service needs to scale; a rules engine has an established performance value trigger, and when that value is reached, it spins up additional resources to address the demand. A rules engine can also scale resources back, which helps to control costs.
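The autoscaling behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not any cloud provider's API: the ScalingRule class, the evaluate function and the 80% and 30% CPU thresholds are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ScalingRule:
    """Hypothetical rule: thresholds are average CPU percentages."""
    scale_up_at: float      # add an instance at or above this value
    scale_down_at: float    # remove an instance at or below this value
    min_instances: int = 1  # never scale below this floor


def evaluate(rule: ScalingRule, cpu_percent: float, instances: int) -> int:
    """Return the new instance count for a single metric reading."""
    if cpu_percent >= rule.scale_up_at:
        return instances + 1
    if cpu_percent <= rule.scale_down_at and instances > rule.min_instances:
        return instances - 1
    return instances


rule = ScalingRule(scale_up_at=80.0, scale_down_at=30.0)
print(evaluate(rule, 92.0, 2))  # demand spike: 2 -> 3
print(evaluate(rule, 20.0, 3))  # demand drops: 3 -> 2
print(evaluate(rule, 55.0, 2))  # within bounds: stays at 2
```

Note that the rule acts on the metric value alone. It has no way to tell legitimate demand from a faulty condition, and it sets no upper limit on how far it scales.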
Rules engines are necessary to keep up with the rapid pace of IT and business activity. Administrators, however, have to carefully define metrics, and manage the actions taken on IT systems, to use rules engines effectively.
Beware the dangers of rules in IT automation
Metric values vary for a number of reasons that IT monitoring tools can't always detect. A rules engine executes based only on the value, without any flexibility or context. While rules engines are quite efficient, this lack of context poses significant risk. What's more, a faulty condition, such as a resource leak, can trigger an action that uses additional resources, but doesn't actually correct the issue -- or perhaps causes a bigger issue down the line. Rules engines do not diagnose problems or possess self-healing abilities.
In a perfect world, an IT team can set most types of rules quickly and, once in place, those rules mostly run on autopilot. But what happens with a distributed denial of service (DDoS) attack? Without limitations or additional conditions in place, a rules engine might spin up unlimited resources in the cloud to deal with a DDoS attack in the middle of the night when nobody is around to hit a kill switch -- which could lead to a massive bill from a cloud provider.
Create additional conditions or rules to prevent an existing rule, or set of rules, from spinning out of control in an unplanned or unforeseen event. These additional rules can limit how many total resources are allocated in a given time frame, or where resources are available. Apply these additional rules to the original rules that are designed to create resources.
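One way to picture such a limiting rule is as a guard that the resource-creating rule must consult before it acts. The sketch below is an assumption-laden illustration: the ScaleGuard class, its field names and the specific limits (10 instances total, two additions per 300 seconds) are invented for the example and do not come from any particular product.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ScaleGuard:
    """Hypothetical guard that caps total resources and the rate of additions."""
    max_total: int             # hard ceiling on running instances
    max_adds_per_window: int   # additions allowed inside one time window
    window_seconds: float      # length of the sliding time window
    _adds: deque = field(default_factory=deque)  # timestamps of recent additions

    def allow_add(self, current_total: int, now: float) -> bool:
        # Forget additions that have aged out of the window.
        while self._adds and now - self._adds[0] >= self.window_seconds:
            self._adds.popleft()
        if current_total >= self.max_total:
            return False  # ceiling reached
        if len(self._adds) >= self.max_adds_per_window:
            return False  # too many additions in this window
        self._adds.append(now)
        return True


guard = ScaleGuard(max_total=10, max_adds_per_window=2, window_seconds=300)
instances = 4
# Simulated burst: the scale-up rule fires four times within 30 seconds.
for t in (0.0, 10.0, 20.0, 30.0):
    if guard.allow_add(instances, t):
        instances += 1
print(instances)  # only two of the four requests are honored
```

Here the guard is applied to the scale-up path only, which mirrors the advice to attach limiting rules to the original rules that create resources.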
This creates a collection of interacting rules that can limit or enable each other -- which, without careful management, can create a confusing spider web of rule sets that you can't untangle, much like a pile of cables hidden in a data center closet.
Often, IT admins create these additional conditions or rules after a catastrophe, rather than planning for them initially. The rules engine has already done something it wasn't supposed to, and now additional rules and policies are rushed into place to prevent a recurrence. However, while this addresses the immediate issue, there's no guarantee that this new set of intertwined rules won't cause a brand-new catastrophe somewhere down the line.
Don't discount the rules engine yet
Documentation is key with a rules engine. Be sure to document all rules, and keep that documentation current -- whether the rules live in playbooks or automated software -- or they will become your organization's greatest liability. If admins cannot understand why a rule fires, or know only that 'X' happens based on a certain action, they lose the ability to make adjustments or changes based on business needs.
Business and IT priorities change. It's not enough to understand a process once; admins must be able to work with and adjust the rules engine as needed. Over time, IT teams might need to adjust both the actions and the conditions that govern a particular rule. Understand the effects of making those changes, ranging from costs and resource allocation to customer experience. A rules engine is only as good as the actions it performs. While IT organizations give up some control to the engine, they must closely supervise it.
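As a rough sketch of what documenting a rule can mean in practice, each rule could carry its rationale and owner alongside its condition and action, so that gaps are easy to find. The RuleRecord fields, the undocumented helper and the sample rules below are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleRecord:
    """Hypothetical record that keeps a rule and its documentation together."""
    name: str
    condition: str      # what triggers the rule
    action: str         # what the engine does when it fires
    rationale: str      # why the rule exists
    owner: str          # who to ask before changing it
    last_reviewed: str  # ISO date of the most recent review


def undocumented(rules):
    """Return the names of rules missing a rationale or an owner."""
    return [r.name for r in rules if not r.rationale.strip() or not r.owner.strip()]


rules = [
    RuleRecord("scale-up-web", "avg CPU >= 80%", "add one instance",
               "Keep response times stable under load", "ops-team", "2024-01-15"),
    RuleRecord("cap-instances", "instances >= 10", "block further additions",
               "", "", "2024-01-15"),
]
print(undocumented(rules))  # lists 'cap-instances'
```

A check like this does not replace written documentation, but it can flag rules whose purpose nobody has recorded.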