Apply AI to container monitoring and management -- piece by piece
While the application of AI to container monitoring and management is possible, it first requires certain operations practices and carefully integrated tool sets.
With every advance in technology, there are vendors that rebrand their tools to align with the latest trend. In the IT monitoring and management market, AI washing -- the claim that AI is included in products that don't really use the technology -- is rampant.
In a recent CIMI Corporation survey, nearly 100% of enterprises said they believed that most AI claims made for software products were false. Operations teams that want AI-driven IT tools -- including those for container monitoring and management -- must seek out specific signs that their vendors' claims are true.
What defines AI in IT?
AI applies technology -- usually, a form of machine learning -- to enable an IT system or tool to make decisions based on inference, as well as conduct self-learning processes to improve those decisions. AI, in short, must augment or replace human judgment in some concrete way -- not just replicate it.
IT admins must recognize the links between container monitoring, which is the gathering of information about container and containerized application behavior, and container management, which is the task automation relating to container deployment, redeployment and scaling. If AI supplements or replaces human judgment, it must provide a link between conditions and operations processes.
One of the most common forms of AI washing is to rename an analytics tool an AI tool. Statistical analytics identify patterns in data that could be significant. If an analytics tool works with a specific and predefined data pattern, such as the relationship between events in a monitoring and management system, a human expert has necessarily taught the tool the significance of that pattern. However, if the tool correlates information without being told in advance which patterns to look for and presents the results for human assessment, it has crossed the threshold into AI.
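The difference between a predefined pattern and an open-ended correlation can be made concrete with a small sketch. The Python below compares every pair of metrics without being told which pairs matter and returns the strongest relationships for a human to assess. The metric names, sample values and the 0.9 cutoff are illustrative assumptions, and plain Pearson correlation stands in for whatever technique a real product might use.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def strongest_correlations(series, min_abs_r=0.9):
    """Compare every pair of metrics; no pair is singled out in advance."""
    findings = []
    for a, b in combinations(sorted(series), 2):
        r = pearson(series[a], series[b])
        if abs(r) >= min_abs_r:
            findings.append((a, b, round(r, 3)))
    # Strongest relationships first, ready for human review
    return sorted(findings, key=lambda f: -abs(f[2]))


# Hypothetical samples, one value per collection interval.
metrics = {
    "api_latency_ms": [20, 22, 21, 40, 80, 160],
    "db_conn_wait_ms": [2, 3, 2, 9, 20, 41],
    "cache_hit_ratio": [0.91, 0.90, 0.92, 0.91, 0.90, 0.92],
}
candidates = strongest_correlations(metrics)
```

With this data, only the latency and database-wait pair clears the cutoff. A rule written by an expert would have named that pair up front; here it emerges from the data.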
Distinguishing markers
Information collection is not AI. Looking up an application in a database to retrieve its configuration information or performance indicators isn't AI either. Any AI application useful to container monitoring and management operates between the distinct monitoring and management functions, analyzing conditions to determine which steps to take, such as scaling a deployment or redeploying a failed element.
The application of AI to IT monitoring, in general, facilitates fault correlations -- the determination that a series of problems relates back to a single common fault, which should be the target of remediation. This is a major requirement for AI in container monitoring.
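As a rough illustration of fault correlation, the sketch below collapses a burst of alarms to the component most likely responsible. It uses a deliberately simple heuristic in place of a learned model: an alarming component is treated as a root fault only if nothing it depends on is also alarming. The dependency map and component names are hypothetical.

```python
# Hypothetical dependency map: each component lists what it relies on.
DEPENDS_ON = {
    "checkout-pod": ["orders-pod", "node-a"],
    "orders-pod": ["db-pod", "node-b"],
    "search-pod": ["db-pod", "node-a"],
    "db-pod": ["volume-1", "node-c"],
}


def upstream(component, graph):
    """Return everything the component depends on, directly or indirectly."""
    seen, stack = set(), list(graph.get(component, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen


def root_faults(alarming, graph):
    """Keep only alarming components with no alarming dependency."""
    alarming = set(alarming)
    return sorted(c for c in alarming if not upstream(c, graph) & alarming)


alarms = ["checkout-pod", "orders-pod", "search-pod", "db-pod"]
roots = root_faults(alarms, DEPENDS_ON)
```

Four alarms reduce to one remediation target, `db-pod`, because the other three components all sit downstream of it.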
The challenge is to link correlations to a container management process: The faults a generic AI-based monitoring tool determines might not be in scope for containers at all. If a network device fails and disconnects a data center, a container management system probably won't address that issue.
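One way to picture that scoping problem is a small routing step between correlation and remediation. In this sketch, the `Fault` type, the set of in-scope fault kinds and the two destination names are all assumptions made for illustration; a real deployment would define its own.

```python
from dataclasses import dataclass

# Assumption: the kinds of faults a container management system can act on.
CONTAINER_SCOPE = {"container", "pod", "deployment"}


@dataclass(frozen=True)
class Fault:
    component: str
    kind: str


def route(fault):
    """Send in-scope faults to container management; escalate the rest."""
    if fault.kind in CONTAINER_SCOPE:
        return "container-management"
    return "escalate"


pod_fault = route(Fault("db-pod", "pod"))
switch_fault = route(Fault("core-switch-1", "network-device"))
```

The failed pod goes to the container management process, while the failed network device is escalated to whatever process owns the network.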
Hurdles to AI for container management
Containers offer a standardized deployment model that facilitates the automated orchestration of deployment and redeployment processes. Containers' utility depends on the abstraction of resources that host and connect them.
A container management system expects to work within that framework and thus tends to ignore the abstracted hosting and connecting resources, which makes the direct application of AI via a single tool, at least at this stage, a challenge. Additionally, ops teams typically offload container scaling to one of several service mesh tools, which are both separate from the container orchestration process and outside the scope of container management systems.
This is why an online search for container management and AI will largely return results on how to use containers to host AI workloads, not how to use AI for container management.
Options for AI-driven container management
Without specific AI tools to manage a container application lifecycle, IT organizations must identify the necessary pieces of one. The starting point should be a comprehensive container monitoring tool that works at scale, supports a variety of orchestration frameworks and provides the ability to spawn tasks based on conditions. Ceilometer and Monasca combine into a basic tool set called Ceilosca to provide monitoring for containers and underlying VM hosts, either on premises or in the cloud. The combination exemplifies how generalized AI is applied to containers, without relying on specific AI features in either monitoring or management tools.
The Ceilosca model is based on a series of agents that deploy on bare metal, VMs or containers. IT admins can also deploy agents on network management systems, but doing so requires some degree of customization. Agents feed an event bus that collects events -- in Apache Kafka, a Kafka stream -- and a Monasca publisher posts them through event processing tools to a timestamped database. AI tools -- those that use machine learning, including neural networks -- access that database to draw inferences or take actions. These AI elements can then invoke operations processes, including container redeployment and scaling. IT organizations can also use AI for event collection and correlation, provided that the events flow through the same path and are stored in the same database in a compatible format.
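The flow described above (agents, event bus, publisher, timestamped database, inference, action) can be sketched end to end with stand-ins from the Python standard library. Everything here is an assumption made for illustration: a `deque` plays the event bus, an in-memory SQLite table plays the timestamped database, and a baseline computed from recent history plays the role of the AI element. None of it uses the actual Ceilometer, Monasca or Kafka APIs.

```python
import sqlite3
import statistics
from collections import deque


def open_store():
    """Create the stand-in timestamped event database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE events (ts REAL, source TEXT, metric TEXT, value REAL)"
    )
    return db


def publish(db, bus):
    """Drain the event bus into the store (the publisher's role)."""
    while bus:
        db.execute("INSERT INTO events VALUES (?, ?, ?, ?)", bus.popleft())
    db.commit()


def decide(db, source, metric, window=20, k=3.0):
    """Compare the newest sample with a baseline built from earlier ones."""
    rows = db.execute(
        "SELECT value FROM events WHERE source = ? AND metric = ? "
        "ORDER BY ts DESC LIMIT ?",
        (source, metric, window + 1),
    ).fetchall()
    values = [row[0] for row in rows]
    if len(values) < 3:
        return "none"  # too little history to judge
    latest, history = values[0], values[1:]
    baseline = statistics.fmean(history)
    spread = statistics.pstdev(history)
    if latest > baseline + k * max(spread, 1e-9):
        return "scale_out"  # a real system would call the orchestrator here
    return "none"


# Hypothetical agent output: (timestamp, source, metric, value)
bus = deque()
for ts, cpu in enumerate([40, 42, 41, 39, 40, 43, 41]):
    bus.append((float(ts), "web", "cpu_pct", float(cpu)))

store = open_store()
publish(store, bus)
steady = decide(store, "web", "cpu_pct")

bus.append((7.0, "web", "cpu_pct", 95.0))
publish(store, bus)
spike = decide(store, "web", "cpu_pct")
```

Because the decision logic reads only from the database, the inference step can be swapped for a trained model without touching the agents or the publisher, which is the property that makes the piece-by-piece approach workable.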
The Monasca system doesn't directly support container orchestration control. Similar tools that do support it include SNAP, Sensu and Telegraf. Admins can also use Prometheus node exporter and related tools to assemble a framework. Each of these tools provides a comparable on-ramp to true AI in container monitoring and management, and all integrate with Apache Kafka to generate Kafka streams.
Tools such as Deeplearning4j, H2O and TensorFlow help integrate an AI-driven monitoring framework directly with Kafka streams of events. This combination of AI or machine learning and Kafka streams is already popular with the large social media and video platforms, as well as online service providers. These are capabilities that IT organizations should watch for -- or demand -- when they evaluate AI features for container management and monitoring.
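To give a sense of what learning directly from a stream involves, the sketch below updates a running baseline one event at a time and flags values that stray far from it. It uses Welford's online algorithm for mean and variance as a lightweight stand-in for a model built with the frameworks named above. The event shape, the warm-up length and the threshold are assumptions, and a plain Python list takes the place of a Kafka consumer.

```python
import math


class RunningBaseline:
    """Incrementally tracked mean and spread (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def stdev(self):
        return math.sqrt(self._m2 / self.n) if self.n else 0.0


def flag_anomalies(stream, k=3.0, warmup=5):
    """Yield events that deviate from the baseline learned so far."""
    model = RunningBaseline()
    for event in stream:
        value = event["value"]
        limit = k * max(model.stdev, 1e-9)
        if model.n >= warmup and abs(value - model.mean) > limit:
            yield event
        else:
            model.update(value)  # learn only from samples judged normal


# Hypothetical events; in practice these would arrive from a stream consumer.
events = [{"value": v} for v in [10, 11, 10, 12, 11, 10, 50, 11]]
flagged = list(flag_anomalies(events))
```

Only the reading of 50 is flagged. The baseline keeps adjusting as normal samples arrive, so no one has to set a fixed threshold by hand.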