AI in operations management relieves pressure on IT teams
AI, when combined with IT operations and DevOps teams, forms AIOps that can greatly improve how IT assets are developed, produced and managed.
Just as tools are emerging to help IT better manage its operations, so too are tools and techniques that help developers move AI applications from development into production and deployment.
In the late 2000s, DevOps tools emerged that combined aspects of IT operations with developer-oriented activities to enable faster iteration, improve code quality and deliver more consistent testing. In much the same way that modern organizations have come to depend on ITOps to keep their IT organizations running smoothly, development shops have come to depend on DevOps to do the same. However, AI is poised to further revolutionize both of these critical aspects of IT-relevant operations with the creation of AI operations (AIOps).
AI for ITOps: AIOps
IT organizations consist of a wide range of services, technologies and procedures. Besides keeping systems running smoothly, ITOps also plans for continual change, deals with both external and internal threats to security, and upgrades systems as more powerful options become available. Doing all this without causing unnecessary outages is a significant challenge for many IT organizations.
IT staff and decision-makers are overwhelmed with work demands. As the work increases, the ability to get ahead of problems becomes harder, and so short-term problems take center stage. As a result, the IT team can't sufficiently plan ahead or upgrade to systems that could reduce or eliminate those problems.
A batch of AIOps options has begun to emerge to address these challenges. AIOps can help automate and enhance ITOps by using a range of machine learning algorithms. From chatbots that provide first-line support to users looking for IT help to powerful analytics, machine learning-powered AIOps can decrease the burden on IT departments.
AIOps and its use cases
AIOps relies on big data collected across the organization and from various IT devices to automatically spot and react to issues in real time and provide deep analytics. AIOps can address a wide range of ITOps functions, including availability and performance monitoring, event correlation and analysis, and IT service management and automation.
Specifically, AIOps can collect a wide range of data from disparate sources, including log files, alerts, performance metrics, dashboards and databases, among other sources. It can correlate this data and identify patterns or anomalies that should be further evaluated or provide predictive analytics to help with identifying potential faults or failures in systems.
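As a rough illustration of the pattern-and-anomaly idea, here is a minimal sketch that flags readings far from their recent history using a trailing z-score. The function name `find_anomalies`, the window size, the threshold and the sample CPU figures are all assumptions made for this example; production AIOps products use considerably richer models.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value sits more than `threshold` standard
    deviations away from the mean of the preceding `window` readings."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical CPU utilization samples (percent), one per minute.
cpu_percent = [41, 43, 42, 40, 44, 42, 43, 95, 42, 41]
print(find_anomalies(cpu_percent))  # [7]
```

Only the spike at index 7 is flagged. The readings that follow it are not, because the spike inflates the standard deviation of their trailing window, which is one reason real systems often prefer more robust statistics.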
Along with the flexibility needed to find and fix problems faster, AIOps can also provide IT teams with predictive insights to prevent issues from happening. Practical use cases of AIOps technology include capacity planning, capacity management and design, taming cloud sprawl, modeling service degradation and fixing performance challenges.
These AI-enabled systems are helping with proactive monitoring by keeping an eye on a wide range of metrics to monitor network infrastructure, traffic flow, congestion and potential security issues. And these more intelligent monitoring tools can also be used for capacity planning by deploying machine learning to gain insights into overall usage patterns.
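One simple way to picture capacity planning from usage patterns is to fit a trend line to historical usage and ask when it crosses a limit. The sketch below uses ordinary least squares on evenly spaced daily samples. The helper names `fit_line` and `days_until_full` and the disk figures are hypothetical, and a straight line is an assumption that real, seasonal workloads often violate.

```python
def fit_line(ys):
    """Least-squares fit of y = slope * x + intercept for x = 0, 1, 2, ...
    Needs at least two samples."""
    n = len(ys)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def days_until_full(daily_usage_gb, capacity_gb):
    """Days after the last sample until the trend line reaches capacity.
    Returns None when usage is flat or shrinking."""
    slope, intercept = fit_line(daily_usage_gb)
    if slope <= 0:
        return None
    day_full = (capacity_gb - intercept) / slope
    return max(0.0, day_full - (len(daily_usage_gb) - 1))


# Hypothetical disk usage in GB, one sample per day.
usage = [500, 510, 520, 530, 540]
print(days_until_full(usage, capacity_gb=600))  # 6.0
```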
These AIOps systems can provide further value by performing root cause analysis for failures. More advanced AI systems can even help the networks heal themselves by automatically adjusting routing, scaling server and infrastructure capabilities up or down as needed, and dealing with changes autonomously.
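The scale-up-or-down decision can be sketched as a proportional rule: size the server pool so that average utilization moves back toward a target. The function `desired_replicas`, its parameters and the 60% target are assumptions for illustration, not the behavior of any particular product. Utilization is passed as whole percentages so the ceiling division stays in exact integer arithmetic.

```python
def desired_replicas(current, avg_util_pct, target_pct=60,
                     min_replicas=1, max_replicas=10):
    """Return the replica count that would bring average utilization
    close to target_pct, clamped to the allowed range."""
    if current < 1 or target_pct <= 0:
        raise ValueError("current must be >= 1 and target_pct must be > 0")
    # Ceiling of current * avg_util_pct / target_pct without using floats.
    wanted = -(-(current * avg_util_pct) // target_pct)
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(4, 90))   # 6  (overloaded: scale up)
print(desired_replicas(4, 30))   # 2  (underused: scale down)
print(desired_replicas(8, 100))  # 10 (capped at max_replicas)
```

Clamping to a minimum and maximum keeps an automated loop from scaling to zero or running away on a bad metric reading.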
DevOps and machine learning: MLOps
In much the same way that AIOps is pushing ITOps to a new level, AI is significantly impacting DevOps. Machine learning is improving existing DevOps capabilities, as well as introducing new flows for managing machine learning lifecycles.
Machine learning tools are helping to automatically spot and identify potential bugs and integration issues prior to overall code integration. AI-enabled DevOps tools are also optimizing Agile-based processes by helping to inform how Agile sprints should be organized and by giving additional analytics insight to project and program managers.
The bigger, more impactful side of this technology is the use of DevOps practices for managing the machine learning lifecycle. According to a recent report by research firm Cognilytica, applying existing DevOps approaches to machine learning is challenging because machine learning models differ from traditional application development. Machine learning models are primarily data-driven, which makes managing the model lifecycle more about managing the data lifecycle than managing code development and deployment. Simply building a model and pushing it into operation is not enough to guarantee that the model will work.
Likewise, even a model that uses the best training data will perform poorly in an environment where the real-world data doesn't match up to how the model was trained. As such, the emerging area of machine learning operations (MLOps) focuses on machine learning-specific lifecycle needs.
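A mismatch between training data and real-world data can be quantified in several ways. The sketch below uses the population stability index (PSI) over equal-width bins. That choice of metric, the function name `psi` and the sample data are assumptions of this example rather than something the article prescribes. A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, though the right threshold depends on the use case.

```python
import math


def psi(expected, actual, bins=4, eps=1e-6):
    """Population stability index between a training sample (expected)
    and a live sample (actual), using bins spanning the training range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def shares(sample):
        counts = [0] * bins
        for v in sample:
            idx = int((v - lo) / width)
            # Values outside the training range fall into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(sample), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training = list(range(100))          # hypothetical feature values
live = [v + 40 for v in training]    # the same feature, shifted upward

print(psi(training, training))       # 0.0, identical distributions
print(psi(training, live) > 0.25)    # True, well past the rule of thumb
```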
MLOps use cases and focus
MLOps tools focus on five primary things:
- managing training data and supporting data needed to effectively create machine learning models;
- handling the multiple iterations and versions of the machine learning models as they are operationalized;
- monitoring and managing models as they are being used in the real-world environment;
- addressing a wide range of model governance and access requirements; and
- handling model security needs to protect both machine learning models and their training data from tampering.
These MLOps systems further provide support for model versioning and iteration by putting different versions of models into production. They are also built to support multiple concurrent model versions, inform model users of version changes, increase the visibility into the model version history, and retire models that are no longer providing value or being used.
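To make the versioning behavior concrete, here is a minimal in-memory sketch of a registry that records versions, allows several to serve concurrently, keeps a visible history and retires old ones. The class names, the status values and the `s3://example-bucket/...` paths are hypothetical; real MLOps tools persist this state and add access control.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    version: int
    artifact_uri: str        # where the trained model file lives
    status: str = "staged"   # staged -> production -> retired


@dataclass
class ModelRegistry:
    # model name -> list of versions, oldest first
    models: dict = field(default_factory=dict)

    def register(self, name, artifact_uri):
        """Record a new version; numbers start at 1 and only increase."""
        history = self.models.setdefault(name, [])
        entry = ModelVersion(len(history) + 1, artifact_uri)
        history.append(entry)
        return entry

    def _find(self, name, version):
        for entry in self.models.get(name, []):
            if entry.version == version:
                return entry
        raise KeyError(f"{name} v{version} is not registered")

    def promote(self, name, version):
        """Mark a version as serving; other live versions are untouched,
        so several versions can run concurrently."""
        self._find(name, version).status = "production"

    def retire(self, name, version):
        """Take a version out of service but keep it in the history."""
        self._find(name, version).status = "retired"

    def live_versions(self, name):
        return [e.version for e in self.models.get(name, [])
                if e.status == "production"]

    def history(self, name):
        return [(e.version, e.status) for e in self.models.get(name, [])]


registry = ModelRegistry()
registry.register("churn", "s3://example-bucket/churn/v1.pkl")
registry.register("churn", "s3://example-bucket/churn/v2.pkl")
registry.promote("churn", 1)
registry.promote("churn", 2)   # v1 and v2 now serve side by side
registry.retire("churn", 1)
print(registry.live_versions("churn"))  # [2]
print(registry.history("churn"))        # [(1, 'retired'), (2, 'production')]
```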
MLOps tools help monitor and manage existing models by keeping a constant eye on model performance and identifying when the models no longer match real-world data or how real-world data differs from the data used for training. These tools can even help provide data provenance that tracks how specific data used in training matches up to the real world.
Advanced MLOps tools even provide a model registry or catalog where model users can identify potential models for use in different scenarios. The emerging batch of MLOps tools also provide model governance, keeping an eye on the increasing requirements for auditing, compliance, governance and access control.
MLOps tools can provide means for model access control and visibility into various measures of model transparency, help enforce model training and retraining pipelines, and support various regulatory or compliance needs for model usage. MLOps tools are also increasingly providing functionality to help secure models and their dependent data by preventing attacks on the models and data and by providing visibility into overall model vulnerability.
While the "Ops" universe seems crowded and increasingly diverse, it is clear that both traditional ITOps and DevOps, as well as the newer AIOps and MLOps tools and practices, are greatly impacting the ways that IT assets are developed, produced and managed. The future for operations is no doubt more automated, more intelligent and more resilient than ever before.