Kubernetes cost management approaches to save money
Kubernetes and container costs continue to rise as services move off-site. Learn how to manage these costs with resource limits, autoscaling and the right cloud instances.
Outside of the cloud itself, few buzzwords have hit companies harder than Kubernetes. Most companies struggle to understand what it is and what it can do for them, and fewer still can spell it correctly. But they all have one thing in common: They're convinced they need it for something.
Containers and Kubernetes have a place in IT and in many businesses, but they fit specific needs. So, if Kubernetes does meet your IT organization's needs, it's important to manage it properly to avoid much bigger, unexpected issues.
Manage expectations
One of the first things to remember about any automation or process is that just because you can do something doesn't mean you should.
One of the biggest drivers of cloud-related technology is automation and scaling to meet demand. Managing cloud, Kubernetes and containers is a combination of resource allocation, autoscaling and the right cloud instances. Cost management is not about the automation itself -- that is part of the tool set -- nor is automation alone how IT teams manage Kubernetes. Instead, to manage Kubernetes costs, control resource use, because resource use is tied to tangible, measurable costs.
Cloud providers enable users to deploy containers in several different modes, from running them on dedicated instances to using fully managed services. Which mode to use is up to the IT ops team, but in any deployment, they must balance cost and control, which requires limiting resources and right-sizing from the start.
Right-size environments
A right-size environment is one in which the number and types of resources available are suitable for whatever workload an IT organization is running on cloud, Kubernetes or containers. This setup requires careful resource management to control what is in use -- and what is available on standby for scaling and failover purposes. Start with right-sizing the environment.
Preparing a single environment for the worst possible spike ignores the opportunity that autoscaling provides. That doesn't mean IT teams should choose the smallest possible platform, either. Allocating large-scale resources means paying for unused capacity -- but sizing too small creates additional overhead, as well as delays and complexity in an application.
Right-sizing an environment goes hand in hand with density and can be more of an art form than a strict set of numbers or rules. IT teams must balance performance and complexity against costs. Always keep an eye on the potential reach of a failure: errors will occur, and too much density widens the blast radius when they do.
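One way to ground those right-sizing judgments in real usage data is the Kubernetes Vertical Pod Autoscaler run in recommendation-only mode. The manifest below is a minimal sketch; it assumes the VPA add-on is installed in the cluster and that a Deployment named web -- a hypothetical workload -- exists in the namespace.

```yaml
# Sketch: Vertical Pod Autoscaler in recommendation-only mode.
# Assumes the VPA add-on is installed and a Deployment named
# "web" (hypothetical) exists in this namespace.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"   # only recommend; never evict or resize pods
```

Running kubectl describe vpa web-vpa then surfaces suggested CPU and memory requests based on observed usage, which teams can use to size the environment without handing the autoscaler control.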
Limit resources
Limiting resources is an interesting aspect of management because people tend to assume that anything automated, or not tied directly to hardware, is free. Kubernetes doesn't dispel that illusion and can unintentionally reinforce it.
With large-scale automation, the question remains: If you can do something, should you? If IT teams push to the limits of their chosen platform, it becomes a trade-off between cost and resource availability.
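In Kubernetes, the basic mechanism for limiting resources is per-container requests and limits. The sketch below shows where they live in a Deployment spec; the name, image and values are placeholders to size against an actual workload.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                        # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: app
        image: nginx:1.25          # placeholder image
        resources:
          requests:                # what the scheduler reserves per pod
            cpu: 250m
            memory: 256Mi
          limits:                  # hard ceiling before throttling/OOM kill
            cpu: 500m
            memory: 512Mi
```

Requests drive scheduling and, on most cloud platforms, the node capacity you pay for; limits stop a single misbehaving container from consuming the node.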
Because everything in a cluster is coordinated -- to some degree -- through a central control plane, autoscaling comes naturally. Kubernetes is designed to scale out based on demand when key metrics cross thresholds, such as rising load or a lack of available resources.
Depending on the application design, it is possible to scale out quickly and efficiently to meet nearly any level of demand. The process is managed and automated for a worry-free environment -- except for one minor issue: the bill.
Autoscaling has an open-ended downside: As the application deployment grows, the bill grows with it. That growth is no surprise, but the scale at which it can happen often is -- and this is where limiting resources and choosing the correct instances come into play.
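Scale-out itself can be capped. In a Horizontal Pod Autoscaler, the maxReplicas field puts a hard ceiling on how far Kubernetes scales out -- and, by extension, on the bill. A minimal sketch, targeting the same hypothetical web Deployment as above:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2        # standing capacity for failover
  maxReplicas: 10       # hard cap on scale-out, and on cost
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add pods above 70% average CPU
```

The cap means a traffic spike degrades gracefully instead of scaling the bill without bound; where to set it is exactly the cost-versus-availability trade-off described above.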
One downside to automated services is that, without direct human oversight, events unfold quickly -- and even when monitoring tools and alerts kick in, there is lag time. Third-party monitoring tools, such as Prometheus, Elastic Stack or Grafana, help but come with limitations.
These are insight and monitoring dashboards; they are not command-and-control systems that prevent bad management. They can tell admins that a problem has occurred, but by then, it's too late. This doesn't mean the insight they provide is bad, but these tools are not preventive -- and might not even be predictive, depending on workload types.
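Even so, these tools can shorten the lag. A Prometheus alerting rule, for example, can page someone when replica counts climb toward the cap. This sketch assumes kube-state-metrics is being scraped and uses its kube_deployment_status_replicas metric; the deployment name and threshold are illustrative.

```yaml
groups:
- name: cost-guardrails
  rules:
  - alert: ReplicaCountHigh
    # kube_deployment_status_replicas comes from kube-state-metrics;
    # the threshold of 8 is an arbitrary example near a maxReplicas of 10.
    expr: kube_deployment_status_replicas{deployment="web"} > 8
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "web deployment is approaching its replica cap"
```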
Runaway bills are becoming more common as services move outside the data center. Containers and Kubernetes are not to blame for the next wave of excessive billing -- they are simply the tools that generate the charges. Control comes from how IT organizations choose to manage those tools.
Automation will always move faster than humans can adjust or react, so the effort and thought must happen upfront. Create a plan and a set of rules, or you might lose control -- and pay for it for months.
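Those upfront rules can be codified in the cluster itself. A namespace-level ResourceQuota, for instance, caps the total resources a team can consume no matter what individual workloads request. A minimal sketch, with a hypothetical team-a namespace and example values:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota        # guardrail for one team's namespace
  namespace: team-a         # hypothetical namespace
spec:
  hard:
    requests.cpu: "8"       # total CPU all pods may request
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "30"              # cap on the number of pods in the namespace
```

Once the quota is in place, deployments that would exceed it are rejected at admission time -- a preventive control, applied before the bill arrives rather than after.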