How do you debug a Kubernetes service deployment?
To debug a Kubernetes deployment, IT teams must start by following the basic rules of troubleshooting and then move to the smaller details to find the root cause of the problem.
Kubernetes is complicated, with many components and variables that make it difficult to understand how, why and when something goes wrong.
Before panicking, remember the fundamental rules of troubleshooting:
- Use historical data, such as logs, and observation to identify the root cause of a problem.
- When ascertaining the cause or trying a fix, change only one variable at a time.
- Before trusting a fix, confirm that it works under different conditions.
Understanding Kubernetes' many components and administrative commands is critical to execute the first rule successfully and to debug Kubernetes application and service deployments.
Survey the landscape
Kubectl is the primary administrative tool for Kubernetes clusters and includes more than 30 commands. The command kubectl get reveals basic information about a particular resource. For example, kubectl get pods lists the available pods and their status, while kubectl get services lists the applications running as a network service on a set of pods.
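As a minimal sketch of this first survey step, the following shell function filters the kubectl get pods output down to pods that are not in a Running or Completed state. The function name list_unhealthy_pods and the fallback to the default namespace are illustrative assumptions, not part of kubectl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name and status of pods that need attention.
list_unhealthy_pods() {
  local namespace="${1:-default}"
  # --no-headers drops the column header so awk sees only pod rows.
  # Default columns are: NAME READY STATUS RESTARTS AGE
  kubectl get pods --namespace "$namespace" --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Example call (requires access to a cluster):
#   list_unhealthy_pods default
```

An empty result suggests every pod in the namespace is running; any listed pod is a candidate for closer inspection.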
However, a more detailed option exists for troubleshooting: kubectl describe pods, or kubectl describe TYPE/NAME for a single resource, provides detail about container or pod labels, resource requirements, state, readiness, restart count and events.
For example, an admin finds that kubectl get pods for an Nginx application shows a particular pod isn't running. Using kubectl describe pod nginx-app1-1370807587 shows the admin that the pod couldn't be scheduled, due to inadequate available resources. The following is a subset of returned output that demonstrates the problem:
Name: nginx-app1-1370807587
Namespace: default
Node: /
Labels: app=nginx,pod-template-hash=1370807587
Status: Pending
Type: Secret (a volume populated by a Secret)
SecretName: default-token-4bcbi
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
1m 48s 7 {default-scheduler} Warning FailedScheduling pod (nginx-deployment-1370807587-fz9sd) failed to fit in any node
fit failure on node (kubernetes-node-6ta5): Node didn't have enough resource: CPU, requested: 1000, used: 1420, capacity: 2000
Inadequate resources are only one possible reason a pod might not work. Others include one or more downed nodes or connectivity problems with the internal or external load balancers.
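The arithmetic behind the FailedScheduling event above can be worked through in a few lines of shell. This is a sketch under the assumption that the requested, used and capacity values are all in CPU millicores and that a pod fits only when used plus requested does not exceed capacity; the function name cpu_fits is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a CPU request fits in a node's remaining capacity.
# Arguments are assumed to be millicores: requested, used, capacity.
cpu_fits() {
  local requested="$1" used="$2" capacity="$3"
  [ $(( used + requested )) -le "$capacity" ]
}

# Values taken from the event message above.
if cpu_fits 1000 1420 2000; then
  echo "fits"
else
  echo "does not fit: $(( 1420 + 1000 ))m needed, 2000m capacity"
fi
# prints "does not fit: 2420m needed, 2000m capacity"
```

With 1420 already in use, the node has only 580 left, so a request of 1000 cannot be placed until resources are freed, the request is lowered or a node is added.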
Check the service status
In Kubernetes, an application is made reachable by exposing it as a service, which defines a logical set of pods and a policy for accessing them. A service is specified by a name, a port and the targetPort attribute. Kubernetes assigns the service a cluster IP address accessible by Ingress proxies. A common problem when deploying a Kubernetes application is creating the pod but not the service. Verify the service status with the following commands, running wget from a pod inside the cluster:
wget -O- hostnames
kubectl get svc hostnames
If the service doesn't exist, wget returns an error like the following:
Resolving hostnames (hostnames)... failed: Name or service not known.
And kubectl returns this:
No resources found.
Error from server (NotFound): services "hostnames" not found
These responses indicate that Kubernetes has not created the service. To fix this problem, create and validate the service with the following commands. In this example, the service listens on port 80 and forwards traffic to port 9090 on the pods.
kubectl expose deployment hostnames --port=80 --target-port=9090
> service/hostnames exposed
kubectl get svc hostnames
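The check-then-create sequence above can be wrapped in one function so that it is safe to rerun. This is a sketch, not a definitive implementation: the name ensure_service and its argument order are assumptions, and it presumes the deployment and the service share a name, as in the hostnames example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a service for a deployment only if it is missing.
# Arguments: deployment/service name, service port, target (container) port.
ensure_service() {
  local name="$1" port="$2" target_port="$3"
  if kubectl get svc "$name" >/dev/null 2>&1; then
    echo "service/$name already exists"
  else
    kubectl expose deployment "$name" --port="$port" --target-port="$target_port"
  fi
}

# Example call (requires access to a cluster):
#   ensure_service hostnames 80 9090
```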
Application deployments in Kubernetes
The way in which Kubernetes exposes applications as outside services can confound troubleshooting. There are several abstraction layers separated by network load balancers, such as the following:
- Nodes host the containers used within a pod. As detailed by the Kubernetes documentation, containers within a pod share a network namespace and communicate via the loopback interface -- 127.0.0.1 -- while pods in a cluster communicate via Kubernetes' internal cluster networking.
- Services are network resources that expose an application running in pods to other workloads in the cluster and, depending on the service type, to users and services outside the cluster.
- Ingress provides application layer mapping via HTTP and HTTPS to services for external users. Ingress controllers route traffic to different services using rules such as name-based virtual hosting. Open source Kubernetes supports AWS Application Load Balancer, the Ingress controller for Google Cloud and Nginx Ingress controllers. Third parties support the Envoy proxy and Istio service mesh.
Network and service configurations are often why a Kubernetes application is unreachable. This could be due to the service interface not pointing to a pod and port. To create a service, use the kubectl expose command. For example, to create and check a service configuration for an Nginx application, use the following:
kubectl expose deployment/nginx-app1
kubectl describe svc nginx-app1
The output of the describe command in this example is the following:
Name: nginx-app1
Namespace: default
Labels: run=nginx-app1
Annotations: <none>
Selector: run=nginx-app1
Type: ClusterIP
IP: 10.0.162.149
Port: <unset> 80/TCP
Endpoints: 10.244.2.5:80,10.244.3.4:80
Session Affinity: None
Events: <none>
A common error is not matching a service's targetPort attribute with the port that the container uses in the pod, specified as containerPort. A similar problem occurs if the service port isn't configured properly in the Ingress controller.
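One way to catch this mismatch is to read both values with kubectl's jsonpath output and compare them. The sketch below assumes the service has a single port, the pod template's first container declares a containerPort, and targetPort is numeric rather than a named port; the function name check_target_port is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a service's targetPort with the deployment's containerPort.
# Arguments: service name, deployment name.
check_target_port() {
  local svc="$1" deploy="$2"
  local target container
  # First port entry on the service.
  target="$(kubectl get svc "$svc" -o jsonpath='{.spec.ports[0].targetPort}')"
  # First declared port on the first container in the pod template.
  container="$(kubectl get deployment "$deploy" \
    -o jsonpath='{.spec.template.spec.containers[0].ports[0].containerPort}')"
  if [ "$target" = "$container" ]; then
    echo "ok: targetPort $target matches containerPort $container"
  else
    echo "mismatch: targetPort $target, containerPort $container"
    return 1
  fi
}

# Example call (requires access to a cluster):
#   check_target_port nginx-app1 nginx-app1
```

An empty Endpoints field in the kubectl describe svc output is a related symptom worth checking, since it suggests the service selector matches no running pods.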
Review event logs
Kubernetes components, such as kube-apiserver, kube-scheduler, kube-proxy and kubelet, as well as the containers, nodes and applications that run on Kubernetes, each generate logs. Archive this log information so it is available for review during troubleshooting.
Klog is the logging library in Kubernetes that generates messages for its system components. Many tools collect information on and review Kubernetes events, such as kubewatch, Eventrouter and Event Exporter. These Kubernetes watchers can work in concert with log analysis software, like Grafana or Kibana.
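Before reaching for a dedicated tool, kubectl itself can surface recent events and container logs. The following is a minimal sketch; the function names recent_warnings and previous_logs, the default namespace fallback and the 50-line tail are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list Warning events in a namespace, oldest first.
recent_warnings() {
  local namespace="${1:-default}"
  kubectl get events --namespace "$namespace" \
    --field-selector type=Warning \
    --sort-by=.metadata.creationTimestamp
}

# Hypothetical helper: show the last lines logged by a pod's previous container
# instance, which is useful after a crash and restart.
previous_logs() {
  local pod="$1" namespace="${2:-default}"
  kubectl logs "$pod" --namespace "$namespace" --previous --tail=50
}

# Example calls (require access to a cluster):
#   recent_warnings default
#   previous_logs nginx-app1-1370807587
```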
Sloop, an open source project developed by Salesforce, visualizes event histories and resource state changes. It can be used to debug Kubernetes deployments. Features of Sloop include the following:
- The ability to inspect resources that no longer exist. Kubernetes keeps events for only one hour by default, while Sloop stores event history for much longer, so admins can see deleted or replaced pods.
- A timeline visualization that displays rollouts of related resources in updates to Deployments, ReplicaSets and StatefulSets. This feature helps identify intermittent errors.
- A self-contained service with no dependency on external distributed storage.
Debugging Kubernetes deployments -- with their many abstraction layers and configuration parameters -- requires a considered approach. Start with the basics, and work up the software stack:
- Ensure that all pods are running. Check if any fail to deploy due to resource scarcity.
- Analyze the internal traffic flow to verify that service requests make it to the correct pod.
- Test external traffic flow through a network load balancer and Ingress proxy to validate the network configuration.
- Review the event history with a visualization tool, like Sloop or Grafana, to spot anomalies and correct unexpected events.
These steps help administrators to ascertain the root cause of a service failure.
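The internal traffic check in the list above can be sketched with a short-lived pod that requests the service from inside the cluster. The function name probe_service, the pod name svc-probe, the busybox image and the five-second timeout are all assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch a service URL from a temporary pod inside the cluster.
# Arguments: service name, service port.
probe_service() {
  local svc="$1" port="$2"
  # --rm deletes the pod when the command exits; --restart=Never runs it once.
  kubectl run svc-probe --rm -i --restart=Never --image=busybox -- \
    wget -qO- -T 5 "http://${svc}:${port}"
}

# Example call (requires access to a cluster):
#   probe_service hostnames 80
```

If the probe succeeds from inside the cluster but external requests fail, the problem more likely sits in the load balancer or Ingress layer than in the service or pods.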