How Kubernetes Handles Node Failure
This document details how Kubernetes handles node failure, provides a detailed troubleshooting section, and lists our recommendations for achieving high availability for critical applications.
What Happens to Pods When a Worker Node Fails in Kubernetes?
When a worker node in a Kubernetes cluster fails, Kubernetes detects the issue and handles it based on its built-in resilience mechanisms. Here’s what happens step-by-step:
1. Detection of Node Failure
- Kubernetes uses heartbeats (periodic signals) sent from the worker node to the control plane via the Kubelet.
- If the control plane does not receive a heartbeat within a certain time (the default `node-monitor-grace-period` is 40 seconds), the node is marked as NotReady.
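The detection timing above is configurable. As a sketch (defaults shown; verify against your Kubernetes version before changing anything), the kubelet-side settings look like:

```yaml
# KubeletConfiguration fragment -- defaults shown, adjust with care.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
nodeStatusUpdateFrequency: 10s   # how often the kubelet reports node status
nodeLeaseDurationSeconds: 40     # duration of the node's heartbeat lease
```

The corresponding control-plane side is the kube-controller-manager flag `--node-monitor-grace-period` (default 40s) mentioned above.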
2. Impact on Pods
- Once the node is marked NotReady, the pods running on it become unreachable.
- The cluster waits for a grace period (default: 5 minutes, controlled by `pod-eviction-timeout`, or per pod via `tolerationSeconds` on clusters using taint-based evictions) to allow the node to recover.
- If the node does not recover:
  - Pods managed by ReplicaSets, Deployments, or StatefulSets are automatically rescheduled on healthy nodes if resources are available (StatefulSet pods are replaced only after the old pod is confirmed terminated).
  - DaemonSet pods are bound to their node and are not rescheduled elsewhere; they restart when the node recovers.
  - Standalone Pods (those not managed by a controller) are not rescheduled and require manual intervention.
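On clusters that use taint-based evictions, the eviction delay can be tuned per workload instead of cluster-wide. A minimal sketch, assuming the default NoExecute taints (the 60-second value is illustrative):

```yaml
# Pod spec fragment: evict this pod after 60s on a NotReady/unreachable
# node instead of the default 300s.
tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 60
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 60
```

Shorter values speed up failover for critical workloads; longer values avoid churn during brief network blips.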
3. Storage Considerations
- Stateless Pods: These can be rescheduled easily since they don’t depend on persistent storage.
- Stateful Pods: Rescheduling depends on persistent volumes (PVs). We use Rook-Ceph with the `filesystem` storage class, which supports the ReadWriteMany (RWX) access mode, so persistent volumes can be mounted on multiple nodes and remain accessible even when a node fails. In addition, Ceph stores each PV with a replication factor of 3 (1 original + 2 copies), so the data itself survives the loss of a node, facilitating seamless failover.
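For reference, a PVC requesting RWX storage might look like the following sketch; the storage class name `rook-cephfs` is an assumption and should be replaced with the class name actually defined in the cluster:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data              # illustrative name
spec:
  accessModes:
    - ReadWriteMany              # RWX: mountable from pods on multiple nodes
  storageClassName: rook-cephfs  # assumed Rook-Ceph filesystem class name
  resources:
    requests:
      storage: 10Gi
```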
4. Impact on Services
- If a failed pod is part of a Service, Kubernetes automatically removes the pod’s endpoint from the Service. This ensures traffic is not routed to unreachable pods.
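Endpoint membership also depends on pod readiness: a `readinessProbe` lets the Service drop a pod from rotation as soon as it stops responding, rather than waiting for eviction. A minimal sketch (path, port, and timings are illustrative):

```yaml
# Container spec fragment: the Service stops routing traffic to this
# pod as soon as the probe fails three times in a row.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
```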
Troubleshooting Steps for Node or Pod Failures
If a node fails and its pods become unavailable, follow these troubleshooting steps:
Step 1: Check Node Health
Run the following command to check the status of the node:
```shell
kubectl get nodes
```

- Look for the node marked as NotReady.
To get more details about the node:
```shell
kubectl describe node <node-name>
```

- Look for issues like taints, resource pressure, or kubelet connectivity.
Step 2: Check Pod Status
Find the pods running on the node:
```shell
kubectl get pods --all-namespaces -o wide | grep <node-name>
```

Check the status of affected pods:
```shell
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>
```

Look for events or errors like CrashLoopBackOff, ImagePullBackOff, or Evicted.
Step 3: Access the Pod Directly
If the pod is part of a Service, try accessing it using its NodePort:
```shell
curl http://<node-ip>:<nodeport> -u <username>:<password>
```

Example:

```shell
curl http://192.168.49.20:30080 -u admin:password123
```

Verify whether the issue is with pod networking or the application itself.
Step 4: Distinguish Between Node and Application Issues
- If the node is healthy but the pod is failing:
Check the application logs using:
```shell
kubectl logs <pod-name> -n <namespace>
```

Review application-specific credentials, endpoints, or configurations.
Step 5: Provide Application-Specific Details
Ensure you have:
- URL: The full endpoint of the application.
- Username and Password: Ensure these are valid and not expired.
Example:
```
URL: http://192.168.49.20:30080/login
Username: admin
Password: password123
```

Provide the Stribog support team with the complete URL, Worker Node Name, NodePort, Application Username, and Password (for further testing). Describe both the current behavior and the expected behavior.
Step 6: Check Resource Availability
Ensure the cluster has enough resources (CPU, memory) for pods to reschedule:
```shell
kubectl top nodes
kubectl top pods
```

If resources are insufficient, consider scaling up your cluster by adding nodes.
Additional Troubleshooting Tips
- Check Networking: Use tools like `ping` or `curl` to verify connectivity between pods, nodes, and services.
- Verify DNS: Ensure that DNS resolution is working for the cluster:

```shell
kubectl exec -it <pod-name> -n <namespace> -- nslookup kubernetes.default
```

- Examine Events: View Kubernetes events for insights:

```shell
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
```
Steps to Prevent Downtime
1. Use Pod Anti-Affinity
- Why? Pod anti-affinity ensures that multiple replicas of a workload are distributed across different nodes, avoiding a single point of failure. This is particularly important for high-availability applications.
- How? Use the `replicas` field in your deployment to specify the desired number of pod replicas. Combine this with `podAntiAffinity` rules to spread the replicas across nodes.
Example configuration for a Deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: default
spec:
  replicas: 3  # Number of pod replicas
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - my-app
              topologyKey: kubernetes.io/hostname
      containers:
        - name: my-app-container
          image: my-app-image:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
```

Explanation of Key Fields:
- `replicas`: Specifies the desired number of identical pods to run. In this example, 3 replicas are created.
- `affinity`: Includes a `podAntiAffinity` rule to ensure pods with the `app: my-app` label are scheduled on separate nodes.
- `topologyKey`: The `kubernetes.io/hostname` key ensures that pods are spread across different nodes based on their hostnames.
Why This is Important:
- If all replicas are placed on a single node and that node fails, your application will experience downtime.
- With pod anti-affinity, Kubernetes ensures that pods are distributed, improving the availability of your application even during node failures.
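Anti-affinity spreads the replicas; a PodDisruptionBudget additionally caps how many of them may be down at once during voluntary disruptions such as node drains. A minimal sketch matching the `my-app` Deployment above (the PDB name is illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb      # illustrative name
spec:
  minAvailable: 2       # keep at least 2 of the 3 replicas running
  selector:
    matchLabels:
      app: my-app
```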
2. Set Resource Requests and Limits
Define resource requirements to prevent overloading nodes:
```yaml
resources:
  requests:
    memory: "512Mi"
    cpu: "250m"
  limits:
    memory: "1Gi"
    cpu: "500m"
```
3. Use a DaemonSet for Critical Applications
DaemonSets ensure that critical applications run on every node (e.g., a critical application like EC). Example:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: critical-app
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: critical-app
  template:
    metadata:
      labels:
        app: critical-app
    spec:
      containers:
        - name: critical-app
          image: my-critical-app-image
```
4. Regularly Test Failover Scenarios
- Simulate node failures together with the Stribog support team to validate your cluster's resilience. Always test failover scenarios on the Dev cluster, NOT on the production cluster.
By following these steps, you can minimize downtime and ensure smooth operation even during node failures. If you encounter issues, the troubleshooting process will help isolate and address the root cause effectively.