Blue Matador periodically checks the healthz endpoint of the Kubernetes API to ensure that the API reports itself as healthy. If the check fails consistently, an event is created. Although this Kubernetes health check rarely fails, a failure can severely impact your ability to manage the cluster. If you can still make API calls and use kubectl, check the component statuses and the health of each node, and look for any events that may have caused the API to become unhealthy. Another possible cause is a network issue that prevents one of Blue Matador's agents from reaching the API. Check for any firewalls between the agent and the API, and make sure the issue is not widespread.
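To see the same signal by hand, the healthz endpoint can be probed directly. The following is a minimal sketch in Python, not Blue Matador's actual implementation. The function and class names are hypothetical, and the threshold of three consecutive failures is an assumed value standing in for "fails consistently". It relies on the fact that a healthy API server answers healthz with HTTP 200 and the body "ok".

```python
import ssl
import urllib.error
import urllib.request


def is_healthy(status_code, body):
    """A healthy API server answers /healthz with HTTP 200 and the body "ok"."""
    return status_code == 200 and body.strip() == "ok"


def probe_healthz(base_url, token=None, ca_file=None, timeout=5.0):
    """Return (status_code, body) from GET <base_url>/healthz.

    Returns (None, <error text>) when the API cannot be reached at all,
    for example because a firewall blocks the connection.
    """
    request = urllib.request.Request(base_url.rstrip("/") + "/healthz")
    if token:
        # Bearer token of a service account allowed to read /healthz.
        request.add_header("Authorization", "Bearer " + token)
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=context) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # The API answered, but with an error status such as 500.
        return err.code, err.read().decode("utf-8", "replace")
    except (urllib.error.URLError, OSError) as err:
        # No answer at all: DNS failure, refused connection, timeout, TLS error.
        return None, str(err)


class FailureTracker:
    """Reports a problem only after `threshold` consecutive failed checks."""

    def __init__(self, threshold=3):  # assumed value, not Blue Matador's setting
        self.threshold = threshold
        self.failures = 0

    def record(self, healthy):
        """Record one check result; return True when an event should fire."""
        self.failures = 0 if healthy else self.failures + 1
        return self.failures >= self.threshold
```

A caller would run probe_healthz against the cluster's API address on a timer, pass the result through is_healthy, and feed that into FailureTracker.record. A status code of None means the API was never reached, which points at the network path rather than at the API itself.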
A Kubernetes component status describes the high-level health of one of Kubernetes' essential cluster services. An unhealthy component can cause problems such as incorrectly scheduled pods or nodes in the cluster going unrecognized. Unfortunately, component statuses are not well documented, so issues with them can be difficult to diagnose and may even be benign.
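Component statuses can be listed with `kubectl get componentstatuses -o json`. The sketch below, in Python, shows one way to pick the unhealthy entries out of that output. The function name is hypothetical and the sample data is illustrative, not captured from a real cluster. It assumes the v1 ComponentStatus layout, where each item carries a conditions list containing a condition of type "Healthy". This API has been deprecated since Kubernetes 1.19, so newer clusters may not report useful data here.

```python
import json


def unhealthy_components(component_list):
    """Map each unhealthy component name to the detail reported for it.

    `component_list` is the parsed JSON from
    `kubectl get componentstatuses -o json`.
    """
    problems = {}
    for item in component_list.get("items", []):
        name = item.get("metadata", {}).get("name", "<unknown>")
        for cond in item.get("conditions", []):
            # Each component reports a single "Healthy" condition whose
            # status is the string "True", "False", or "Unknown".
            if cond.get("type") == "Healthy" and cond.get("status") != "True":
                detail = cond.get("message") or cond.get("error")
                problems[name] = detail or "no detail reported"
    return problems


# Illustrative sample only: a healthy etcd member and an unreachable scheduler.
SAMPLE = json.loads("""
{
  "kind": "List",
  "items": [
    {
      "kind": "ComponentStatus",
      "metadata": {"name": "etcd-0"},
      "conditions": [{"type": "Healthy", "status": "True", "message": "ok"}]
    },
    {
      "kind": "ComponentStatus",
      "metadata": {"name": "scheduler"},
      "conditions": [
        {"type": "Healthy", "status": "False", "message": "connection refused"}
      ]
    }
  ]
}
""")
```

With the sample above, unhealthy_components(SAMPLE) reports only the scheduler. Because component issues may be benign, a result like this is a starting point for investigation rather than proof that the cluster is broken.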
A few common issues with component statuses include:
Using a bootstrapping tool such as kops, or running a managed cluster with your cloud provider, can make bootstrapping and managing a Kubernetes cluster much easier. Having a defined process for upgrading cluster resources, or even for spinning up an entirely new cluster, can reduce the impact of component issues.
To debug problems related to cluster upgrades, check the GitHub issues to see whether the problems you are seeing have a workaround or have been fixed in a different version.