Troubleshooting Guide¶
This section highlights some of the common problems users face during and after a MetalK8s installation. If you do not find a solution to a problem you are facing, please reach out to Scality support or create a GitHub issue.
Bootstrap Installation Errors¶
Bootstrap Installation fails with no straightforward reason¶
If during a MetalK8s installation you encounter a failure and the console output does not provide sufficient information to pinpoint the cause of the failure, re-run the installation with the verbose flag (--verbose).
root@bootstrap $ /srv/scality/metalk8s-2.5.3-dev/bootstrap.sh --verbose
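To keep a copy of the verbose output for later inspection or for a support request, you can pipe it through tee (a generic shell technique, not a MetalK8s-specific option):

root@bootstrap $ /srv/scality/metalk8s-2.5.3-dev/bootstrap.sh --verbose 2>&1 | tee /tmp/bootstrap-verbose.log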
Errors after restarting the Bootstrap node¶
If you reboot the Bootstrap node and, for some reason, some containers (especially the salt-master container) refuse to start, perform the following checks:
Check and ensure that the MetalK8s ISO is mounted properly.
[root@bootstrap vagrant]# mount | grep /srv/scality/metalk8s-2.5.3-dev
/home/centos/metalk8s.iso on /srv/scality/metalk8s-2.5.3-dev type iso9660 (ro,relatime)
If the ISO is unmounted, run the following command, which checks the status of the ISO file and remounts it automatically.
[root@bootstrap vagrant]# salt-call state.sls metalk8s.archives.mounted saltenv=metalk8s-2.5.3-dev
Summary for local
------------
Succeeded: 3
Failed:    0
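If the ISO is mounted but the salt-master container still refuses to start, inspect the container runtime and the kubelet directly. A minimal sketch, assuming crictl is installed and configured to talk to the local container runtime:

[root@bootstrap vagrant]# crictl ps -a | grep salt-master
[root@bootstrap vagrant]# journalctl -u kubelet --since "10 minutes ago"

The first command lists the salt-master container even if it has exited; the second surfaces kubelet errors that may explain why the container is not restarted.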
Bootstrap fails and console log is unscrollable¶
If during a MetalK8s installation the Bootstrap process fails and the console output is unscrollable, consult the Bootstrap logs in /var/log/metalk8s-bootstrap.log.
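For example, to display the end of the log and search it for errors (generic shell commands):

root@bootstrap $ tail -n 100 /var/log/metalk8s-bootstrap.log
root@bootstrap $ grep -i error /var/log/metalk8s-bootstrap.log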
Account Administration Errors¶
Forgot the MetalK8s GUI password¶
If you forget the MetalK8s GUI username and/or password, follow this procedure to reset or change it.
General Kubernetes Resource Errors¶
Pod status shows “CrashLoopBackOff”¶
If after a MetalK8s installation you notice some Pods are in a “CrashLoopBackOff” state, it means the Pods are crashing: they start up, exit immediately, Kubernetes restarts them, and the cycle repeats. To get possible clues about this error, run the following commands and inspect the output:
[root@bootstrap vagrant]# kubectl -n kube-system describe pods <pod name>
Name: <pod name>
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
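The describe output above often names the failing container, and the logs of its previous (crashed) run usually contain the actual error. One way to retrieve them:

[root@bootstrap vagrant]# kubectl -n kube-system logs <pod name> --previous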
Persistent Volume Claim (PVC) stuck in “Pending” state¶
If, after provisioning a Volume for a Pod (e.g. Prometheus), the PVC remains stuck in a Pending state, check the following:
Check that the volumes have been provisioned and are in a Ready state:
[root@bootstrap vagrant]# kubectl describe volume <volume-name>
Name:    <volume-name>
Status:
  Conditions:
    Last Transition Time:  2020-01-14T12:57:56Z
    Last Update Time:      2020-01-14T12:57:56Z
    Status:                True
    Type:                  Ready
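If the Ready condition is missing or its Status is not True, the Volume's full status may explain why. A quick way to dump it, assuming the Volume object exposes a status field as the output above suggests:

[root@bootstrap vagrant]# kubectl get volume <volume-name> -o jsonpath='{.status}'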
Check that a corresponding PersistentVolume exists:
[root@bootstrap vagrant]# kubectl get pv
NAME            CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   STORAGECLASS           AGE     CLAIM
<volume-name>   10Gi       RWO            Retain           Bound    <storage-class-name>   4d22h   <persistentvolume-claim-name>
Check that the PersistentVolume matches the PersistentVolumeClaim constraints (size, labels, storage class) by doing the following:
Find the name of your PersistentVolumeClaim:
[root@bootstrap vagrant]# kubectl get pvc -n <namespace>
NAME                             STATUS   VOLUME          CAPACITY   ACCESS MODES   STORAGECLASS           AGE
<persistentvolume-claim-name>    Bound    <volume-name>   10Gi       RWO            <storage-class-name>   24h
Then check whether the PersistentVolumeClaim constraints match:
[root@bootstrap vagrant]# kubectl describe pvc <persistentvolume-claim-name> -n <namespace>
Name:          <persistentvolume-claim-name>
Namespace:     <namespace>
StorageClass:  <storage-class-name>
Status:        Bound
Volume:        <volume-name>
Capacity:      10Gi
Access Modes:  RWO
VolumeMode:    Filesystem
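If the capacity and storage class match but the claim is still Pending, compare the PersistentVolume labels with the claim's selector, since a selector that matches no volume leaves the claim Pending. A quick check (the exact labels will vary with your cluster):

[root@bootstrap vagrant]# kubectl get pv <volume-name> --show-labels
[root@bootstrap vagrant]# kubectl get pvc <persistentvolume-claim-name> -n <namespace> -o jsonpath='{.spec.selector}'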
If no PersistentVolume exists, check that the storage operator is up and running.
[root@bootstrap vagrant]# kubectl -n kube-system get deployments storage-operator
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
storage-operator   1/1     1            1           4d22h
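If the deployment is not ready, its logs are the first place to look for provisioning errors (a generic check, not specific to any single failure mode):

[root@bootstrap vagrant]# kubectl -n kube-system logs deploy/storage-operator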
Access to MetalK8s GUI fails with “undefined backend”¶
If, in the course of using the MetalK8s GUI, you encounter an “undefined backend” error, perform the following checks:
Check that the Ingress pods are running:
[root@bootstrap vagrant]# kubectl -n metalk8s-ingress get daemonsets
NAME                                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
nginx-ingress-control-plane-controller   1         1         1       1            1           node-role.kubernetes.io/master=   4d22h
nginx-ingress-controller                 1         1         1       1            1           <none>                            4d22h
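To list the individual Pods behind these DaemonSets and confirm that they are actually Running:

[root@bootstrap vagrant]# kubectl -n metalk8s-ingress get pods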
Check the Ingress controller logs:
[root@bootstrap vagrant]# kubectl logs -n metalk8s-ingress nginx-ingress-control-plane-controller-ftg6v
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.26.1
  Build:      git-2de5a893a
  Repository: https://github.com/kubernetes/ingress-nginx
  nginx version: openresty/1.15.8.2
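You can also verify that the Ingress services have endpoints backing them, since an empty endpoints list is a typical cause of “undefined backend” errors. A minimal check:

[root@bootstrap vagrant]# kubectl -n metalk8s-ingress get services
[root@bootstrap vagrant]# kubectl -n metalk8s-ingress get endpoints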
Pod and Service CIDR conflicts¶
If, after installation of a MetalK8s cluster, you notice routing problems in Pod-to-Pod communication, perform the following checks:
Check the configured values for the internal Pod and Service networks:
[root@bootstrap vagrant]# salt-call pillar.get networks
local:
    ----------
    control_plane:
        172.21.254.0/28
    pod:
        10.233.0.0/16
    service:
        10.96.0.0/12
    workload_plane:
        172.21.254.32/27
Make sure the configured IP ranges (CIDR notation) do not conflict with your infrastructure.
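For example, to compare these ranges against the routes and addresses already present on a node (a quick manual check, not an exhaustive conflict detector):

[root@bootstrap vagrant]# ip route
[root@bootstrap vagrant]# ip addr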
Todo
Add Salt master/minion logs, and explain how to run a specific state from the Salt master.
Add troubleshooting for networking issues.