Post-Installation Procedure¶
Provision Storage for Prometheus Services¶
After bootstrapping the cluster, the Prometheus and AlertManager services used to monitor the system are not yet running: their Pods remain in a Pending state because they require persistent storage to be available.
You can provision these storage volumes on the Bootstrap node, or later on other nodes joining the cluster. Separating Bootstrap services from Infra services is in fact recommended.
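For instance, to observe the Pending Pods before provisioning the storage, you can list the monitoring Pods (this assumes they run in the metalk8s-monitoring Namespace; adjust the Namespace name if your release uses a different one):

root@bootstrap $ kubectl --kubeconfig /etc/kubernetes/admin.conf \
                  get pods --namespace metalk8s-monitoring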
To create the required Volume objects, write a YAML file with the following contents, replacing <node_name> with the name of the Node on which to run Prometheus and AlertManager, and <device_path> and <device_path2> with the /dev paths of the partitions to use:
---
apiVersion: storage.metalk8s.scality.com/v1alpha1
kind: Volume
metadata:
  name: <node_name>-prometheus
spec:
  nodeName: <node_name>
  storageClassName: metalk8s-prometheus
  rawBlockDevice: # Choose a device with at least 10GiB capacity
    devicePath: <device_path>
  template:
    metadata:
      labels:
        app.kubernetes.io/name: 'prometheus-operator-prometheus'
---
apiVersion: storage.metalk8s.scality.com/v1alpha1
kind: Volume
metadata:
  name: <node_name>-alertmanager
spec:
  nodeName: <node_name>
  storageClassName: metalk8s-prometheus
  rawBlockDevice: # Choose a device with at least 1GiB capacity
    devicePath: <device_path2>
  template:
    metadata:
      labels:
        app.kubernetes.io/name: 'prometheus-operator-alertmanager'
---
Once this file is created with the right values filled in, run the following command to create the Volume objects (replacing <file_path> with the path of the aforementioned YAML file):
root@bootstrap $ kubectl --kubeconfig /etc/kubernetes/admin.conf \
apply -f <file_path>
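You can then check that the Volume objects exist and track their readiness; for instance (assuming the Volume names used in the example above):

root@bootstrap $ kubectl --kubeconfig /etc/kubernetes/admin.conf \
                  get volumes <node_name>-prometheus <node_name>-alertmanager

Once the Volumes are ready, the Prometheus and AlertManager Pods should move from Pending to Running.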
For more details on the available options for storage management, see this section of the Operational Guide.
Changing credentials¶
After a fresh installation, an administrator account is created with default credentials. For production deployments, make sure to change these credentials to safer values.
To change user credentials and groups for the K8s API (and, by extension, for the MetalK8s GUI and SaltAPI), follow this procedure.
To change Grafana user credentials, follow this procedure.
Validating the deployment¶
To ensure the Kubernetes cluster is properly running before scheduling applications, perform the following sanity checks:
Check that all desired Nodes are in a Ready state and show the expected roles:
root@bootstrap $ kubectl --kubeconfig /etc/kubernetes/admin.conf \
                  get nodes
NAME        STATUS   ROLES                         AGE   VERSION
bootstrap   Ready    bootstrap,etcd,infra,master   42m   v1.15.5
node-1      Ready    etcd,infra,master             26m   v1.15.5
node-2      Ready    etcd,infra,master             25m   v1.15.5
Use kubectl describe node <node_name> to get more details about a Node (for instance, to check that the right taints are applied), as illustrated below.
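For example (an illustration only; the taints shown depend on the Node's roles), you can inspect the taints applied to the Bootstrap node with:

root@bootstrap $ kubectl --kubeconfig /etc/kubernetes/admin.conf \
                  describe node bootstrap | grep -A2 Taints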
Check that Pods are in their expected state (most of the time Running, except for Prometheus and AlertManager if the required storage has not been provisioned yet - see the procedure above).
To list Pods in all Namespaces at once, use the --all-namespaces flag; to select Pods in a given Namespace, use the -n or --namespace option. For instance, to check all Pods making up the cluster-critical services:
root@bootstrap $ kubectl --kubeconfig /etc/kubernetes/admin.conf \
                  get pods --namespace kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
apiserver-proxy-bootstrap                  1/1     Running   0          43m
apiserver-proxy-node-1                     1/1     Running   0          2m28s
apiserver-proxy-node-2                     1/1     Running   0          9m
calico-kube-controllers-6d8db9bcf5-w5w94   1/1     Running   0          43m
calico-node-4vxpp                          1/1     Running   0          43m
calico-node-hvlkx                          1/1     Running   7          23m
calico-node-jhj4r                          1/1     Running   0          8m59s
coredns-8576b4bf99-lfjfc                   1/1     Running   0          43m
coredns-8576b4bf99-tnt6b                   1/1     Running   0          43m
etcd-bootstrap                             1/1     Running   0          43m
etcd-node-1                                1/1     Running   0          3m47s
etcd-node-2                                1/1     Running   3          8m58s
kube-apiserver-bootstrap                   1/1     Running   0          43m
kube-apiserver-node-1                      1/1     Running   0          2m45s
kube-apiserver-node-2                      1/1     Running   0          7m31s
kube-controller-manager-bootstrap          1/1     Running   3          44m
kube-controller-manager-node-1             1/1     Running   1          2m39s
kube-controller-manager-node-2             1/1     Running   2          7m25s
kube-proxy-gnxtp                           1/1     Running   0          28m
kube-proxy-kvtjm                           1/1     Running   0          43m
kube-proxy-vggzg                           1/1     Running   0          27m
kube-scheduler-bootstrap                   1/1     Running   1          44m
kube-scheduler-node-1                      1/1     Running   0          2m39s
kube-scheduler-node-2                      1/1     Running   0          7m25s
repositories-bootstrap                     1/1     Running   0          44m
salt-master-bootstrap                      2/2     Running   0          44m
storage-operator-756b87c78f-mjqc5          1/1     Running   1          43m
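To quickly spot Pods that are not Running anywhere in the cluster, one possible approach is to filter on the Pod phase with a field selector:

root@bootstrap $ kubectl --kubeconfig /etc/kubernetes/admin.conf \
                  get pods --all-namespaces --field-selector=status.phase!=Running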
Using the result of the above command, obtain a shell in a running etcd Pod (replacing <etcd_pod_name> with the appropriate value):

root@bootstrap $ kubectl --kubeconfig /etc/kubernetes/admin.conf \
                  exec --namespace kube-system -it <etcd_pod_name> sh
Once in this shell, use the following to obtain health information for the etcd cluster:

root@etcd-bootstrap $ etcdctl --endpoints=https://[127.0.0.1]:2379 \
                       --ca-file=/etc/kubernetes/pki/etcd/ca.crt \
                       --cert-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
                       --key-file=/etc/kubernetes/pki/etcd/healthcheck-client.key \
                       cluster-health

member 46af28ca4af6c465 is healthy: got healthy result from https://<first-node-ip>:2379
member 81de403db853107e is healthy: got healthy result from https://<second-node-ip>:2379
member 8878627efe0f46be is healthy: got healthy result from https://<third-node-ip>:2379
cluster is healthy
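Note that the command above uses the etcd v2 API flags. If the etcd image in your deployment ships a newer etcdctl, the equivalent check (to adapt to your etcd version) would be:

root@etcd-bootstrap $ ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
                       --cacert=/etc/kubernetes/pki/etcd/ca.crt \
                       --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
                       --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
                       endpoint health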
Finally, check that the exposed services are accessible, using the information from this document.