Deployment of the Bootstrap node

Preparation

  1. Retrieve a MetalK8s ISO (you may build one yourself by following our developer guide). Scality customers can retrieve validated builds as part of their license from the Scality repositories.

  2. Download the MetalK8s ISO file on the machine that will host the bootstrap node. Run checkisomd5 –verbose <path-to-iso> to validate its integrity (checkisomd5 is part of the isomd5sum package).

  3. Mount this ISO file at the path of your choice (we will use /srv/scality/metalk8s-|version| for the rest of this guide, as this is where the ISO will be mounted automatically after running the bootstrap script):

    root@bootstrap $ mkdir -p /srv/scality/metalk8s-127.0.3
    root@bootstrap $ mount <path-to-iso> /srv/scality/metalk8s-127.0.3

Configuration

  1. Create the MetalK8s configuration directory.

    root@bootstrap $ mkdir /etc/metalk8s
    
  2. Create the /etc/metalk8s/bootstrap.yaml file. This file contains initial configuration settings which are mandatory for setting up a MetalK8s Bootstrap node. Change the networks, IP address, and hostname fields to conform to your infrastructure.

    apiVersion: metalk8s.scality.com/v1alpha3
    kind: BootstrapConfiguration
    networks:
      controlPlane:
        cidr: <CIDR-notation>
      workloadPlane:
        cidr: <CIDR-notation>
        mtu: <network-MTU>
      pods: <CIDR-notation>
      services: <CIDR-notation>
      portmap:
        cidr:
          - <CIDR-notation>
      nodeport:
        cidr:
          - <CIDR-notation>
    proxies:
      http: <http://proxy-ip:proxy-port>
      https: <https://proxy-ip:proxy-port>
      no_proxy:
        - <host>
        - <ip/cidr>
    ca:
      minion: <hostname-of-the-bootstrap-node>
    archives:
      - <path-to-metalk8s-iso>
    addons:
      dex:
        enabled: True
      fluent-bit:
        enabled: True
      loki:
        enabled: True
    kubernetes:
      apiServer:
        oidc: {}
        featureGates:
          <feature_gate_name>: True
      controllerManager:
        config:
          terminatedPodGCThreshold: 500
      coreDNS:
        hostForward: True
        replicas: 2
        affinity:
          podAntiAffinity:
            hard: []
            soft:
              - topologyKey: kubernetes.io/hostname
      kubelet:
        config:
          maxPods: 110
    salt:
      master:
        worker_threads: 12
        timeout: 20
    

The networks field specifies a range of IP addresses written in CIDR notation for it’s various subfields.

The controlPlane and workloadPlane entries are mandatory. These values specify the range of IP addresses that will be used at the host level for each member of the cluster.

Note

Several CIDRs can be provided if all nodes do not sit in the same network. This is an advanced configuration which we do not recommend for non-experts.

For workloadPlane entry an MTU can also be provided, this MTU value should be the lowest MTU value accross all the workload plane network. The default value for this MTU is 1460.

networks:
  controlPlane:
    cidr: 10.200.1.0/28
  workloadPlane:
    cidr: 10.200.1.0/28
    mtu: 1500

All nodes within the cluster must connect to both the control plane and workload plane networks. If the same network range is chosen for both the control plane and workload plane networks then the same interface may be used.

The pods and services fields are not mandatory, though can be changed to match the constraints of existing networking infrastructure (for example, if all or part of these default subnets is already routed). During installation, by default pods and services are set to the following values below if omitted.

For production clusters, we advise users to anticipate future expansions and use sufficiently large networks for pods and services.

networks:
  pods: 10.233.0.0/16
  services: 10.96.0.0/12

The portmap field is not mandatory, though can be changed in order to expose the hostPort on different IPs, with cidr you can define a list of range of IP addresses that will be used at the host level for each member of the cluster to expose the portmap (default to node Workload Plane IP)

Note

The Workload Plane Ingress rely on those hostPort, which means that if you change this portmap the Workload Plane Ingress will be exposed on IPs matching the portmap range of IP on every member of the cluster

The nodeport field is not mandatory, though can be changed in order to expose the nodePort services on different IPs, with cidr you can define a list of range of IP addresses that will be used at the host level for each member of the cluster to expose the nodePort services (default to node Workload Plane IP)

The proxies field can be omitted if there is no proxy to configure. The 2 entries http and https are used to configure the containerd daemon proxy to fetch extra container images from outstide the MetalK8s cluster. The no_proxy entry specifies IPs that should be excluded from proxying, it must be a list of hosts, IP addresses or IP ranges in CIDR format. For example;

no_proxy:
  - localhost
  - 127.0.0.1
  - 10.10.0.0/16
  - 192.168.0.0/16

The archives field is a list of absolute paths to MetalK8s ISO files. When the bootstrap script is executed, those ISOs are automatically mounted and the system is configured to re-mount them automatically after a reboot.

The addons field can be omitted if you do not have any specific addons to configure.

If you need to disable deployment of dex as default OIDC used by MetalK8s you can disable it by setting addons.dex.enabled to false. If dex is disabled you will not be able to use the MetalK8s UI and Grafana.

Deployment of logging stack relying on fluent-bit and loki can be disabled respectively by setting addons.fluent-bit.enabled and addons.loki.enabled to false.

The kubernetes field can be omitted if you do not have any specific Kubernetes Feature Gates to enable or disable and if you are ok with defaults kubernetes configuration.

If you need to enable or disable specific features for kube-apiserver configure the corresponding entries in the kubernetes.apiServer.featureGates mapping.

If dex is enabled, it will be used as oidc for kube-apiserver but you can use a specific OpenID for kube-apiserver, to do so:

kubernetes:
  apiServer:
    oidc:
      issuerURL: <OIDC issuer URL>
      clientID: <Client ID>
      CAFile: <Certificate Authority certificate file>
      usernameClaim: <Username Claim>
      groupsClaim: <Groups Claim>

From coreDNS section you can disable the hostForward so that the DNS request from the Pod inside Kubernetes are not forwarded to the configured nameservers from the host.

Note

It means pod running in Kubernetes will not be able to resolve any name that are not in Kubernetes.

If you want to override the default coreDNS podAntiAffinity or number of replicas, by default MetalK8s deploy 2 replicas and use soft podAntiAffinity on hostname so that if it’s possible coreDNS pods will be spread on different infra nodes. If you have more infra node than coreDNS replicas, you should set hard podAntiAffinity on hostname so that you are sure that coreDNS pods sit on different node, to do so:

kubernetes:
  coreDNS:
    affinity:
      podAntiAffinity:
        hard:
          - topologyKey: kubernetes.io/hostname

From controllerManager section you can override the number of terminated pods that can exist before the terminated pod garbage collector starts deleting them. If it’s set to 0, the terminated pod garbage collector is disabled (default to 500)

From kubelet section you can override the max number of pods that can be scheduled on each nodes.

The salt field can be omitted if you do not have any specific salt settings to configure.

From master section you can override the number of worker threads used by salt master and the timeout for salt master to get an answer from minions

SSH Provisioning

  1. Prepare the MetalK8s PKI directory.

    root@bootstrap $ mkdir -p /etc/metalk8s/pki
    
  2. Generate a passwordless SSH key that will be used for authentication to future new nodes.

    root@bootstrap $ ssh-keygen -t rsa -b 4096 -N '' -f /etc/metalk8s/pki/salt-bootstrap
    

    Warning

    Although the key name is not critical (will be re-used afterwards, so make sure to replace occurences of salt-bootstrap where relevant), this key must exist in the /etc/metalk8s/pki directory.

  3. Accept the new identity on future new nodes (run from your host).

    1. Retrieve the public key from the Bootstrap node.

      user@host $ scp root@bootstrap:/etc/metalk8s/pki/salt-bootstrap.pub /tmp/salt-bootstrap.pub
      
    2. Authorize this public key on each new node (this command assumes a functional SSH access from your host to the target node). Repeat until all nodes accept SSH connections from the Bootstrap node.

      user@host $ ssh-copy-id -i /tmp/salt-bootstrap.pub root@<node_hostname>
      

Installation

Run the Installation

Run the bootstrap script to install binaries and services required on the Bootstrap node.

root@bootstrap $ /srv/scality/metalk8s-127.0.3/bootstrap.sh

Warning

For virtual networks (or any network which enforces source and destination fields of IP packets to correspond to the MAC address(es)), IP-in-IP needs to be enabled.

Validate the install

  • Check that all Pods on the Bootstrap node are in the Running state. Note that Prometheus and Alertmanager pods will remain in a Pending state until their respective persistent storage volumes are provisioned.

Note

The administrator Kubeconfig file is used to configure access to Kubernetes when used with kubectl as shown below. This file contains sensitive information and should be kept securely.

On all subsequent kubectl commands, you may omit the --kubeconfig argument if you have exported the KUBECONFIG environment variable set to the path of the administrator Kubeconfig file for the cluster.

By default, this path is /etc/kubernetes/admin.conf.

root@bootstrap $ export KUBECONFIG=/etc/kubernetes/admin.conf
root@bootstrap $ kubectl get nodes --kubeconfig /etc/kubernetes/admin.conf
NAME                   STATUS    ROLES                         AGE       VERSION
bootstrap              Ready     bootstrap,etcd,infra,master   17m       v1.15.5

root@bootstrap $ kubectl get pods --all-namespaces -o wide --kubeconfig /etc/kubernetes/admin.conf
NAMESPACE             NAME                                                      READY   STATUS    RESTARTS   AGE     IP               NODE            NOMINATED NODE   READINESS GATES
kube-system           calico-kube-controllers-7c9944c5f4-h9bsc                  1/1     Running   0          6m29s   10.233.220.129   bootstrap   <none>           <none>
kube-system           calico-node-v4qhb                                         1/1     Running   0          6m29s   10.200.3.152     bootstrap   <none>           <none>
kube-system           coredns-ff46db798-k54z9                                   1/1     Running   0          6m29s   10.233.220.134   bootstrap   <none>           <none>
kube-system           coredns-ff46db798-nvmjl                                   1/1     Running   0          6m29s   10.233.220.132   bootstrap   <none>           <none>
kube-system           etcd-bootstrap                                            1/1     Running   0          5m45s   10.200.3.152     bootstrap   <none>           <none>
kube-system           kube-apiserver-bootstrap                                  1/1     Running   0          5m57s   10.200.3.152     bootstrap   <none>           <none>
kube-system           kube-controller-manager-bootstrap                         1/1     Running   0          7m4s    10.200.3.152     bootstrap   <none>           <none>
kube-system           kube-proxy-n6zgk                                          1/1     Running   0          6m32s   10.200.3.152     bootstrap   <none>           <none>
kube-system           kube-scheduler-bootstrap                                  1/1     Running   0          7m4s    10.200.3.152     bootstrap   <none>           <none>
kube-system           repositories-bootstrap                                    1/1     Running   0          6m20s   10.200.3.152     bootstrap   <none>           <none>
kube-system           salt-master-bootstrap                                     2/2     Running   0          6m10s   10.200.3.152     bootstrap   <none>           <none>
kube-system           storage-operator-7567748b6d-hp7gq                         1/1     Running   0          6m6s    10.233.220.138   bootstrap   <none>           <none>
metalk8s-ingress      nginx-ingress-control-plane-controller-5nkkx              1/1     Running   0          6m6s    10.233.220.137   bootstrap   <none>           <none>
metalk8s-ingress      nginx-ingress-controller-shg7x                            1/1     Running   0          6m7s    10.233.220.135   bootstrap   <none>           <none>
metalk8s-ingress      nginx-ingress-default-backend-7d8898655c-jj7l6            1/1     Running   0          6m7s    10.233.220.136   bootstrap   <none>           <none>
metalk8s-logging      loki-0                                                    0/1     Pending   0          6m21s    <none>           <none>      <none>           <none>
metalk8s-monitoring   alertmanager-prometheus-operator-alertmanager-0           0/2     Pending   0          6m1s    <none>           <none>      <none>           <none>
metalk8s-monitoring   prometheus-operator-grafana-775fbb5b-sgngh                2/2     Running   0          6m17s   10.233.220.130   bootstrap   <none>           <none>
metalk8s-monitoring   prometheus-operator-kube-state-metrics-7587b4897c-tt79q   1/1     Running   0          6m17s   10.233.220.131   bootstrap   <none>           <none>
metalk8s-monitoring   prometheus-operator-operator-7446d89644-zqdlj             1/1     Running   0          6m17s   10.233.220.133   bootstrap   <none>           <none>
metalk8s-monitoring   prometheus-operator-prometheus-node-exporter-rb969        1/1     Running   0          6m17s   10.200.3.152     bootstrap   <none>           <none>
metalk8s-monitoring   prometheus-prometheus-operator-prometheus-0               0/3     Pending   0          5m50s   <none>           <none>      <none>           <none>
metalk8s-ui           metalk8s-ui-6f74ff4bc-fgk86                               1/1     Running   0          6m4s    10.233.220.139   bootstrap   <none>           <none>
  • From the console output above, Prometheus, Alertmanager and Loki pods are in a Pending state because their respective persistent storage volumes need to be provisioned. To provision these persistent storage volumes, follow this procedure.

  • Check that you can access the MetalK8s GUI after the installation is completed by following this procedure.

  • At this stage, the MetalK8s GUI should be up and ready for you to explore.

    Note

    Monitoring through the MetalK8s GUI will not be available until persistent storage volumes for both Prometheus and Alertmanager have been successfully provisioned.

  • If you encounter an error during installation or have issues validating a fresh MetalK8s installation, refer to the Troubleshooting section.