Introduction

Foreword

MetalK8s is a Kubernetes distribution with a number of add-ons selected for on-premises deployments, including pre-configured monitoring and alerting, self-healing system configuration, and more.

The installation of a MetalK8s cluster can be broken down into the following steps:

  1. Setup of the environment

  2. Deployment of the Bootstrap node, the first machine in the cluster

  3. Expansion of the cluster, orchestrated from the Bootstrap node

  4. Post installation configuration steps and sanity checks

Choosing a Deployment Architecture

Before starting the installation, choosing an architecture is recommended, as it can impact sizing of the machines and other infrastructure-related details.

Note

“Machines” may indicate bare-metal servers or VMs interchangeably.

Warning

MetalK8s is not designed to handle geo-distributed, multi-site architectures. Instead, it focuses on providing a highly resilient cluster at the datacenter scale. To manage multiple sites, look into solutions provided at the application level, or alternatives from the community (such as what the SIG Multicluster provides).

Standard Architecture

The recommended architecture when installing a small MetalK8s cluster emphasizes ease of installation, while providing high stability for the scheduled workloads:

  • One machine running Bootstrap and control plane services

  • Two other machines running control plane and Infra services

  • Three more machines for workload applications

[Image: standard-arch.png — standard architecture diagram]

Machines dedicated to the control plane do not need large amounts of resources (see the sizing notes below), and can safely run as virtual machines. Running workloads on dedicated machines also simplifies sizing of those machines, as the impact of MetalK8s itself would be negligible.

Extended Architecture

This example architecture focuses on reliability rather than compactness, offering the finest control over the entire platform:

  • One machine dedicated to running Bootstrap services (see the Bootstrap role definition below)

  • Three extra machines (or five if installing a really large cluster, e.g. >100 nodes) for running the Kubernetes control plane (with core K8s services and the backing etcd DB)

  • One or more machines dedicated to running Infra services (see the Infra role)

  • Any number of machines dedicated to running applications, the number and sizing depending on the applications (for instance, Zenko would recommend using three or more machines)

[Image: extended-arch.png — extended architecture diagram]

Compact Architectures

While not focused on minimizing its compute and memory footprint, MetalK8s can provide a fully functional single-node “cluster”. The Bootstrap node can be configured to also run applications next to all the other required services (see the section about taints below).

A single-node cluster does not provide any resilience to machine or site failure, which is why the most compact architecture recommended for production includes three machines:

  • Two machines running control plane services alongside infra and workload applications

  • One machine running Bootstrap services in addition to all the other services

[Image: compact-arch.png — compact architecture diagram]

Note

Sizing of such compact clusters needs to account for the expected load, and the exact impact of colocating an application with MetalK8s services needs to be evaluated by said application’s provider.

Variations

It is possible to customize the chosen architecture using combinations of roles and taints, which are described below, to adapt to the available infrastructure.

As a general recommendation, it is easier to monitor and operate well-isolated groups of machines in the cluster, where hardware issues would only impact one group of services.

It is also possible to evolve an architecture after initial deployment, in case the underlying infrastructure also evolves (new machines can be added through the expansion mechanism, roles can be added or removed…).

Concepts

Although familiarity with Kubernetes concepts is recommended, the concepts necessary to grasp before installing a MetalK8s cluster are presented here.

Nodes

Nodes are Kubernetes worker machines, which allow running containers and can be managed by the cluster (control plane services, described below).

Control Plane and Workload Plane

This dichotomy is central to MetalK8s, and is often referenced in other Kubernetes concepts.

The control plane is the set of machines (called nodes) and the services running there that make up the essential Kubernetes functionality for running containerized applications, managing declarative objects, and providing authentication/authorization to end-users as well as services. The main components making up a Kubernetes control plane are the API Server, the Scheduler, and the Controller Manager.

The workload plane is the set of nodes where applications are deployed via Kubernetes objects, managed by services provided by the control plane.

Note

Nodes may belong to both planes, so that one can run applications alongside the control plane services.

Control plane nodes are often responsible for providing storage for the API Server, by running etcd. This responsibility may be offloaded to other nodes from the workload plane (without the etcd taint).

Node Roles

A Node's responsibilities are determined using roles. Roles are stored in Node manifests using labels, of the form node-role.kubernetes.io/<role-name>: ''.
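As an illustrative sketch, the label carrying the infra role would appear in a Node manifest as follows (the node name is hypothetical):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: node-1                           # hypothetical node name
  labels:
    node-role.kubernetes.io/infra: ''    # marks this Node with the infra role
```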

MetalK8s uses the following roles, which may be combined freely:

node-role.kubernetes.io/master

The master role marks a control plane member. Control plane services (see above) can only be scheduled on master nodes.

node-role.kubernetes.io/etcd

The etcd role marks a node running etcd for storage of the API Server.

node-role.kubernetes.io/infra

The infra role is specific to MetalK8s. It marks nodes where non-critical services provided by the cluster (monitoring stack, UIs, etc.) are running.

node-role.kubernetes.io/bootstrap

This marks the Bootstrap node. This node is unique in the cluster, and is solely responsible for the following services:

  • An RPM package repository used by cluster members

  • An OCI registry for Pod images

  • A Salt Master and its associated SaltAPI

In practice, this role is used in conjunction with the master and etcd roles for bootstrapping the control plane.

In the architecture diagrams presented above, each box represents a role (with the node-role.kubernetes.io/ prefix omitted).

Node Taints

Taints are complementary to roles. When a taint or a set of taints is applied to a Node, only Pods with the corresponding tolerations can be scheduled on that Node.

Taints allow dedicating Nodes to specific use-cases, such as having Nodes dedicated to running control plane services.
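For illustration, a taint on a Node and the matching toleration a Pod needs in order to be scheduled there would look like the following excerpts (a sketch, using the infra role's taint as an example):

```yaml
# Excerpt from a Node spec: taint corresponding to the infra role
spec:
  taints:
    - key: node-role.kubernetes.io/infra
      effect: NoSchedule
---
# Excerpt from a Pod spec: toleration allowing scheduling on such a Node
spec:
  tolerations:
    - key: node-role.kubernetes.io/infra
      operator: Exists
      effect: NoSchedule
```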

Refer to the architecture diagrams above for examples: each T marker on a role means the taint corresponding to this role has been applied on the Node.

Note that Pods from the control plane services (corresponding to master and etcd roles) have tolerations for the bootstrap and infra taints. This is because after bootstrapping the first Node, it will be configured as follows:

[Image: bootstrap-single-node-arch.png — Bootstrap node after initial deployment]

The taints applied are only tolerated by services deployed by MetalK8s. If the selected architecture requires workloads to run on the Bootstrap node, these taints should be removed.

[Image: bootstrap-remove-taints.png — Bootstrap node with taints removed]

To achieve this, use the following commands after deployment:

root@bootstrap $ kubectl taint nodes <bootstrap-node-name> \
                   node-role.kubernetes.io/bootstrap:NoSchedule-
root@bootstrap $ kubectl taint nodes <bootstrap-node-name> \
                   node-role.kubernetes.io/infra:NoSchedule-

Note

To get more in-depth information about taints and tolerations, see the official Kubernetes documentation.

Networks

A MetalK8s cluster requires a physical network for both the control plane and the workload plane Nodes. Although these may be the same network, the distinction will still be made in further references to these networks, and when referring to a Node IP address. Each Node in the cluster must belong to these two networks.

The control plane network will serve for cluster services to communicate with each other. The workload plane network will serve for exposing applications, including the ones in infra Nodes, to the outside world.


MetalK8s also allows one to configure virtual networks used for internal communications:

  • A network for Pods, defaulting to 10.233.0.0/16

  • A network for Services, defaulting to 10.96.0.0/12

In case of conflicts with the existing infrastructure, make sure to choose other ranges during the Bootstrap configuration.
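As a sketch, such overrides would go in the Bootstrap configuration file along these lines (the apiVersion, field names, and CIDR values below are illustrative assumptions and may differ across MetalK8s versions; check the documentation matching the installed version):

```yaml
apiVersion: metalk8s.scality.com/v1alpha3
kind: BootstrapConfiguration
networks:
  controlPlane:
    cidr: 10.200.1.0/24     # example control plane network
  workloadPlane:
    cidr: 10.200.2.0/24     # example workload plane network
  pods: 192.168.0.0/16      # overrides the 10.233.0.0/16 default
  services: 172.24.0.0/16   # overrides the 10.96.0.0/12 default
```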

Additional Notes

Sizing

Defining an appropriate sizing for the machines in a MetalK8s cluster strongly depends on the selected architecture and the expected future variations to this architecture. Refer to the documentation of the applications planned to run in the deployed cluster before completing the sizing, as their needs will compete with the cluster’s.

Each role, describing a group of services, requires a certain amount of resources for it to run properly. If multiple roles are used on a single Node, these requirements add up.

bootstrap
  Services: Package repositories, container registries, Salt master
  CPU: 1 core
  RAM: 2 GB
  Required storage: Sufficient space for the product ISO archives

etcd
  Services: etcd database for K8s API
  CPU: 0.5 core
  RAM: 1 GB
  Required storage: 1 GB for /var/lib/etcd

master
  Services: K8s API, scheduler, and controllers
  CPU: 0.5 core
  RAM: 1 GB

infra
  Services: Monitoring services, Ingress controllers
  CPU: 0.5 core
  RAM: 2 GB
  Required storage: 10 GB partition for Prometheus; 1 GB partition for Alertmanager

Requirements common to any Node
  Services: Salt minion, Kubelet
  CPU: 0.2 core
  RAM: 0.5 GB
  Required storage: 40 GB root partition
  Recommended storage: 100 GB or more for /var

These numbers do not account for highly unstable workloads or other sources of unpredictable load on the cluster services; it is recommended to provision an additional 50% of resources as a safety margin.
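As a rough worked example using the figures above: a node combining the master, etcd, and infra roles plus the per-node common requirements, with the 50% margin applied, would need about 2.6 cores and 6.8 GB of RAM:

```shell
# Sum the role requirements (master + etcd + infra + common), then add 50%
awk 'BEGIN {
  cpu = 0.5 + 0.5 + 0.5 + 0.2    # cores
  ram = 1 + 1 + 2 + 0.5          # GB
  printf "CPU: %.2f cores, RAM: %.2f GB\n", cpu * 1.5, ram * 1.5
}'
# → CPU: 2.55 cores, RAM: 6.75 GB
```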

Consider the official recommendations for etcd sizing as the stability of a MetalK8s installation depends strongly on the backing etcd stability (see this note for more details). Prometheus and Alertmanager also require storage, as explained in this section.

Deploying with Cloud Providers

When installing in a virtual environment, such as AWS EC2 or OpenStack, special care is needed to adjust the network configuration. Virtual environments often add a layer of security at the port level, which should be disabled or circumvented with IP-in-IP encapsulation.

Also note that Kubernetes has numerous integrations with existing cloud providers to provide easier access to proprietary features, such as load balancers. For more information, see this documentation article.