Introduction

Foreword

MetalK8s is a Kubernetes distribution with a number of add-ons selected for on-premises deployments, including pre-configured monitoring and alerting, self-healing system configuration, and more.

The installation of a MetalK8s cluster can be broken down into the following steps:

  1. Setup of the environment

  2. Deployment of the Bootstrap node, the first machine in the cluster

  3. Expansion of the cluster, orchestrated from the Bootstrap node

  4. Post installation configuration steps and sanity checks

Choosing a Deployment Architecture

Before starting the installation, choosing an architecture is recommended, as it can impact sizing of the machines and other infrastructure-related details.

Note

“Machines” may indicate bare-metal servers or VMs interchangeably.

Warning

MetalK8s is not designed to handle geo-distributed, multi-site architectures. Instead, it focuses on providing a highly resilient cluster at the datacenter scale. To manage multiple sites, look into solutions provided at the application level, or alternatives from the community (such as what the SIG Multicluster provides).

Standard Architecture

The recommended architecture when installing a small MetalK8s cluster emphasizes ease of installation, while providing high stability for the scheduled workloads:

  • One machine running Bootstrap and control plane services

  • Two other machines running control plane and Infra services

  • Three more machines for workload applications

[Image: standard-arch.png — standard architecture diagram]

Machines dedicated to the control plane do not need large amounts of resources (see the sizing notes below), and can safely run as virtual machines. Running workloads on dedicated machines also simplifies sizing of those machines, as the impact of MetalK8s itself would be negligible.

Extended Architecture

This example architecture focuses on reliability rather than compactness, offering the finest control over the entire platform:

  • One machine dedicated to running Bootstrap services (see the Bootstrap role definition below)

  • Three extra machines (or five if installing a really large cluster, e.g. >100 nodes) for running the Kubernetes control plane (with core K8s services and the backing etcd DB)

  • One or more machines dedicated to running Infra services (see the Infra role)

  • Any number of machines dedicated to running applications, the number and sizing depending on the applications (for instance, Zenko would recommend using three or more machines)

[Image: extended-arch.png — extended architecture diagram]

Compact Architectures

While not focused on minimizing its compute and memory footprint, MetalK8s can provide a fully functional single-node “cluster”. The Bootstrap node can be configured to also run applications next to all the other required services (see the section about taints below).

A single-node cluster does not provide any resilience to machine or site failure, which is why the most compact architecture recommended for production includes three machines:

  • Two machines running control plane services alongside infra and workload applications

  • One machine running Bootstrap services in addition to all the other services

[Image: compact-arch.png — compact architecture diagram]

Note

Sizing of such compact clusters needs to account for the expected load, and the exact impact of colocating an application with MetalK8s services needs to be evaluated by said application’s provider.

Variations

It is possible to customize the chosen architecture using combinations of roles and taints, which are described below, to adapt to the available infrastructure.

As a general recommendation, it is easier to monitor and operate well-isolated groups of machines in the cluster, where hardware issues would only impact one group of services.

It is also possible to evolve an architecture after initial deployment, in case the underlying infrastructure also evolves (new machines can be added through the expansion mechanism, roles can be added or removed…).

Concepts

Although familiarity with Kubernetes concepts is recommended, the concepts necessary to grasp before installing a MetalK8s cluster are presented here.

Nodes

Nodes are Kubernetes worker machines, which allow running containers and can be managed by the cluster (control plane services, described below).

Control Plane and Workload Plane

This dichotomy is central to MetalK8s, and is often referenced in other Kubernetes concepts.

The control plane is the set of machines (called nodes) and the services running there that make up the essential Kubernetes functionality for running containerized applications, managing declarative objects, and providing authentication/authorization to end-users as well as services. The main components making up a Kubernetes control plane are the API Server, the Scheduler, and the Controller Manager.

The workload plane is the set of nodes where applications are deployed via Kubernetes objects, managed by services provided by the control plane.

Note

Nodes may belong to both planes, so that one can run applications alongside the control plane services.

Control plane nodes are often responsible for providing storage for the API Server, by running etcd. This responsibility may be offloaded to other nodes from the workload plane (without the etcd taint).

Node Roles

A Node's responsibilities are determined using roles. Roles are stored in Node manifests using labels, of the form node-role.kubernetes.io/<role-name>: ''.
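As an illustrative sketch, the label carrying the infra role would appear in a Node manifest as follows (the node name is hypothetical):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: node-1                           # hypothetical node name
  labels:
    node-role.kubernetes.io/infra: ''    # marks this Node with the infra role
```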

MetalK8s uses the following roles, which may be combined freely:

node-role.kubernetes.io/master

The master role marks a control plane member. Control plane services (see above) can only be scheduled on master nodes.

node-role.kubernetes.io/etcd

The etcd role marks a node running etcd for storage of the API Server.

node-role.kubernetes.io/infra

The infra role is specific to MetalK8s. It marks nodes where non-critical services provided by the cluster (monitoring stack, UIs, etc.) are running.

node-role.kubernetes.io/bootstrap

This marks the Bootstrap node. This node is unique in the cluster, and is solely responsible for the following services:

  • An RPM package repository used by cluster members

  • An OCI registry for Pod images

  • A Salt Master and its associated SaltAPI

In practice, this role is used in conjunction with the master and etcd roles for bootstrapping the control plane.

In the architecture diagrams presented above, each box represents a role (with the node-role.kubernetes.io/ prefix omitted).

Node Taints

Taints are complementary to roles. When a taint or a set of taints is applied to a Node, only Pods with the corresponding tolerations can be scheduled on that Node.

Taints allow dedicating Nodes to specific use-cases, such as having Nodes dedicated to running control plane services.
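For illustration, a taint on a Node and the matching toleration a Pod needs in order to be scheduled there would look like the following excerpts (a sketch, using the infra role's taint as an example):

```yaml
# Excerpt from a Node spec: taint corresponding to the infra role
spec:
  taints:
    - key: node-role.kubernetes.io/infra
      effect: NoSchedule
---
# Excerpt from a Pod spec: toleration allowing scheduling on such a Node
spec:
  tolerations:
    - key: node-role.kubernetes.io/infra
      operator: Exists
      effect: NoSchedule
```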

Refer to the architecture diagrams above for examples: each T marker on a role means the taint corresponding to this role has been applied on the Node.

Note that Pods from the control plane services (corresponding to master and etcd roles) have tolerations for the bootstrap and infra taints. This is because after bootstrapping the first Node, it will be configured as follows:

[Image: bootstrap-single-node-arch.png — Bootstrap node after initial deployment]

The taints applied are only tolerated by services deployed by MetalK8s. If the selected architecture requires workloads to run on the Bootstrap node, these taints should be removed.

[Image: bootstrap-remove-taints.png — Bootstrap node with taints removed]

To achieve this, use the following commands after deployment:

root@bootstrap $ kubectl taint nodes <bootstrap-node-name> \
                   node-role.kubernetes.io/bootstrap:NoSchedule-
root@bootstrap $ kubectl taint nodes <bootstrap-node-name> \
                   node-role.kubernetes.io/infra:NoSchedule-

Note

To get more in-depth information about taints and tolerations, see the official Kubernetes documentation.

Networks

A MetalK8s cluster requires a physical network for both the control plane and the workload plane Nodes. Although these may be the same network, the distinction will still be made in further references to these networks, and when referring to a Node IP address. Each Node in the cluster must belong to these two networks.

The control plane network will serve for cluster services to communicate with each other. The workload plane network will serve for exposing applications, including the ones in infra Nodes, to the outside world.


MetalK8s also allows one to configure virtual networks used for internal communications:

  • A network for Pods, defaulting to 10.233.0.0/16

  • A network for Services, defaulting to 10.96.0.0/12

In case of conflicts with the existing infrastructure, make sure to choose other ranges during the Bootstrap configuration.
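As a sketch, such overrides would go in the Bootstrap configuration file along these lines (the apiVersion, field names, and CIDR values below are illustrative assumptions and may differ across MetalK8s versions; check the documentation matching the installed version):

```yaml
apiVersion: metalk8s.scality.com/v1alpha3
kind: BootstrapConfiguration
networks:
  controlPlane:
    cidr: 10.200.1.0/24     # example control plane network
  workloadPlane:
    cidr: 10.200.2.0/24     # example workload plane network
  pods: 192.168.0.0/16      # overrides the 10.233.0.0/16 default
  services: 172.24.0.0/16   # overrides the 10.96.0.0/12 default
```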

Additional Notes

Sizing

Defining an appropriate sizing for the machines in a MetalK8s cluster strongly depends on the selected architecture and the expected future variations to this architecture. Refer to the documentation of the applications planned to run in the deployed cluster before completing the sizing, as their needs will compete with the cluster’s.

Each role, describing a group of services, requires a certain amount of resources for it to run properly. If multiple roles are used on a single Node, these requirements add up.

bootstrap
  Services: Package repositories, container registries, Salt master
  CPU: 1 core
  RAM: 2 GB
  Required storage: Sufficient space for the product ISO archives

etcd
  Services: etcd database for K8s API
  CPU: 0.5 core
  RAM: 1 GB
  Required storage: 1 GB for /var/lib/etcd

master
  Services: K8s API, scheduler, and controllers
  CPU: 0.5 core
  RAM: 1 GB

infra
  Services: Monitoring services, Ingress controllers
  CPU: 0.5 core
  RAM: 2 GB
  Required storage: 10 GB partition for Prometheus; 1 GB partition for Alertmanager

Requirements common to any Node
  Services: Salt minion, Kubelet
  CPU: 0.2 core
  RAM: 0.5 GB
  Required storage: 40 GB root partition
  Recommended storage: 100 GB or more for /var

These numbers do not account for highly unstable workloads or other sources of unpredictable load on the cluster services; it is recommended to provision an additional 50% of resources as a safety margin.
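As a rough worked example using the figures above: a node combining the master, etcd, and infra roles plus the per-node common requirements, with the 50% margin applied, would need about 2.6 cores and 6.8 GB of RAM:

```shell
# Sum the role requirements (master + etcd + infra + common), then add 50%
awk 'BEGIN {
  cpu = 0.5 + 0.5 + 0.5 + 0.2    # cores
  ram = 1 + 1 + 2 + 0.5          # GB
  printf "CPU: %.2f cores, RAM: %.2f GB\n", cpu * 1.5, ram * 1.5
}'
# → CPU: 2.55 cores, RAM: 6.75 GB
```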

Consider the official recommendations for etcd sizing as the stability of a MetalK8s installation depends strongly on the backing etcd stability (see this note for more details). Prometheus and Alertmanager also require storage, as explained in this section.

Deploying with Cloud Providers

When installing in a virtual environment, such as AWS EC2 or OpenStack, special care is needed to adjust the network configuration. Virtual environments often add a layer of security at the port level, which should be disabled or circumvented with IP-in-IP encapsulation.

Also note that Kubernetes has numerous integrations with existing cloud providers to provide easier access to proprietary features, such as load balancers. For more information, see this documentation article.