Operation
This guide describes the MetalK8s ISO preparation steps, upgrade and downgrade guidelines, supported versions, and the best practices required for operating MetalK8s. Refer to the Installation guide if you do not have a working MetalK8s setup.
- Cluster Monitoring
- Monitoring Stack
- Alerts
- Predefined Alerting Rules
- Hierarchy
- Composite Rules
- AccessServicesDegraded
- AlertingServiceAtRisk
- AlertingServiceDegraded
- AuthenticationServiceDegraded
- BootstrapServicesDegraded
- ClusterAtRisk
- ClusterDegraded
- CoreServicesAtRisk
- CoreServicesDegraded
- DashboardingServiceDegraded
- IngressControllerServicesDegraded
- KubernetesControlPlaneAtRisk
- KubernetesControlPlaneDegraded
- LoggingServiceDegraded
- MonitoringServiceAtRisk
- MonitoringServiceDegraded
- NetworkDegraded
- NodeAtRisk
- NodeDegraded
- ObservabilityServicesAtRisk
- ObservabilityServicesDegraded
- PlatformServicesAtRisk
- PlatformServicesDegraded
- SystemPartitionAtRisk
- SystemPartitionDegraded
- VolumeAtRisk
- VolumeDegraded
- Simple Rules
- AlertmanagerClusterCrashlooping
- AlertmanagerClusterDown
- AlertmanagerClusterFailedToSendAlerts
- AlertmanagerClusterFailedToSendAlerts
- AlertmanagerConfigInconsistent
- AlertmanagerFailedReload
- AlertmanagerFailedToSendAlerts
- AlertmanagerMembersInconsistent
- ConfigReloaderSidecarErrors
- CPUThrottlingHigh
- etcdDatabaseQuotaLowSpace
- etcdExcessiveDatabaseGrowth
- etcdGRPCRequestsSlow
- etcdHighCommitDurations
- etcdHighFsyncDurations
- etcdHighFsyncDurations
- etcdHighNumberOfFailedGRPCRequests
- etcdHighNumberOfFailedGRPCRequests
- etcdHighNumberOfFailedProposals
- etcdHighNumberOfLeaderChanges
- etcdInsufficientMembers
- etcdMemberCommunicationSlow
- etcdMembersDown
- etcdNoLeader
- KubeAggregatedAPIDown
- KubeAggregatedAPIErrors
- KubeAPIDown
- KubeAPIErrorBudgetBurn
- KubeAPIErrorBudgetBurn
- KubeAPITerminatedRequests
- KubeClientCertificateExpiration
- KubeClientCertificateExpiration
- KubeClientErrors
- KubeContainerWaiting
- KubeControllerManagerDown
- KubeCPUOvercommit
- KubeCPUQuotaOvercommit
- KubeDaemonSetMisScheduled
- KubeDaemonSetNotScheduled
- KubeDaemonSetRolloutStuck
- KubeDeploymentGenerationMismatch
- KubeDeploymentReplicasMismatch
- KubeDeploymentRolloutStuck
- KubeHpaMaxedOut
- KubeHpaReplicasMismatch
- KubeJobFailed
- KubeJobNotCompleted
- KubeletClientCertificateExpiration
- KubeletClientCertificateExpiration
- KubeletClientCertificateRenewalErrors
- KubeletDown
- KubeletPlegDurationHigh
- KubeletPodStartUpLatencyHigh
- KubeletServerCertificateExpiration
- KubeletServerCertificateExpiration
- KubeletServerCertificateRenewalErrors
- KubeletTooManyPods
- KubeMemoryOvercommit
- KubeMemoryQuotaOvercommit
- KubeNodeNotReady
- KubeNodeReadinessFlapping
- KubeNodeUnreachable
- KubePersistentVolumeErrors
- KubePersistentVolumeFillingUp
- KubePersistentVolumeFillingUp
- KubePersistentVolumeInodesFillingUp
- KubePersistentVolumeInodesFillingUp
- KubePodCrashLooping
- KubePodNotReady
- KubeProxyDown
- KubeQuotaAlmostFull
- KubeQuotaExceeded
- KubeQuotaFullyUsed
- KubeSchedulerDown
- KubeStatefulSetGenerationMismatch
- KubeStatefulSetReplicasMismatch
- KubeStatefulSetUpdateNotRolledOut
- KubeStateMetricsListErrors
- KubeStateMetricsShardingMismatch
- KubeStateMetricsShardsMissing
- KubeStateMetricsWatchErrors
- KubeVersionMismatch
- NodeBondingDegraded
- NodeClockNotSynchronising
- NodeClockSkewDetected
- NodeCPUHighUsage
- NodeDiskIOSaturation
- NodeFileDescriptorLimit
- NodeFileDescriptorLimit
- NodeFilesystemAlmostOutOfFiles
- NodeFilesystemAlmostOutOfFiles
- NodeFilesystemAlmostOutOfSpace
- NodeFilesystemAlmostOutOfSpace
- NodeFilesystemFilesFillingUp
- NodeFilesystemFilesFillingUp
- NodeFilesystemSpaceFillingUp
- NodeFilesystemSpaceFillingUp
- NodeHighNumberConntrackEntriesUsed
- NodeMemoryHighUtilization
- NodeMemoryMajorPagesFaults
- NodeNetworkInterfaceFlapping
- NodeNetworkReceiveErrs
- NodeNetworkTransmitErrs
- NodeRAIDDegraded
- NodeRAIDDiskFailure
- NodeSystemdServiceFailed
- NodeSystemSaturation
- NodeTextFileCollectorScrapeError
- PrometheusBadConfig
- PrometheusDuplicateTimestamps
- PrometheusErrorSendingAlertsToAnyAlertmanager
- PrometheusErrorSendingAlertsToSomeAlertmanagers
- PrometheusHighQueryLoad
- PrometheusLabelLimitHit
- PrometheusMissingRuleEvaluations
- PrometheusNotConnectedToAlertmanagers
- PrometheusNotificationQueueRunningFull
- PrometheusNotIngestingSamples
- PrometheusOperatorListErrors
- PrometheusOperatorNodeLookupErrors
- PrometheusOperatorNotReady
- PrometheusOperatorReconcileErrors
- PrometheusOperatorRejectedResources
- PrometheusOperatorStatusUpdateErrors
- PrometheusOperatorSyncFailed
- PrometheusOperatorWatchErrors
- PrometheusOutOfOrderTimestamps
- PrometheusRemoteStorageFailures
- PrometheusRemoteWriteBehind
- PrometheusRemoteWriteDesiredShards
- PrometheusRuleFailures
- PrometheusScrapeBodySizeLimitHit
- PrometheusScrapeSampleLimitHit
- PrometheusSDRefreshFailure
- PrometheusTargetLimitHit
- PrometheusTargetSyncFailure
- PrometheusTSDBCompactionsFailing
- PrometheusTSDBReloadsFailing
- TargetDown
- Watchdog
- Predefined Alerting Rules
- Prometheus
- Account Administration
- Cluster and Services Configurations
- Default Service Configurations
- Service Configurations Customization
- Workload Plane Ingress Controller Configuration Customization
- Alertmanager Configuration Customization
- Grafana Configuration Customization
- Prometheus Configuration Customization
- Dex Configuration Customization
- Loki Configuration Customization
- Fluent-bit Configuration Customization
- MetalK8s UI Configuration Customization
- MetalK8s Shell UI Configuration Customization
- MetalK8s Shell UI Workloadplane Configuration Customization
- Replicas Count Customization
- Volume Management
- Cluster Upgrade
- Cluster Downgrade
- Disaster Recovery
- Solution Deployment
- The Workload Plane Ingress Virtual IPs
- Changing the hostname of a MetalK8s node
- Changing the Control Plane Ingress IP
- Using the metalk8s-utils Image
- Registry HA
- Listening Processes
- Troubleshooting
- Sosreport