Cluster Monitoring
This section describes how to operate the MetalK8s monitoring and alerting stack. It also describes the metrics monitored with Prometheus and lists the pre-configured alerting and recording rules.
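To check which alerting and recording rules are actually loaded, you can query the Prometheus HTTP API endpoint `/api/v1/rules`. The Python sketch below groups rule names by type and collects each alerting rule's `severity` label. It assumes Prometheus is reachable at `http://localhost:9090` (for example through `kubectl port-forward`); that URL and the helper names `fetch_rules` and `summarize_rules` are illustrative assumptions, not part of MetalK8s.

```python
"""Summarize the rules loaded in Prometheus (illustrative sketch)."""
import json
import urllib.request
from collections import defaultdict

# Assumption: Prometheus is reachable at this URL, for example through
# `kubectl port-forward`. Adjust it for your cluster.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_rules(base_url: str = PROMETHEUS_URL) -> dict:
    """Return the decoded JSON payload of GET /api/v1/rules."""
    with urllib.request.urlopen(f"{base_url}/api/v1/rules", timeout=10) as resp:
        return json.load(resp)


def summarize_rules(payload: dict) -> dict:
    """Group rule names by type ("alerting" or "recording").

    Each name maps to the list of `severity` labels found for it (None when
    the rule has no severity label). A name can map to several entries when
    more than one rule shares it.
    """
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus API error: {payload.get('error')}")
    summary = {"alerting": defaultdict(list), "recording": defaultdict(list)}
    for group in payload["data"]["groups"]:
        for rule in group["rules"]:
            severity = rule.get("labels", {}).get("severity")
            summary[rule["type"]][rule["name"]].append(severity)
    # Convert the defaultdicts to plain dicts for easier printing/comparison.
    return {kind: dict(names) for kind, names in summary.items()}
```

Against a running cluster, `summarize_rules(fetch_rules())` returns a mapping with `"alerting"` and `"recording"` keys. An alerting name with more than one entry is defined by several rules (for example with different severities), which may explain why some alert names appear twice in the rule index.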
- Monitoring Stack
- Alerts
- Predefined Alerting Rules
- Hierarchy
- Composite Rules
  - AccessServicesDegraded
  - AlertingServiceAtRisk
  - AlertingServiceDegraded
  - AuthenticationServiceDegraded
  - BootstrapServicesDegraded
  - ClusterAtRisk
  - ClusterDegraded
  - CoreServicesAtRisk
  - CoreServicesDegraded
  - DashboardingServiceDegraded
  - IngressControllerServicesDegraded
  - KubernetesControlPlaneAtRisk
  - KubernetesControlPlaneDegraded
  - LoggingServiceDegraded
  - MonitoringServiceAtRisk
  - MonitoringServiceDegraded
  - NetworkDegraded
  - NodeAtRisk
  - NodeDegraded
  - ObservabilityServicesAtRisk
  - ObservabilityServicesDegraded
  - PlatformServicesAtRisk
  - PlatformServicesDegraded
  - SystemPartitionAtRisk
  - SystemPartitionDegraded
  - VolumeAtRisk
  - VolumeDegraded
- Simple Rules
  - AlertmanagerClusterCrashlooping
  - AlertmanagerClusterDown
  - AlertmanagerClusterFailedToSendAlerts
  - AlertmanagerClusterFailedToSendAlerts
  - AlertmanagerConfigInconsistent
  - AlertmanagerFailedReload
  - AlertmanagerFailedToSendAlerts
  - AlertmanagerMembersInconsistent
  - ConfigReloaderSidecarErrors
  - CPUThrottlingHigh
  - etcdDatabaseQuotaLowSpace
  - etcdExcessiveDatabaseGrowth
  - etcdGRPCRequestsSlow
  - etcdHighCommitDurations
  - etcdHighFsyncDurations
  - etcdHighFsyncDurations
  - etcdHighNumberOfFailedGRPCRequests
  - etcdHighNumberOfFailedGRPCRequests
  - etcdHighNumberOfFailedProposals
  - etcdHighNumberOfLeaderChanges
  - etcdInsufficientMembers
  - etcdMemberCommunicationSlow
  - etcdMembersDown
  - etcdNoLeader
  - KubeAggregatedAPIDown
  - KubeAggregatedAPIErrors
  - KubeAPIDown
  - KubeAPIErrorBudgetBurn
  - KubeAPIErrorBudgetBurn
  - KubeAPITerminatedRequests
  - KubeClientCertificateExpiration
  - KubeClientCertificateExpiration
  - KubeClientErrors
  - KubeContainerWaiting
  - KubeControllerManagerDown
  - KubeCPUOvercommit
  - KubeCPUQuotaOvercommit
  - KubeDaemonSetMisScheduled
  - KubeDaemonSetNotScheduled
  - KubeDaemonSetRolloutStuck
  - KubeDeploymentGenerationMismatch
  - KubeDeploymentReplicasMismatch
  - KubeDeploymentRolloutStuck
  - KubeHpaMaxedOut
  - KubeHpaReplicasMismatch
  - KubeJobFailed
  - KubeJobNotCompleted
  - KubeletClientCertificateExpiration
  - KubeletClientCertificateExpiration
  - KubeletClientCertificateRenewalErrors
  - KubeletDown
  - KubeletPlegDurationHigh
  - KubeletPodStartUpLatencyHigh
  - KubeletServerCertificateExpiration
  - KubeletServerCertificateExpiration
  - KubeletServerCertificateRenewalErrors
  - KubeletTooManyPods
  - KubeMemoryOvercommit
  - KubeMemoryQuotaOvercommit
  - KubeNodeNotReady
  - KubeNodeReadinessFlapping
  - KubeNodeUnreachable
  - KubePersistentVolumeErrors
  - KubePersistentVolumeFillingUp
  - KubePersistentVolumeFillingUp
  - KubePersistentVolumeInodesFillingUp
  - KubePersistentVolumeInodesFillingUp
  - KubePodCrashLooping
  - KubePodNotReady
  - KubeProxyDown
  - KubeQuotaAlmostFull
  - KubeQuotaExceeded
  - KubeQuotaFullyUsed
  - KubeSchedulerDown
  - KubeStatefulSetGenerationMismatch
  - KubeStatefulSetReplicasMismatch
  - KubeStatefulSetUpdateNotRolledOut
  - KubeStateMetricsListErrors
  - KubeStateMetricsShardingMismatch
  - KubeStateMetricsShardsMissing
  - KubeStateMetricsWatchErrors
  - KubeVersionMismatch
  - NodeBondingDegraded
  - NodeClockNotSynchronising
  - NodeClockSkewDetected
  - NodeCPUHighUsage
  - NodeDiskIOSaturation
  - NodeFileDescriptorLimit
  - NodeFileDescriptorLimit
  - NodeFilesystemAlmostOutOfFiles
  - NodeFilesystemAlmostOutOfFiles
  - NodeFilesystemAlmostOutOfSpace
  - NodeFilesystemAlmostOutOfSpace
  - NodeFilesystemFilesFillingUp
  - NodeFilesystemFilesFillingUp
  - NodeFilesystemSpaceFillingUp
  - NodeFilesystemSpaceFillingUp
  - NodeHighNumberConntrackEntriesUsed
  - NodeMemoryHighUtilization
  - NodeMemoryMajorPagesFaults
  - NodeNetworkInterfaceFlapping
  - NodeNetworkReceiveErrs
  - NodeNetworkTransmitErrs
  - NodeRAIDDegraded
  - NodeRAIDDiskFailure
  - NodeSystemdServiceFailed
  - NodeSystemSaturation
  - NodeTextFileCollectorScrapeError
  - PrometheusBadConfig
  - PrometheusDuplicateTimestamps
  - PrometheusErrorSendingAlertsToAnyAlertmanager
  - PrometheusErrorSendingAlertsToSomeAlertmanagers
  - PrometheusHighQueryLoad
  - PrometheusLabelLimitHit
  - PrometheusMissingRuleEvaluations
  - PrometheusNotConnectedToAlertmanagers
  - PrometheusNotificationQueueRunningFull
  - PrometheusNotIngestingSamples
  - PrometheusOperatorListErrors
  - PrometheusOperatorNodeLookupErrors
  - PrometheusOperatorNotReady
  - PrometheusOperatorReconcileErrors
  - PrometheusOperatorRejectedResources
  - PrometheusOperatorStatusUpdateErrors
  - PrometheusOperatorSyncFailed
  - PrometheusOperatorWatchErrors
  - PrometheusOutOfOrderTimestamps
  - PrometheusRemoteStorageFailures
  - PrometheusRemoteWriteBehind
  - PrometheusRemoteWriteDesiredShards
  - PrometheusRuleFailures
  - PrometheusScrapeBodySizeLimitHit
  - PrometheusScrapeSampleLimitHit
  - PrometheusSDRefreshFailure
  - PrometheusTargetLimitHit
  - PrometheusTargetSyncFailure
  - PrometheusTSDBCompactionsFailing
  - PrometheusTSDBReloadsFailing
  - TargetDown
  - Watchdog
- Predefined Alerting Rules
- Prometheus