When you monitor multiple Kubernetes clusters with Prometheus, you often want to view all of them in one place.
With Federation, as shown in the following figure, each Prometheus collects the data of its own cluster, and the aggregate Prometheus then collects and stores the data of all clusters again. The same data ends up stored twice, which wastes a lot of storage.
In the configuration shown in the figure below, the Prometheus on the aggregating side accepts remote write, and the other Prometheus instances run in agent mode. Each agent sends its data to the aggregating side without accumulating it in its own cluster, so you can monitor multiple clusters from one place. I think Federation has its merits, but this configuration is often preferable when monitoring multiple clusters.
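For contrast, with Federation the aggregate Prometheus would scrape the /federate endpoint of each cluster's Prometheus. The following is only a minimal sketch of such a scrape job, using the standard federation settings (honor_labels, metrics_path and match[]). The file name, the selector and the target host are placeholders I made up, and this file is not used anywhere else in this article.

```shell
# Sketch only: a federation scrape job for the aggregate Prometheus.
# The file name, selector and target host are hypothetical placeholders.
cat <<'EOF' > federation-example.yaml
scrape_configs:
- job_name: 'federate'
  honor_labels: true             # keep the labels set by the source Prometheus
  metrics_path: '/federate'
  params:
    'match[]':
    - '{job="kubernetes-pods"}'  # which series to pull from the source
  static_configs:
  - targets: [ 'prometheus.cluster-a.example.com:9090' ]  # placeholder host
EOF
```

With this approach every source Prometheus still keeps its own full TSDB, which is the duplication the remote-write configuration avoids.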
This article builds this configuration using Tanzu Kubernetes Grid packages.
Set up Aggregate Prometheus
First, set up Aggregate Prometheus in a shared Kubernetes cluster.
It is assumed that the following packages are already installed
ℹ️ If you want to try this on a Kubernetes distribution other than TKG, please refer to this article
Create the following YAML. The key point is to add --web.enable-remote-write-receiver to the Prometheus startup options.
I have added a cluster label to distinguish which cluster each metric comes from. In the following configuration file it is set to orbstack, so please replace it according to your environment.
Also, change ingress.virtual_host_fqdn to the value for your environment.
cat <<'EOF' > prometheus-values.yaml
---
ingress:
  enabled: true
  virtual_host_fqdn: prometheus.192-168-194-146.sslip.io
prometheus:
  deployment:
    containers:
      args:
      - --storage.tsdb.retention.time=42d
      - --config.file=/etc/config/prometheus.yml
      - --storage.tsdb.path=/data
      - --web.console.libraries=/etc/prometheus/console_libraries
      - --web.console.templates=/etc/prometheus/consoles
      - --web.enable-lifecycle
      - --enable-feature=exemplar-storage
      - --web.enable-remote-write-receiver
  config:
    prometheus_yml: |
      global:
        evaluation_interval: 1m
        scrape_interval: 1m
        scrape_timeout: 10s
      rule_files:
      - /etc/config/alerting_rules.yml
      - /etc/config/recording_rules.yml
      - /etc/config/alerts
      - /etc/config/rules
      scrape_configs:
      - job_name: 'prometheus'
        scrape_interval: 5s
        static_configs:
        - targets: [ 'localhost:9090' ]
          labels:
            cluster: orbstack
      - job_name: 'kube-state-metrics'
        static_configs:
        - targets: [ 'prometheus-kube-state-metrics.tanzu-system-monitoring.svc.cluster.local:8080' ]
          labels:
            cluster: orbstack
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [ __meta_kubernetes_pod_annotation_prometheus_io_scrape ]
          action: keep
          regex: true
        - source_labels: [ __meta_kubernetes_pod_annotation_prometheus_io_path ]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [ __address__, __meta_kubernetes_pod_annotation_prometheus_io_port ]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [ __meta_kubernetes_namespace ]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [ __meta_kubernetes_pod_name ]
          action: replace
          target_label: kubernetes_pod_name
        - source_labels: [ ]
          target_label: cluster
          replacement: orbstack
      - job_name: kubernetes-nodes-cadvisor
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - replacement: kubernetes.default.svc:443
          target_label: __address__
        - regex: (.+)
          replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
          source_labels:
          - __meta_kubernetes_node_name
          target_label: __metrics_path__
        - source_labels: [ ]
          target_label: cluster
          replacement: orbstack
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      - job_name: kubernetes-apiservers
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - action: keep
          regex: default;kubernetes;https
          source_labels:
          - __meta_kubernetes_namespace
          - __meta_kubernetes_service_name
          - __meta_kubernetes_endpoint_port_name
        - source_labels: [ ]
          target_label: cluster
          replacement: orbstack
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      alerting:
        alertmanagers:
        - scheme: http
          static_configs:
          - targets:
            - alertmanager.tanzu-system-monitoring.svc:80
        - kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          - source_labels: [ __meta_kubernetes_namespace ]
            regex: default
            action: keep
          - source_labels: [ __meta_kubernetes_pod_label_app ]
            regex: prometheus
            action: keep
          - source_labels: [ __meta_kubernetes_pod_label_component ]
            regex: alertmanager
            action: keep
          - source_labels: [ __meta_kubernetes_pod_annotation_prometheus_io_probe ]
            regex: .*
            action: keep
          - source_labels: [ __meta_kubernetes_pod_container_port_number ]
            regex:
            action: drop
EOF
Install it with the following command:
tanzu package install -n tkg-system -p prometheus.tanzu.vmware.com -v 2.43.0+vmware.2-tkg.1 --values-file prometheus-values.yaml prometheus
$ kubectl get pod,httpproxy -n tanzu-system-monitoring -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/prometheus-node-exporter-2qjmj 1/1 Running 0 55s 192.168.194.48 orbstack <none> <none>
pod/prometheus-kube-state-metrics-7d4b54447c-5m7vs 1/1 Running 0 53s 192.168.194.50 orbstack <none> <none>
pod/prometheus-pushgateway-5f6d8695d4-w4f5w 1/1 Running 0 53s 192.168.194.52 orbstack <none> <none>
pod/alertmanager-85796b8bc9-6dlbk 1/1 Running 0 54s 192.168.194.53 orbstack <none> <none>
pod/prometheus-server-5f784b75fd-kpf4d 2/2 Running 0 53s 192.168.194.54 orbstack <none> <none>
NAME FQDN TLS SECRET STATUS STATUS DESCRIPTION
httpproxy.projectcontour.io/prometheus-httpproxy prometheus.192-168-194-146.sslip.io prometheus-tls valid Valid HTTPProxy
When you access the URL specified in ingress.virtual_host_fqdn, the Prometheus UI will be displayed.
Make sure the --web.enable-remote-write-receiver flag is enabled.
Make sure each metric has the cluster label.
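The same two checks can also be done from the command line through the Prometheus HTTP API instead of the UI. The following is a minimal sketch, assuming curl is available and that PROM_URL (a variable name I chose) points at the aggregate Prometheus. The helper function name is also my own, and -k skips certificate verification, which is only reasonable in a test environment like this one.

```shell
# Assumed base URL of the aggregate Prometheus; replace it with your ingress FQDN.
PROM_URL="${PROM_URL:-https://prometheus.192-168-194-146.sslip.io}"

# Hypothetical helper: reads the JSON body of /api/v1/status/flags on stdin
# and succeeds only if the remote-write receiver flag is reported as "true".
remote_write_receiver_enabled() {
  tr -d ' \n' | grep -qF '"web.enable-remote-write-receiver":"true"'
}

# Usage (commented out because it needs a reachable Prometheus):
# curl -sk "${PROM_URL}/api/v1/status/flags" | remote_write_receiver_enabled && echo "receiver enabled"
# curl -sk "${PROM_URL}/api/v1/label/cluster/values"   # lists the values of the cluster label
```

At this point the second command should list only orbstack; the other cluster names appear once the agents start sending data.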
Set up Prometheus in agent mode
Next, set up Prometheus in agent mode on each Kubernetes cluster to be monitored. There are no prerequisite packages.
Create the following YAML. The key point is to add --enable-feature=agent to the Prometheus startup options.
Also set the Aggregate Prometheus URL in remote_write in prometheus.yml. With global.external_labels you can configure labels that are added to the data sent by remote write.
Here I set the cluster label to kind, the name of the cluster. Please modify it according to your own environment. In agent mode you cannot set alerting or rule_files in prometheus.yml, so delete those settings.
A PVC is not required in agent mode, but the Tanzu package does not allow disabling it in the values file, so set a small size. Similarly, Alertmanager and Pushgateway are not necessary, but since they cannot be disabled in the values file either, set their replica count to 0.
cat <<'EOF' > prometheus-agent-values.yaml
---
prometheus:
  deployment:
    containers:
      args:
      - --config.file=/etc/config/prometheus.yml
      - --enable-feature=exemplar-storage
      - --enable-feature=agent
      - --storage.agent.retention.max-time=5m
  pvc:
    storage: 1Gi
  config:
    prometheus_yml: |
      global:
        evaluation_interval: 1m
        scrape_interval: 1m
        scrape_timeout: 10s
        external_labels:
          cluster: kind
      remote_write:
      - url: https://prometheus.192-168-194-146.sslip.io/api/v1/write
        send_exemplars: true
        tls_config:
          insecure_skip_verify: true
      scrape_configs:
      - job_name: 'prometheus'
        scrape_interval: 5s
        static_configs:
        - targets: [ 'localhost:9090' ]
      - job_name: 'kube-state-metrics'
        static_configs:
        - targets: [ 'prometheus-kube-state-metrics.tanzu-system-monitoring.svc.cluster.local:8080' ]
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [ __meta_kubernetes_pod_annotation_prometheus_io_scrape ]
          action: keep
          regex: true
        - source_labels: [ __meta_kubernetes_pod_annotation_prometheus_io_path ]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [ __address__, __meta_kubernetes_pod_annotation_prometheus_io_port ]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [ __meta_kubernetes_namespace ]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [ __meta_kubernetes_pod_name ]
          action: replace
          target_label: kubernetes_pod_name
      - job_name: kubernetes-nodes-cadvisor
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - replacement: kubernetes.default.svc:443
          target_label: __address__
        - regex: (.+)
          replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
          source_labels:
          - __meta_kubernetes_node_name
          target_label: __metrics_path__
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      - job_name: kubernetes-apiservers
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - action: keep
          regex: default;kubernetes;https
          source_labels:
          - __meta_kubernetes_namespace
          - __meta_kubernetes_service_name
          - __meta_kubernetes_endpoint_port_name
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
alertmanager:
  deployment:
    replicas: 0
pushgateway:
  deployment:
    replicas: 0
EOF
Install it with the following command:
tanzu package install -n tkg-system -p prometheus.tanzu.vmware.com -v 2.43.0+vmware.2-tkg.1 --values-file prometheus-agent-values.yaml prometheus
$ kubectl get pod -n tanzu-system-monitoring
NAME READY STATUS RESTARTS AGE
prometheus-kube-state-metrics-c7f4d6f6f-j4vfj 1/1 Running 0 4m34s
prometheus-node-exporter-xzw7d 1/1 Running 0 4m34s
prometheus-server-6b98c76645-zz69b 2/2 Running 0 3m18s
Port-forward the Prometheus UI with the following command:
kubectl port-forward -n tanzu-system-monitoring svc/prometheus-server 9090:8
When you access the UI, "Prometheus Agent" is displayed, which confirms that it is running in agent mode.
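While the port-forward is running, you can also check whether the agent is failing to deliver samples by looking at its own /metrics endpoint. This is a minimal sketch, assuming the agent exposes the prometheus_remote_storage_samples_failed_total counter and is reachable on localhost:9090; the helper function name is hypothetical.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints the total number of samples that failed to be sent by remote write,
# summed over all remote_write queues (prints 0 if the metric is absent).
remote_write_failed_samples() {
  awk '/^prometheus_remote_storage_samples_failed_total/ { sum += $NF } END { print sum + 0 }'
}

# Usage (commented out because it needs the port-forward to be running):
# curl -s http://localhost:9090/metrics | remote_write_failed_samples
```

A value that keeps growing suggests that the agent cannot reach the remote_write URL or that the receiving side is rejecting the data.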
Run the query sum(kube_pod_info) by (cluster, namespace) in the aggregate Prometheus UI. If the number of pods per namespace is output for each cluster as shown below, the setup is successful.
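The same query can be run against the HTTP API of the aggregate Prometheus instead of the UI. The following is a minimal sketch, assuming curl is available; the function name promql_query is my own, and -k skips certificate verification, so use it only in a test environment.

```shell
# Hypothetical helper: run an instant PromQL query via the Prometheus HTTP API.
# $1 = base URL of the aggregate Prometheus, $2 = PromQL expression.
promql_query() {
  # -G sends the URL-encoded query as a GET parameter; -k skips TLS verification.
  curl -skG "${1%/}/api/v1/query" --data-urlencode "query=$2"
}

# Usage (commented out because it needs a reachable Prometheus):
# promql_query https://prometheus.192-168-194-146.sslip.io 'sum(kube_pod_info) by (cluster, namespace)'
```

The response is JSON, and each entry of data.result carries the cluster and namespace labels together with the pod count.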
The Grafana dashboard also needs to be customized so that it can be filtered by the cluster label, but that is out of the scope of this blog post.
If you want to monitor a multi-cluster topology of Tanzu Application Platform with Prometheus, this configuration is a good fit.