This content originally appeared on DEV Community and was authored by Hamdi KHELIL
As businesses scale their Kubernetes workloads across multiple Azure Kubernetes Service (AKS) clusters, managing and optimizing cloud costs becomes critical. Deploying and managing observability tools such as KubeCost and OpenTelemetry (OTel) across multiple clusters can be simplified using AKS Fleet Manager, Microsoft Managed Prometheus, and Grafana.
This guide will explain how to deploy KubeCost and OpenTelemetry to multiple AKS clusters using AKS Fleet Manager, expose metrics through OpenTelemetry, and centralize monitoring via Managed Prometheus and Grafana. This setup provides a single view into your multi-cluster environment, allowing for more efficient resource utilization and cost management.
What is KubeCost? 🧮
KubeCost is an open-source cost management tool designed to give you real-time visibility into cloud expenses in Kubernetes environments. It helps identify costs at granular levels—such as namespaces, deployments, services, and pods—allowing organizations to optimize resource usage and reduce expenses.
Why Use AKS Fleet Manager? 🌐
AKS Fleet Manager (officially Azure Kubernetes Fleet Manager) simplifies managing multiple AKS clusters by grouping them into a fleet with a single control point for governance, policies, and monitoring. Instead of configuring each cluster by hand, you can orchestrate deployments (like KubeCost and OpenTelemetry) across all member clusters at once.
Why Use OpenTelemetry, Managed Prometheus, and Grafana? 📊
- OpenTelemetry (OTel): Provides standardized observability, collecting metrics, logs, and traces from Kubernetes workloads and exposing them to monitoring systems like Prometheus.
- Microsoft Managed Prometheus (officially Azure Monitor managed service for Prometheus): A fully managed Prometheus service that removes the need to run and scale Prometheus infrastructure yourself, making it easier to monitor metrics across your clusters.
- Grafana: A powerful visualization tool that integrates with Prometheus to present monitoring metrics in flexible, customizable dashboards.
Deploying these tools across multiple clusters using AKS Fleet Manager allows you to centralize your monitoring and cost optimization across all AKS environments.
Step-by-Step: Deploy KubeCost and OpenTelemetry with AKS Fleet Manager 🔧
Step 1: Create and Register AKS Clusters
First, create the AKS clusters and register them with AKS Fleet Manager. This will allow you to manage multiple clusters as part of a fleet.
Create AKS Clusters
You can create your AKS clusters using the Azure CLI:
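# Create a resource group for AKS Fleet Manager
az group create --name myFleetResourceGroup --location eastus
# Create one resource group per cluster (the regions are examples; pick your own)
az group create --name myResourceGroup1 --location eastus
az group create --name myResourceGroup2 --location westus2
# Create two AKS clusters; each one takes the region of its resource group
az aks create --resource-group myResourceGroup1 --name myAKSCluster1 --node-count 3 --enable-managed-identity --generate-ssh-keys
az aks create --resource-group myResourceGroup2 --name myAKSCluster2 --node-count 3 --enable-managed-identity --generate-ssh-keys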
Register Clusters with AKS Fleet Manager
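Once your clusters are ready, create the fleet and join the clusters to it as members:
# Requires the fleet CLI extension: az extension add --name fleet
# Create the fleet; --enable-hub adds the hub cluster used for resource propagation
az fleet create --resource-group myFleetResourceGroup --name myFleetManager --location eastus --enable-hub
# Join the first AKS cluster to the fleet (the member name is an example)
az fleet member create --resource-group myFleetResourceGroup --fleet-name myFleetManager --name member-1 \
--member-cluster-id /subscriptions/{subscription-id}/resourceGroups/myResourceGroup1/providers/Microsoft.ContainerService/managedClusters/myAKSCluster1
# Join the second AKS cluster
az fleet member create --resource-group myFleetResourceGroup --fleet-name myFleetManager --name member-2 \
--member-cluster-id /subscriptions/{subscription-id}/resourceGroups/myResourceGroup2/providers/Microsoft.ContainerService/managedClusters/myAKSCluster2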
Note: Replace {subscription-id} with your actual subscription ID.
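Rather than pasting the subscription ID by hand, the cluster resource ID can be assembled or looked up in the shell. The sketch below is one way to do it. The helper name `aks_cluster_id` is made up for this example, and the commented lookups assume an Azure CLI session that is already logged in.

```shell
# Hypothetical helper (not part of the Azure CLI): format an AKS cluster
# resource ID from a subscription ID, a resource group, and a cluster name.
aks_cluster_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.ContainerService/managedClusters/%s\n' \
    "$1" "$2" "$3"
}

# Example usage (requires an authenticated Azure CLI session):
#   SUB_ID=$(az account show --query id --output tsv)
#   aks_cluster_id "$SUB_ID" myResourceGroup1 myAKSCluster1
#
# Alternatively, ask Azure for the ID of an existing cluster directly:
#   az aks show --resource-group myResourceGroup1 --name myAKSCluster1 --query id --output tsv
```

Looking the ID up with `az aks show` avoids typos in the long path, while the helper is handy in scripts that create several clusters with predictable names.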
Step 2: Create AKS Fleet Manager Workload Template for KubeCost and OpenTelemetry
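AKS Fleet Manager lets you describe a workload once and roll it out to multiple clusters simultaneously. Here, we'll create a workload template for KubeCost and another for OpenTelemetry. Treat the `WorkloadTemplate` kind used below as illustrative: in the published Fleet Manager API, resources are applied to the fleet's hub cluster and propagated to member clusters with a `ClusterResourcePlacement`, so check the current Fleet Manager documentation before relying on this schema.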
Define Workload Template for KubeCost
Create a YAML template to deploy KubeCost via Helm:
# kubecost-template.yaml
apiVersion: fleet.azure.com/v1
kind: WorkloadTemplate
metadata:
  name: kubecost-template
spec:
  release:
    chart: kubecost/cost-analyzer
    namespace: kubecost
    version: 1.90.1
    values:
      kubecostProductConfigs:
        prometheus: true
      networkPolicy:
        enabled: true
This template deploys KubeCost into the kubecost namespace on all clusters managed by AKS Fleet Manager.
Define Workload Template for OpenTelemetry Collector
Similarly, define a YAML template for deploying the OpenTelemetry collector in each AKS cluster:
# otel-collector-template.yaml
apiVersion: fleet.azure.com/v1
kind: WorkloadTemplate
metadata:
  name: otel-collector-template
spec:
  release:
    chart: open-telemetry/opentelemetry-collector
    namespace: otel-collector
    version: 0.31.0
    values:
      config:
        receivers:
          prometheus:
            # The receiver expects Prometheus settings under a nested "config" key
            config:
              scrape_configs:
                - job_name: 'kubecost'
                  metrics_path: /metrics
                  static_configs:
                    # KubeCost's cost-model serves /metrics on port 9003 (9090 is the UI)
                    - targets: ['kubecost-cost-analyzer.kubecost.svc.cluster.local:9003']
        exporters:
          otlp:
            # Placeholder; replace with your real ingestion endpoint
            endpoint: "managed-prometheus-endpoint:4317"
            tls:
              # Disables TLS; suitable for a lab only (see Step 7)
              insecure: true
        service:
          pipelines:
            metrics:
              receivers: [prometheus]
              exporters: [otlp]
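This OpenTelemetry configuration scrapes metrics from KubeCost and exports them over OTLP toward Managed Prometheus. The endpoint shown is a placeholder. Azure's managed Prometheus ingests metrics through Prometheus remote write, so depending on your setup you may need the collector's `prometheusremotewrite` exporter (with Microsoft Entra authentication) or an OTLP-capable gateway in front of the workspace.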
Step 3: Deploy Workload Templates via AKS Fleet Manager
With both workload templates defined, you can deploy them across your clusters using AKS Fleet Manager.
Deploy KubeCost Template
To deploy the KubeCost workload template to all clusters in your fleet:
az fleet workload create --resource-group myFleetResourceGroup --fleet-name myFleetManager \
--template kubecost-template.yaml
This command deploys KubeCost to all AKS clusters managed by the AKS Fleet Manager.
Deploy OpenTelemetry Template
Next, deploy the OpenTelemetry workload template to all clusters:
az fleet workload create --resource-group myFleetResourceGroup --fleet-name myFleetManager \
--template otel-collector-template.yaml
This command deploys the OpenTelemetry collector to all clusters, configuring it to collect KubeCost metrics and forward them to Managed Prometheus.
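For comparison, the propagation mechanism described in the Fleet Manager documentation is a `ClusterResourcePlacement` object applied to the fleet's hub cluster. The sketch below writes such a manifest for the `kubecost` namespace. The function name, file name, placement name `kubecost-placement`, and API version are assumptions to verify against your hub cluster (for example with `kubectl api-resources`), and it presumes the KubeCost resources already exist in that namespace on the hub.

```shell
# Write a ClusterResourcePlacement manifest that selects the kubecost
# namespace (and the resources inside it) and places it on every member
# cluster. All names here are illustrative, not taken from the article.
write_kubecost_placement() {
  out="${1:-kubecost-placement.yaml}"
  cat > "$out" <<'EOF'
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterResourcePlacement
metadata:
  name: kubecost-placement
spec:
  resourceSelectors:
    - group: ""
      version: v1
      kind: Namespace
      name: kubecost
  policy:
    placementType: PickAll
EOF
}

# Example usage, with kubectl pointed at the fleet hub cluster:
#   write_kubecost_placement
#   kubectl apply -f kubecost-placement.yaml
#   kubectl get clusterresourceplacement kubecost-placement
```

`PickAll` targets every member cluster, which matches the "deploy everywhere" intent of this step. A similar placement selecting the `otel-collector` namespace would cover the collector.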
Step 4: Configure Managed Prometheus
With OpenTelemetry collectors set up in each cluster, the remaining piece is the Managed Prometheus side that receives and stores their metrics.
Enable Microsoft Managed Prometheus
In the Azure portal:
- Navigate to Monitoring > Metrics in each AKS cluster.
- Enable Managed Prometheus.
This allows Managed Prometheus to begin receiving metrics via the OpenTelemetry OTLP exporter from each cluster.
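The same setting can be applied from the command line, which scales better than clicking through each cluster in the portal. This is a sketch, assuming the `--enable-azure-monitor-metrics` flag of `az aks update` is available in your CLI version. The function name and the `DRY_RUN` switch are inventions of this example, used to preview the command before running it.

```shell
# Enable the managed Prometheus metrics add-on on a single AKS cluster.
# With DRY_RUN=1 the command is printed instead of executed.
enable_managed_prometheus() {
  rg="$1"
  cluster="$2"
  # Store the full command in the positional parameters so it can be
  # either echoed or executed without re-typing it.
  set -- az aks update --resource-group "$rg" --name "$cluster" --enable-azure-monitor-metrics
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Example usage for the two clusters from Step 1:
#   DRY_RUN=1 enable_managed_prometheus myResourceGroup1 myAKSCluster1   # preview
#   enable_managed_prometheus myResourceGroup1 myAKSCluster1
#   enable_managed_prometheus myResourceGroup2 myAKSCluster2
```

Without further options, the CLI chooses the Azure Monitor workspace for you. If several clusters should report to one shared workspace, check the `az aks update` reference for the option that selects it.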
Step 5: Set Up Centralized Visualization in Managed Grafana
To visualize the metrics and costs across your AKS fleet, configure Managed Grafana to connect to Managed Prometheus.
Create a Managed Grafana Workspace
In the Azure portal:
- Create a Managed Grafana workspace by navigating to Azure Managed Grafana.
- Follow the prompts to set up the workspace.
Add Prometheus Data Source to Grafana
Once the Managed Grafana workspace is created:
- Open the Grafana instance.
- Go to Configuration > Data Sources (Connections > Data sources in newer Grafana versions).
- Add Prometheus as a data source and provide the Managed Prometheus endpoint URL.
Import KubeCost Dashboards
Import the pre-built KubeCost dashboards into Grafana for cost visibility:
- Go to Dashboards > Manage > Import (Dashboards > New > Import in newer Grafana versions).
- Use KubeCost’s provided dashboard IDs or JSON files to import the dashboards.
This setup allows you to monitor KubeCost metrics (such as cost allocation by namespace, deployment, or pod) across all clusters from a single Grafana instance.
Step 6: Monitor and Analyze Costs Across Clusters 📈
Once everything is configured, you can start monitoring and analyzing your Kubernetes costs across multiple clusters using Grafana.
- Cost Allocation by Namespace: Create a dashboard to show cost breakdown by namespace across clusters using the following query:
sum(kubecost_allocation{label_namespace!=""}) by (label_namespace)
- Cross-Cluster Cost Efficiency: Create a dashboard that relates the cost of deployed workloads to the CPU actually consumed in each cluster. Both sides of the division are aggregated by the same label (cluster); if the label sets differ, PromQL finds no matching series and returns an empty result:
sum(rate(kubecost_allocation{label_deployment!=""}[5m])) by (cluster) / sum(rate(container_cpu_usage_seconds_total{job="kubelet", container!="POD"}[5m])) by (cluster)
This centralized view allows you to optimize resource usage and identify potential inefficiencies across multiple AKS clusters.
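The queries above can also be run outside Grafana, which is useful for checking that a metric exists before building a panel on it. The sketch below calls the standard Prometheus HTTP API (`/api/v1/query`) with `curl`. The function name, the `PROM_URL` and `PROM_TOKEN` variables, and the `DRY_RUN` switch are assumptions of this example, and the token command in the comments reflects how Azure Monitor workspaces are commonly queried rather than anything stated in this guide.

```shell
# Run an instant PromQL query against a Prometheus-compatible HTTP API.
# PROM_URL (query endpoint) and PROM_TOKEN (bearer token) must be set by
# the caller. With DRY_RUN=1 the curl command is printed, not executed.
prom_query() {
  query="$1"
  set -- curl --silent --get "${PROM_URL}/api/v1/query" \
    --header "Authorization: Bearer ${PROM_TOKEN}" \
    --data-urlencode "query=${query}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Example usage (endpoint and token retrieval are assumptions):
#   PROM_URL="<query endpoint of your Azure Monitor workspace>"
#   PROM_TOKEN=$(az account get-access-token \
#     --resource https://prometheus.monitor.azure.com \
#     --query accessToken --output tsv)
#   prom_query 'sum(kubecost_allocation{label_namespace!=""}) by (label_namespace)'
```

An empty `result` array in the JSON response usually means the metric or label names do not match what the collector is actually sending, so compare them with the KubeCost `/metrics` output.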
Step 7: Implement Best Practices for Security and Performance 🔒
- Secure OpenTelemetry Communication: Use TLS encryption for communication between OpenTelemetry collectors and Managed Prometheus.
- Limit Network Access: Ensure Managed Prometheus and Grafana endpoints are only accessible to authorized users and systems.
- Set Resource Limits: Define resource limits for OpenTelemetry collectors to prevent them from consuming excessive resources on your AKS clusters.
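As an illustration of the resource-limits point, the sketch below writes a small Helm values file for the OpenTelemetry Collector chart. The function name, file name, and the CPU and memory figures are placeholders chosen for this example, not recommendations, and it assumes the chart exposes a top-level `resources` value, as most Helm charts do.

```shell
# Write a Helm values fragment that caps the collector's CPU and memory.
# The numbers are illustrative; size them from observed usage.
write_otel_limits() {
  out="${1:-otel-limits-values.yaml}"
  cat > "$out" <<'EOF'
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
EOF
}

# Example usage: generate the file, then merge it into the values section
# of the collector template, or pass it to Helm directly:
#   write_otel_limits
#   helm upgrade --install otel-collector open-telemetry/opentelemetry-collector \
#     --namespace otel-collector --values otel-limits-values.yaml
```

Setting requests alongside limits gives the scheduler a realistic footprint for the collector, which matters when it runs on every cluster in the fleet.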
Conclusion 🎯
By deploying KubeCost and OpenTelemetry via AKS Fleet Manager and centralizing monitoring with Managed Prometheus and Grafana, you can streamline cost management and observability across multiple AKS clusters. This setup provides a unified view of costs, enabling you to make data-driven decisions for optimizing resource usage and reducing cloud spend.
With AKS Fleet Manager handling the deployment and orchestration of workloads across multiple clusters, this approach simplifies management and ensures consistency across environments. Implement this multi-cluster monitoring solution today and gain complete control over your Kubernetes spending! 🌟
Happy clustering! 📉

Hamdi KHELIL | Sciencx (2024-09-12T07:42:06+00:00) Monitor and Optimize Multi-Cluster AKS Costs 💰. Retrieved from https://www.scien.cx/2024/09/12/monitor-and-optimize-multi-cluster-aks-costs-%f0%9f%92%b0/