Deploying Airflow on Kubernetes Using ArgoCD and Terraform: Modern GitOps approach

Apache Airflow is a widely used platform for organizing data manipulation workflows in directed acyclic graphs, which can be used to transform data in Data Warehouses or prepare data for machine learning use.


This content originally appeared on HackerNoon and was authored by Mikhail Markov


Introduction

Apache Airflow™ is a widely used platform for organizing data manipulation workflows in directed acyclic graphs (DAGs), which can be used to transform data in Data Warehouses or prepare data for machine learning use.

GitOps is a modern approach to continuous delivery and operational management that uses Git as the single source of truth for infrastructure and application deployment. Because the desired system state is stored declaratively in Git repositories, the infrastructure stays reproducible, auditable, and easy to manage. In this article, I will show how to install Argo CD with Terraform and then use it to deploy Airflow from a Git repository. Along the way we will use Kubernetes, Terraform, Helm, and Argo CD.
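To make the "Git as the source of truth" idea concrete, here is a minimal shell sketch of drift detection, the comparison a GitOps controller performs continuously. It is a toy illustration, not how Argo CD is implemented: the two files and the drift_status function are hypothetical stand-ins for a manifest in Git and the live state of the cluster.

```shell
#!/bin/sh
# Toy model of GitOps drift detection (not Argo CD's actual implementation).
workdir=$(mktemp -d)

# Hypothetical desired state, as it would be committed to Git.
printf 'replicas: 2\n' > "$workdir/desired.yaml"
# Hypothetical live state, changed by hand in the cluster.
printf 'replicas: 3\n' > "$workdir/live.yaml"

# Print "Synced" when both files are identical, "OutOfSync" otherwise.
drift_status() {
  if cmp -s "$1" "$2"; then
    echo "Synced"
  else
    echo "OutOfSync"
  fi
}

before=$(drift_status "$workdir/desired.yaml" "$workdir/live.yaml")

# Reconciliation: the live state is overwritten with what Git declares.
cp "$workdir/desired.yaml" "$workdir/live.yaml"
after=$(drift_status "$workdir/desired.yaml" "$workdir/live.yaml")

echo "before sync: $before, after sync: $after"
rm -rf "$workdir"
```

The point of the sketch is the direction of the copy: the live state is always corrected toward Git, never the other way around.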


Prerequisites

Knowledge requirements

  • Basic understanding of Git.
  • Basic understanding of Kubernetes and containerization.
  • Basic understanding of Infrastructure as Code.

Tools and technologies needed

Code repository

All related code is stored in my GitHub repo: airflow-k8s. Feel free to fork it for your own experiments.


Step 1: Provisioning Kubernetes Cluster

Kubernetes, often abbreviated as K8s, is an open-source platform designed for automating the deployment, scaling, and operation of application containers across clusters of hosts. Originally developed by Google, it is now maintained by the Cloud Native Computing Foundation (CNCF).

For this article, I will be using Docker Desktop with Kubernetes mode enabled. You can set it up locally by following the official guide Deploy on Kubernetes with Docker Desktop. You can also use another local Kubernetes distribution such as MicroK8s, Minikube, or Kind, or a managed Kubernetes service from a cloud provider, such as EKS, GKE, or AKS.

~ kubectl version
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2
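The client and server versions above differ by one minor release, which is within the skew kubectl supports (one minor version older or newer than the API server). Below is a small sketch for checking this on your own cluster. The helper names minor_of and skew_ok are hypothetical, and the version strings are copied from the output above.

```shell
#!/bin/sh
# Extract the minor version number: v1.30.2 -> 30.
minor_of() {
  printf '%s\n' "$1" | sed -E 's/^v?[0-9]+\.([0-9]+).*$/\1/'
}

# Succeed when client and server are at most one minor version apart.
skew_ok() {
  client_minor=$(minor_of "$1")
  server_minor=$(minor_of "$2")
  skew=$((client_minor - server_minor))
  if [ "$skew" -lt 0 ]; then
    skew=$((0 - skew))
  fi
  [ "$skew" -le 1 ]
}

# Versions taken from the `kubectl version` output above.
if skew_ok "v1.30.2" "v1.29.2"; then
  result="supported"
else
  result="unsupported"
fi
echo "kubectl/server version skew: $result"
```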


Step 2: Installing and Configuring ArgoCD

Argo CD is a declarative GitOps continuous delivery tool for Kubernetes. It is part of the Argo project, which includes other tools for continuous integration and delivery (CI/CD) workflows. Argo CD specifically focuses on deploying applications and managing Kubernetes resources in an automated and declarative way, ensuring that the desired state of the application defined in a Git repository matches the actual state in the Kubernetes cluster.

First, to deploy Argo CD with Terraform, clone the airflow-k8s repo:

~ git clone https://github.com/xrayid/airflow-k8s

Review the Terraform configuration:

terraform {
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "2.14.0"
    }
  }
}

provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}

resource "helm_release" "argocd" {
  name             = "argocd"
  repository       = "https://argoproj.github.io/argo-helm"
  chart            = "argo-cd"
  version          = var.argocd_chart_version
  namespace        = "argocd"
  create_namespace = true
}

variable "argocd_chart_version" {
  description = "ArgoCD Helm chart version"
  type        = string
  default     = "7.3.6"
}
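Because the chart version is a Terraform variable, it can be overridden without editing the file. The sketch below uses Terraform's TF_VAR_<name> environment variable convention. The version shown is simply the default from the configuration above, and the call to terraform is guarded so the snippet does nothing outside an initialized working directory.

```shell
#!/bin/sh
# Terraform maps TF_VAR_<name> environment variables to input variables,
# so this sets var.argocd_chart_version for subsequent commands.
export TF_VAR_argocd_chart_version="7.3.6"

# Only call Terraform when it is installed and `terraform init` has been run here.
if command -v terraform >/dev/null 2>&1 && [ -d .terraform ]; then
  terraform plan
else
  echo "Skipping plan; chart version would be $TF_VAR_argocd_chart_version"
fi
```

The same override can be passed on the command line with terraform apply -var="argocd_chart_version=7.3.6". Pinning the version explicitly keeps the deployment reproducible.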

Initialize the Terraform configuration:

~ terraform init
Initializing HCP Terraform...
Initializing provider plugins...
- Finding hashicorp/helm versions matching "2.14.0"...
- Installing hashicorp/helm v2.14.0...
- Installed hashicorp/helm v2.14.0 (signed by HashiCorp)
Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

HCP Terraform has been successfully initialized!

You may now begin working with HCP Terraform. Try running "terraform plan" to
see any changes that are required for your infrastructure.

If you ever set or change modules or Terraform Settings, run "terraform init"
again to reinitialize your working directory.

Now you are ready to deploy Argo CD on Kubernetes. Run the terraform apply command, review the plan, and confirm the changes.

~ terraform apply

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # helm_release.argocd will be created
  + resource "helm_release" "argocd" {
      + atomic                     = false
      + chart                      = "argo-cd"
      + cleanup_on_fail            = false
      + create_namespace           = true
      + dependency_update          = false
      + disable_crd_hooks          = false
      + disable_openapi_validation = false
      + disable_webhooks           = false
      + force_update               = false
      + id                         = (known after apply)
      + lint                       = false
      + manifest                   = (known after apply)
      + max_history                = 0
      + metadata                   = (known after apply)
      + name                       = "argocd"
      + namespace                  = "argocd"
      + pass_credentials           = false
      + recreate_pods              = false
      + render_subchart_notes      = true
      + replace                    = false
      + repository                 = "https://argoproj.github.io/argo-helm"
      + reset_values               = false
      + reuse_values               = false
      + skip_crds                  = false
      + status                     = "deployed"
      + timeout                    = 300
      + verify                     = false
      + version                    = "7.3.6"
      + wait                       = true
      + wait_for_jobs              = false
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

helm_release.argocd: Creating...
helm_release.argocd: Still creating... [10s elapsed]
helm_release.argocd: Still creating... [20s elapsed]
helm_release.argocd: Still creating... [30s elapsed]
helm_release.argocd: Still creating... [40s elapsed]
helm_release.argocd: Still creating... [50s elapsed]
helm_release.argocd: Still creating... [1m0s elapsed]
helm_release.argocd: Creation complete after 1m9s [id=argocd]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

After you complete the installation, get the initial admin password using the following command:

~ kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
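Kubernetes returns Secret values base64-encoded, which is why the command pipes the jsonpath output through base64 -d. The round trip below illustrates this with a made-up value: sample-password is a placeholder, not a real Argo CD password.

```shell
#!/bin/sh
# Placeholder value standing in for the generated admin password.
plain="sample-password"

# What the API server would return in .data.password (base64-encoded).
encoded=$(printf '%s' "$plain" | base64)

# What `base64 -d` recovers at the end of the kubectl pipeline.
decoded=$(printf '%s' "$encoded" | base64 -d)

echo "encoded: $encoded"
echo "decoded: $decoded"
```

Note that base64 is an encoding, not encryption, so anyone who can read the Secret can recover the password.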

Forward a local port to the internal Argo CD Kubernetes service:

~ kubectl port-forward svc/argocd-server -n argocd 8080:443

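Now you can log in to the Argo CD UI at https://localhost:8080 with the admin username and the initial password. With a default installation the server presents a self-signed certificate, so expect a browser warning.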

ArgoCD has been installed, and we are ready to deploy Airflow.

Step 3: Installing Airflow as an Argo CD Application

You can use manifests to manage Argo CD applications in an IaC manner.

This is the Argo CD application manifest that I have already prepared. You can find it in the related repo: airflow-root-app.yaml

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: airflow-root-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/xrayid/airflow-k8s.git
    targetRevision: HEAD
    path: airflow-k8s
  destination:
    server: https://kubernetes.default.svc

In the Argo CD UI, create a new application, paste the manifest, and deploy the Airflow application.
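As an alternative to pasting the manifest into the UI, note that an Application is an ordinary Kubernetes custom resource, so it can also be applied with kubectl. This is a sketch under assumptions: it writes the manifest shown above to a temporary file rather than relying on the file's location in the repo, and it only talks to the cluster when kubectl is installed and a cluster is reachable.

```shell
#!/bin/sh
# Write the Application manifest from the article to a temporary file.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: airflow-root-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/xrayid/airflow-k8s.git
    targetRevision: HEAD
    path: airflow-k8s
  destination:
    server: https://kubernetes.default.svc
EOF

# Apply only when kubectl is available and the cluster responds.
if command -v kubectl >/dev/null 2>&1 && kubectl cluster-info >/dev/null 2>&1; then
  if kubectl apply -f "$manifest"; then
    applied="yes"
  else
    applied="failed"
  fi
else
  echo "No reachable cluster; manifest left at $manifest"
  applied="no"
fi
```

The manifest declares no syncPolicy, so Argo CD registers the application but will most likely wait for a manual sync, either from the UI or with argocd app sync airflow-root-app.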

Wait about 5 minutes for the deployment to finish, then validate the application status in the UI.
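The same check can be scripted. Argo CD records sync and health state on the Application resource under status.sync.status and status.health.status, and the sketch below reads both with kubectl when a cluster is reachable. The helper name app_ready is hypothetical, and the fallback values are purely illustrative.

```shell
#!/bin/sh
# Succeed only when the app is both in sync with Git and healthy.
app_ready() {
  [ "$1" = "Synced" ] && [ "$2" = "Healthy" ]
}

if command -v kubectl >/dev/null 2>&1 && kubectl cluster-info >/dev/null 2>&1; then
  sync=$(kubectl -n argocd get application airflow-root-app \
    -o jsonpath='{.status.sync.status}' 2>/dev/null) || sync="Unknown"
  health=$(kubectl -n argocd get application airflow-root-app \
    -o jsonpath='{.status.health.status}' 2>/dev/null) || health="Unknown"
else
  # No cluster available: fall back to illustrative values.
  sync="OutOfSync"
  health="Progressing"
fi

if app_ready "$sync" "$health"; then
  echo "airflow-root-app is ready"
else
  echo "airflow-root-app not ready yet (sync=$sync, health=$health)"
fi
```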

Forward a local port to the internal Airflow webserver Kubernetes service:

~ kubectl port-forward svc/airflow-webserver 8081:8080 --namespace argocd

Now you can log in to the Airflow UI at http://localhost:8081 with the admin username and the admin password.

Conclusion

This article demonstrated IaC and GitOps approaches for deploying and managing Argo CD and Airflow. This is not a production-ready solution, but you can use my code and related documentation as a starting point for improving Apache Airflow management in your environments.


