Outreachy ended and here is my experience in Apache Airflow Open Source


This content originally appeared on DEV Community and was authored by Edi🦕

The three months of my Outreachy internship have ended. They went by fast, and I learned a lot in that time. I had never programmed in Python as much as I did during Outreachy. My mentor Jarek Potiuk guided me as I wrote my first lines of code for Apache Airflow, and I am very grateful for his patience.
This post sums up what I learned, on both the technical and the soft-skills side.

Apache Airflow is an open source workflow management platform for data engineering pipelines. Workflows are defined as directed acyclic graphs (DAGs) of tasks written in Python. It is considered one of the most robust platforms used by data engineers.
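
To make the idea of a DAG of tasks concrete, here is a minimal sketch that uses only the Python standard library (`graphlib`, available from Python 3.9). It is not Airflow's API; the task names are hypothetical and only illustrate how dependencies determine the order in which tasks run.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names. Each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_order(deps):
    """Return one valid execution order for the tasks.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is exactly what "acyclic" rules out.
    """
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)

# A graph with a cycle is not a DAG and cannot be ordered.
try:
    run_order({"a": {"b"}, "b": {"a"}})
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```

In Airflow itself the scheduler takes care of this ordering; the sketch only shows why the graph has to be acyclic.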

Apache Airflow has a very complete and complex local development environment called Breeze, written mainly in Bash. However, most project contributors are proficient in Python rather than Bash, which makes the Bash-based environment hard to maintain and debug.

Our goal is to convert Airflow's local development environment (Breeze) from Bash to Python, and also to make it possible for Windows users to run Breeze.

My mentors detailed each task in the backlog using GitHub Projects, and each task was described as an issue. Shout-out to my mentors Jarek Potiuk, Elad Kalif, and Nasser Kaze for giving me so much detail to get started.

To start working on a feature, we create a separate branch. We always rebase it, so that our changes sit on top of the latest updates from the original repository.

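 # Add the upstream Apache repository as a remote named "apache" and fetch it
 git remote add apache git@github.com:apache/airflow.git
 git fetch apache
 git fetch --all
 # This will print the HASH of the base commit which you
 # should use to rebase your feature from
 git merge-base my-branch apache/main
 # Replay the commits made after HASH on top of the latest apache/main
 git rebase --onto apache/main HASH
 # Update the remote branch, refusing to overwrite commits you have not fetched
 git push --force-with-lease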

One of the biggest challenges for me was understanding how to resolve conflicts in Git, and rebasing was one of the techniques that worked quite well for me.

Image description by Ryan Hodson

I analyzed the Bash scripts and asked my mentor for feedback on how to approach them. Before rewriting them in Python, we discussed which things could be done differently thanks to Python's flexibility.

Before pushing, we run pre-commit hooks, which define the minimum standards for an acceptable commit.

pre-commit install
pre-commit --version
pre-commit uninstall

After finishing the implementation, we can create a Pull Request. If it is not ready for review yet, we can mark it as a DRAFT to indicate that it is a work in progress.
Image description By Luke Hefson

Once our Pull Request is created, the integration test workflow runs in GitHub Actions. What it validates depends on the content that was modified. For example:
  • If the change is related to documentation, validation is quite simple: no tests are run, and the change awaits our mentor's approval.
  • If the core of the product is changed, all the necessary tests are run with different versions of Python.
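
The idea of choosing checks based on what changed can be sketched as follows. This is only an illustration, not Airflow's actual CI logic: the function name, the check names, the `docs/` prefix rule, and the Python versions are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical values, chosen only for illustration.
DOC_SUFFIXES = {".md", ".rst", ".txt"}
PYTHON_VERSIONS = ("3.7", "3.8", "3.9")


def is_doc_file(path):
    """Treat files under docs/ or with a documentation suffix as docs."""
    return path.startswith("docs/") or PurePosixPath(path).suffix in DOC_SUFFIXES


def select_checks(changed_files):
    """Return the list of (hypothetical) checks to run for the changed paths."""
    if not changed_files:
        return []
    if all(is_doc_file(path) for path in changed_files):
        # Documentation-only change: a simple validation, no test runs.
        return ["docs-build"]
    # Anything else: static checks plus tests on every Python version.
    return ["static-checks"] + [f"tests-python-{v}" for v in PYTHON_VERSIONS]


print(select_checks(["docs/index.rst"]))
print(select_checks(["airflow/models/dag.py", "README.md"]))
```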

Once the jobs that test our changes finish successfully, our PR is approved.

Among the Python libraries I used in this period are:

  • psutil
    A library for retrieving information on running processes and system utilization (CPU, memory, disks, network, sensors) in Python.

  • Click
    A Python package for creating beautiful command line interfaces in a composable way with as little code as necessary.

  • subprocess
    A standard library module that allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.

  • Data classes
    Regular classes that are geared towards storing state, rather than containing a lot of logic.
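
As a small illustration of how two of these fit together, here is a minimal sketch that uses only the standard library parts of the list (`subprocess` and `dataclasses`). It is not code from Breeze; `CommandResult` and `run_command` are hypothetical names. It runs a child process, which is the kind of job a Bash script usually does, and stores the outcome in a data class.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class CommandResult:
    """Plain state holder for the outcome of one command (hypothetical name)."""

    command: List[str]
    returncode: int
    stdout: str


def run_command(command):
    """Run a command, capture its output, and return a CommandResult.

    check=False means a non-zero exit code is reported in the result
    instead of raising an exception.
    """
    completed = subprocess.run(command, capture_output=True, text=True, check=False)
    return CommandResult(
        command=list(command),
        returncode=completed.returncode,
        stdout=completed.stdout.strip(),
    )


# Use the current interpreter so the example works the same on Linux,
# macOS, and Windows.
result = run_command([sys.executable, "-c", "print('hello from a child process')"])
print(result.returncode, result.stdout)

failed = run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
print(failed.returncode)
```

Passing the command as a list of arguments, instead of a single shell string, avoids depending on a particular shell, which matters when the same tool has to run on Windows.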

Some of the things I am exploring now are Docker, Kubernetes, Helm charts, and GHCR.io.

Among the soft skills that I improved and learned in the internship were these:
Image description

