This content originally appeared on DEV Community and was authored by Ramses Alexander Coraspe
Hi everyone,
I am a little obsessed with data engineering, and lately I have been working on several open source projects on the topic. Here is a list of the repositories and the technologies used in each one. If you decide to go deeper into this fun world, these repositories could serve as a guide.
❤ means "I like this one"
❤ Tracking your Uber Rides and Uber Eats expenses through a data engineering process
Technologies and skills:
Python, Docker, Apache Airflow, AWS Redshift, Power BI, Data modelling, Task scheduling, ETL and ELT processes, Data warehousing, Cloud
❤ Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
Technologies and skills:
Python, Docker, Big Data, Cloud, BigQuery, Workflow Engines, GCP, Task scheduler, Google Cloud Platform, Dataproc cluster, GCS, Google Cloud Storage, Redis, DAG, Parallel Processing, Apache Spark
❤ Building Big Data Pipelines in the Cloud with AWS EMR
Technologies and skills:
Python, PySpark, AWS EMR, Task scheduling, IaC, EC2 Instances, Apache Spark, Cloud
❤ Building a Lossless Data Compression and Data Decompression Pipeline
Technologies and skills:
Python, Data compression, BZIP2, Parallel programming
Learn how to dockerize an Apache Spark Standalone Cluster
Technologies and skills:
Python, Jupyter Notebook, Apache Spark, Docker, docker-compose, Hive
❤ Dockerizing and Consuming an Apache Livy environment
Technologies and skills:
Python, Big Data, Docker, docker-compose, Apache Livy, Apache Spark, PostgreSQL, PySpark, Jupyter Notebook
❤ Design, Development and Deployment of a simple Data Pipeline
Technologies and skills:
Python, Data modelling, Docker, docker-compose, PostgreSQL, Data pipeline, FastAPI
Dockerizing a Python Script for Faster Web Scraping
Technologies and skills:
Python, Docker, SQLite, Dockerfile, Web scraping, Data pipeline, FastAPI
Understanding Similarity Measures for Text Analysis
Technologies and skills:
Python, Machine Learning, Similarity measures, Distance metrics, Text Analysis
❤ Learn how to build a content-based Movie Recommender System
Technologies and skills:
Python, Machine Learning, TF-IDF, Cosine similarity, BM25, BERT, NLP, word2vec, Text Analysis, recsys
A Text Analysis of Speeches
Technologies and skills:
Python, Machine Learning, NLP, word2vec, Text Analysis, Sentiment Analysis, PCA, t-SNE, Word Embeddings, Text Preprocessing, Web scraping, Data Visualization, Mexico
❤ Dropout Students Prediction
Technologies and skills:
R, Genetic algorithm, Neural Networks, K-Means, Clustering, Machine Learning
I will be working on more complex projects over the coming months using modern data stacks.
Ramses Alexander Coraspe | Sciencx (2022-06-15T23:40:58+00:00) Data Engineering Projects for Beginners. Retrieved from https://www.scien.cx/2022/06/15/data-engineering-projects-for-beginners/