This content originally appeared on DEV Community and was authored by Helen Anderson
There are just a few days left until Australia's ultimate data engineering conference.
It's going to be another big year with more talks than ever. There is something for everyone, whether you work in a start-up, an established company, or you are just learning the art and science of data engineering.
Talks are aligned to our four themes:
- Data at rest - data warehousing, data lakes, and data storage.
- Data in motion - event driven architecture, data pipelines, and streaming.
- Data for machine learning - data pipelines for machine learning, productionising models, and managing data artefacts.
- Data you trust - data management, lineage, testing, and security.
Here are just some of the talks I can't wait to check out during the three-day online conference.
Data Quality with Great Expectations and Airflow in a Reverse-ETL World
Data-driven companies are asking their analytics teams to expose information in the data warehouse to third-party applications. As analytics workflows become increasingly dependent on downstream processes, data quality testing becomes more critical. In this talk, we will see how to leverage the Great Expectations library within an Airflow workflow.
Trust, Knowledge and your Data: Our approach at KADA to building a great data product
Have you spent hours building great data products that failed to gain traction? Did you find users using legacy reports when a better report was available? You're not alone.
KADA has identified five factors that make a great, trusted data product. This talk will provide examples of how you can improve your data products to make trust and knowledge part of the modern data stack.
From experiment to production - a journey of a machine learning model
This presentation discusses how to put a machine learning model into production, as well as the infrastructure to support it.
A natural language processing model will be used as an example of how a model travels through experiment tracking, data artifacts management, data / machine learning pipelines, and into production.
Data Engineers: Privacy is our problem
Protecting personally identifiable information and ensuring it is used in an ethical manner go hand in hand when collecting data.
Immuta's director of data and analytics, Stephen Bailey PhD, will discuss the need for data engineers to take on the responsibility of data privacy.
Gone Streaming: dbt+Materialize in 10 minutes
While dbt excels at batch processing, it can only approximate real-time data transformations.
Once you run a dbt model on top of Materialize, you never have to run it again! Regardless of how often or how much data arrives, your model will always be updated. No matter when you query your view, it will always return a fresh answer. Excited? Skeptical? Cautiously optimistic? Join us to see it for yourself as we walk you through a demo!
Data quality: the key to long term happiness
The advent of modern data warehousing has leveled the playing field and turned the focus from data volume to data quality.
The focus of this presentation will be to review approaches to quantifying data quality at various stages of the collection and processing lifecycle and present tools that can be implemented to help reduce the incidence of erroneous data.
Logging Apache Spark - How we made it easy
When you are running Spark on EMR, how can you improve your log visibility? If you would rather not ssh into your servers and search through log files, this architecture is the perfect solution for you.
A Single Data Platform for All of Your Workloads
Many organisations struggle to balance the competing needs of the business, data scientists, data engineers, data analysts, risk and security experts, and the finance department.
It can be difficult to keep everyone happy by maintaining multiple platforms. Discover how Snowflake simplifies data at rest, data in motion, and data science workloads on a single, secure data platform.
Shift-left testing: Building reliable Data Pipelines
Unreliable data pipelines can lead to data downtime. During "data downtime," your data may be partial, erroneous, or otherwise incorrect.
Organisations that rely on data may have to pay a heavy price for low trust data. During this presentation, we'll explore ways to ensure that our system catches unexpected data and easily recovers from it.
Intelligent Serverless and Scalable Real-Time Data Pipeline using Kinesis, Fargate and CFN
In this session, we will discuss a real-world case study of a real-time data pipeline that generates intelligent data in real time. This pipeline was developed for a large-scale digital media company. Based on the AWS serverless approach, the solution is highly scalable while following best practices in architecture.
Don't miss out on hearing from these speakers and many more at this year's conference. We'll be coming to you online from the 5th to the 7th of October, so check out the full schedule of sessions and get your free ticket. See you soon!
Helen Anderson | Sciencx (2021-09-29T07:54:12+00:00) 10 DataEngBytes talks you won’t want to miss. Retrieved from https://www.scien.cx/2021/09/29/10-dataengbytes-talks-you-wont-want-to-miss/