Data engineering is a vital component of any data-driven organization. With the increasing volume and complexity of data, a powerful and efficient platform to manage and process it has become essential. This is where Databricks comes into play. Databricks is a cloud-based platform that gives data engineers a unified environment for big data analytics and machine learning. It combines the power of Apache Spark with an easy-to-use interface, making it an indispensable tool in any data engineer's toolbox.
As a data engineer, you want to know how to easily and quickly process large datasets, build and train machine learning models, and perform advanced analytics.
This training is necessary for any data engineer looking to build a robust and scalable data infrastructure.
Training description
This training program is designed to provide participants with the knowledge and skills required for data engineering with Databricks on Azure. The program covers a wide range of topics and equips participants with the practical skills needed to excel in their field. We will explore the most efficient ways to leverage the Spark engine and distributed computing principles within the Databricks environment. Our main emphasis is on using Databricks to its full extent to answer data engineering questions.
Duration & Agenda
This 3-day training covers end-to-end data development through Databricks.
Day 1 focuses on the Spark Core and the Databricks platform:
- What are the Spark architectural components?
- Databricks platform overview
- DataFrame reader, writer, transformation, and aggregation
- What is lazy execution?
- Basic & complex types
- Spark internals
Day 2 focuses on building Data Lakehouses:
- Lakehouse architecture vs. traditional data warehouse
- Medallion structure
- Databricks SQL vs. Python development
- Delta Lake: transaction log, Parquet
- Unity Catalog
- Security
- Governance
- Lineage
- How does ADF + Databricks + ADLS come together?
Day 3 is an advanced course on optimizations:
- Delta internals
- Optimizations
- What's coming in the future?
- Data processing options
- Streaming vs. batch (incl. Auto Loader)
- Cache
After completing this training, you will have a thorough understanding of basic and advanced optimization techniques and be able to apply data engineering with Databricks on Azure in practice.
Target audience
- You are an (aspiring) BI professional with knowledge of data modeling & data warehouse development. You know SQL or Python, and you have a notion of dimensional data concepts.
  - Note: If you are new to BI & data warehousing, we highly recommend following the Dimensional Data Modeling Training before this Databricks training.
- You want to know what's what in the Azure Cloud and get some practical tips (rather than reading online documentation).
  - Note: We recommend all participants follow the Azure Fundamentals training before this Databricks training, as it gives a broad overview of Azure Cloud, its key resources, and cloud data analytics concepts.
Format
The training combines plenary lectures with a hands-on lab environment. The course can be taught in English or Dutch, and can also be delivered on-site at the customer's premises.
Cost
2.000 € per participant for 3 days
More information or registration
- For more information, contact academy@element61.be
- The training schedule can be found in the Academy Calendar (PDF)
- For a complete overview of all training courses, visit our Academy page