06-30-2024, 09:59 AM
Basics To Advanced: Azure Synapse Analytics Hands-On Project
Published 8/2023
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 KHz
Language: English | Size: 6.86 GB | Duration: 18h 40m
Build a complete project using only Azure Synapse Analytics, focused on PySpark and including Delta Lake and Spark optimizations
[b]What you'll learn[/b]
Understand Azure Synapse Analytics Services Practically
Gain a complete, basic-to-advanced understanding of Azure Synapse Analytics
Gain hands-on experience in applying Spark optimization techniques to real-world scenarios, achieving faster insights.
Understand 50+ most commonly used PySpark Transformations
Acquire a comprehensive library of 45+ PySpark notebooks for data cleansing, enrichment, and transformation.
Hands-on learning on building a modern data warehouse using Azure Synapse
Explore the capabilities of Spark Pools and their role in processing large-scale data workloads
Understand how Python is used in Data Engineering
Understand and transform data with Serverless SQL pool
Understand the principles and advantages of Delta Lake as a reliable data storage and management solution.
Learn how Spark evolved and grew
Gain insight into the services needed to clear DP-203
Create and configure a Serverless SQL pool
Create external data sources, external file formats, and external tables in Serverless SQL pool
Configure Spark Pools and understand how they work
Understand the Integration of Power BI with Azure Synapse Analytics
Create and work with Dedicated SQL pool on a high level
Optimize your PySpark with Spark Optimization techniques
Learn the history and the data processing approaches used before Spark
Implement the incremental UPSERT using Delta Lake
Understand and implement versioning in delta lake
Use MSSpark Utils and understand what its utilities are for
Mount a data lake to Synapse notebooks
[b]Requirements[/b]
No Azure Synapse Analytics experience needed. You will learn everything you need
Basics of Python programming
Basics of SQL language
[b]Description[/b]
Are you ready to revolutionize your data analytics skills? Look no further. Welcome to our comprehensive course, where you'll delve deep into the world of Azure Synapse Analytics with PySpark and emerge equipped with the tools to excel in modern data analysis. Unlock the power of Azure Synapse Analytics!
18.5+ hours of in-depth learning content!
In this course we will be learning about:
Serverless SQL Pool - Perform flexible querying for structured data and initial data exploration.
Spark Pools - Dive into advanced data processing and analytics with the power of Apache Spark.
Spark SQL - Seamlessly query structured data using Spark's SQL capabilities.
MSSpark Utils - Leverage MSSpark Utilities for enhanced Spark functionality in Synapse.
50+ PySpark Transformations - Harness over 50 PySpark transformations to manipulate and refine your data.
Dedicated SQL Pool - Report data efficiently to Power BI.
Integrating Power BI with Azure Synapse Analytics - Seamlessly connect Power BI for enriched data visualization and insights.
Delta Lake and its features - Integrate Delta Lake for reliable, ACID-compliant data.
Spark Optimization Techniques - Employ optimization techniques to enhance Spark processing speed and efficiency.
You will also learn how Python is helpful in data analysis. Our project-based approach ensures hands-on learning, giving you the practical experience needed to conquer real-world data challenges.
While this course does not focus entirely on certification, it also gives you the practical understanding of the Azure Synapse Analytics service that is needed to pass DP-203 - "Microsoft Certified Azure Data Engineer" and DP-500 "Designing and Implementing Enterprise-Scale Analytics Solutions Using Microsoft Azure and Microsoft Power BI".
Join me in mastering Azure Synapse Analytics!
Overview
Section 1: Introduction
Lecture 1 Introduction
Lecture 2 Project Architecture
Lecture 3 Course Slides
Section 2: Origin of Azure Synapse Analytics
Lecture 4 Section Introduction
Lecture 5 Need of separate Analytical system
Lecture 6 OLAP vs OLTP
Lecture 7 A typical Datawarehouse
Lecture 8 Datalake Introduction
Lecture 9 Modern datawarehouse and its problem
Lecture 10 The solution - Azure Synapse Analytics and its Components
Lecture 11 Azure Synapse Analytics - A Single stop solution
Lecture 12 Section Summary
Section 3: Environment Setup
Lecture 13 Section Introduction
Lecture 14 Creating a resource group in Azure
Lecture 15 Create Azure Synapse Analytics Service
Lecture 16 Exploring Azure Synapse Analytics
Lecture 17 Understanding the dataset
Section 4: Serverless SQL Pool
Lecture 18 Section Introduction
Lecture 19 Serverless SQL Pool - Introduction
Lecture 20 Serverless SQL Pool - Architecture
Lecture 21 Serverless SQL Pool- Benefits and Pricing
Lecture 22 Uploading files into Azure Datalake Storage
Lecture 23 Initial Data Exploration
Lecture 24 How to import SQL scripts or ipynb notebooks to Azure Synapse
Lecture 25 Fixing the Collation warning
Lecture 26 Creating External datasource
Lecture 27 Creating database scoped credential Using SAS
Lecture 28 Creating Database scoped cred using MI
Lecture 29 Deleting existing data sources for cleanup
Lecture 30 Creating an external file format - Demo
Lecture 31 Creating an External File Format - Practical
Lecture 32 Creating External DataSource for Refined container
Lecture 33 Creating an External Table
Lecture 34 End of section
Section 5: History and Data processing before Spark
Lecture 35 Section Introduction
Lecture 36 Big Data Approach
Lecture 37 Understanding Hadoop Yarn- Cluster Manager
Lecture 38 Understanding Hadoop - HDFS
Lecture 39 Understanding Hadoop - MapReduce Distributed Computing
Section 6: Emergence of Spark
Lecture 40 Section Introduction
Lecture 41 Drawbacks of MapReduce Framework
Lecture 42 Emergence of Spark
Section 7: Spark Core Concepts
Lecture 43 Section Introduction
Lecture 44 Spark EcoSystem
Lecture 45 Difference between Hadoop & Spark
Lecture 46 Spark Architecture
Lecture 47 Creating a Spark Pool & its benefits
Lecture 48 RDD Overview
Lecture 49 Functions Lambda, Map and Filter - Overview
Lecture 50 Understanding RDD in practical
Lecture 51 RDD- Lazy loading - Transformations and Actions
Lecture 52 What is RDD Lineage
Lecture 53 RDD - Word count program - Demo
Lecture 54 RDD - Word count - PySpark Program - Practical
Lecture 55 Optimization - ReduceByKey vs GroupByKey Explanation
Lecture 56 RDD - Understanding about Jobs in spark Practical
Lecture 57 RDD - Understanding Narrow and Wide Transformations
Lecture 58 RDD- Understanding Stages - Practical
Lecture 59 RDD- Understanding Tasks Practical
Lecture 60 Understand DAG , RDD Lineage and Differences
Lecture 61 Spark Higher level APIs Intro
Lecture 62 Synapse Notebook - Creating dataframes practical
Section 8: PySpark Transformation 1 - Select and Filter functions
Lecture 63 Introduction for PySpark Transformations
Lecture 64 Walkthrough on Notebook , Markdown cells
Lecture 65 Using Free Databricks Community Edition to practise and Save Costs
Lecture 66 Display and show Functions
Lecture 67 Stop Spark Session when not in use
Lecture 68 Select and SelectExpr
Lecture 69 Filter Function
Lecture 70 Organizing notebooks into a folder
Section 9: PySpark Transformation 2 - Handling Nulls, Duplicates and aggregation
Lecture 71 Understanding fillna and na.fill
Lecture 72 Identifying duplicates using Aggregations
Lecture 73 Handling Duplicates using dropna
Lecture 74 Organising notebooks into a folder
Lecture 75 Transformations summary of this section
Section 10: PySpark Transformation 3 - Data Transformation and Manipulation
Lecture 76 withColumn to Create Update columns
Lecture 77 Transforming and updating column withColumnRenamed
Section 11: PySpark 4 - Synapse Spark - MSSparkUtils
Lecture 78 What is MSSpark Utilities
Lecture 79 MSSpark Utils - Env utils
Lecture 80 What is mount point
Lecture 81 Creating and accessing mount point in Notebook
Lecture 82 All File System Utils
Lecture 83 Notebook Utils - Exit command
Lecture 84 Creating another spark pool
Lecture 85 Procedure to increase vCores request (optional)
Lecture 86 Calling notebook from another notebook
Lecture 87 Calling notebook from another using runtime parameters
Lecture 88 Magic commands
Lecture 89 Attaching two notebooks to a single spark pool
Lecture 90 Accessing Mount points from another notebook
Section 12: PySpark 5 - Synapse - Spark SQL
Lecture 91 Accessing data using Temporary Views - Practical
Lecture 92 Lake Database - Overview
Lecture 93 Understanding and creating database in Lake Database
Lecture 94 Using Spark SQL in notebook
Lecture 95 Managed vs External tables in Spark
Lecture 96 Metadata sharing between Spark pool and Serverless SQL Pool
Lecture 97 Deleting unwanted folders
Section 13: PySpark Transformation 6 - Join Transformations
Lecture 98 Uploading required files for Joins
Lecture 99 Python notebooks till Union
Lecture 100 Inner join
Lecture 101 Left Join
Lecture 102 Right Join
Lecture 103 Full outer join
Lecture 104 Left Semi Join
Lecture 105 Left anti and Cross Join
Lecture 106 Union Operation
Lecture 107 Performing Join Transformation on Project Dataset
Lecture 108 Summary of Transformations performed
Section 14: PySpark Transformation 7 - String Manipulation and sorting
Lecture 109 Replace function to change spaces
Lecture 110 PySpark Notebook for this section
Lecture 111 Split and concat functions
Lecture 112 Order by and sort
Lecture 113 Section Summary
Section 15: PySpark Transformation 8 - Window Functions
Lecture 114 Row number function
Lecture 115 PySpark Notebook used in this section
Lecture 116 Rank Function
Lecture 117 Dense Rank function
Section 16: PySpark Transformation 9 - Conversions and Pivoting
Lecture 118 Conversion using cast function
Lecture 119 PySpark Notebook needed for casting and pivoting lectures
Lecture 120 Pivot function
Lecture 121 Unpivot using stack function
Lecture 122 Using to_date to convert date column
Section 17: PySpark Transformation 10 - Schema definition and Management
Lecture 123 PySpark Notebook used in this lecture
Lecture 124 StructType and StructField - Demo
Lecture 125 Implementing explicit schema with StructType and StructField
Section 18: PySpark Transformation 11 - UDFs
Lecture 126 User Defined Functions - Demo
Lecture 127 Implementing UDFs in Notebook
Lecture 128 Writing transformed data to Processed container
Section 19: Dedicated SQL Pool
Lecture 129 Dedicated SQL pool - Demo
Lecture 130 Dedicated SQL Pool Architecture
Lecture 131 How distribution takes place based on DWU
Lecture 132 Factors to consider when choosing dedicated SQL pool
Lecture 133 Creating Dedicated SQL pool in Synapse
Lecture 134 Ways to copy data into Dedicated SQL Pool
Lecture 135 Copy command to copy to dedicated SQL pool
Lecture 136 Clustered Columnstore Index (optional)
Lecture 137 Types of Distributions or Sharding patterns
Lecture 138 Using Pipeline to Copy to dedicated SQL Pool
Section 20: Reporting data to Power BI
Lecture 139 Section Introduction
Lecture 140 Installing Power BI Desktop
Lecture 141 Creating report from Power BI Desktop
Lecture 142 Creating new user in Azure AD for creating workspace (if using personal account)
Lecture 143 Creating a shared workspace in Power BI
Lecture 144 Publishing report to Shared Workspace
Lecture 145 Accessing Power BI from Azure Synapse Analytics
Lecture 146 Download Power BI .pbix file from here
Lecture 147 Creating Dataset and report from Synapse Analytics
Lecture 148 Concluding the Power BI Section
Lecture 149 Summary and end of project implementation
Section 21: Spark - Optimisation Techniques
Lecture 150 Optimisation Section Intro
Lecture 151 Uploading required files for Optimisation
Lecture 152 Spark Optimisation levels
Lecture 153 Avoid using Collect function
Lecture 154 Making notebook into particular folder
Lecture 155 Avoid InferSchema
Lecture 156 Use Cache Persist 1 - Understanding Serialization and DeSerialization
Lecture 157 Use Cache Persist 2 - How cache or persist will work - Demo
Lecture 158 Use Cache Persist 3 - Understanding cache practically
Lecture 159 Use Cache Persist 4 - Persist - What is persist and different storage levels
Lecture 160 Use Cache Persist - Notebook for persist with all storage levels
Lecture 161 Use Cache Persist 5 - Persist - MEMORY_ONLY
Lecture 162 Use Cache Persist 6 - Persist - MEMORY_AND_DISK
Lecture 163 Use Cache Persist 7 - Persist - MEMORY_ONLY_SER (Scala Only)
Lecture 164 Use Cache Persist 8 - Persist - MEMORY_AND_DISK_SER (Scala Only)
Lecture 165 Use Cache Persist 9 - Persist - DISK_ONLY
Lecture 166 Use Cache Persist 10 - Persist - OFF_HEAP (Scala Only)
Lecture 167 Use Cache Persist 11 - Persist - MEMORY_ONLY_2 (PySpark only)
Lecture 168 Use Partitioning 1 - Understanding partitioning - Demo
Lecture 169 Use Partitioning 2 - Understand partitioning - Practical
Lecture 170 Repartition and coalesce 1 - Understanding repartition and coalesce - Demo
Lecture 171 Repartition and coalesce 2 - Understanding repartition and coalesce - Practical
Lecture 172 Broadcast variables 1 - Understanding broadcast variables - Demo
Lecture 173 Broadcast variables 2 - Implementing broadcast variables in notebook
Lecture 174 Use Kryo Serializer
Section 22: Delta Lake
Lecture 175 Section Introduction
Lecture 176 Drawbacks of ADLS
Lecture 177 What is Delta lake
Lecture 178 Lakehouse Architecture
Lecture 179 Uploading required file for Delta lake
Lecture 180 Problems with Azure Datalake - Practical
Lecture 181 Creating a Delta lake
Lecture 182 Understanding Delta format
Lecture 183 Contents of Transaction Log or Delta log file - Practical
Lecture 184 Contents of a transaction log demo
Lecture 185 Creating delta table by Path using SQL
Lecture 186 Creating delta table in Metastore using Pyspark and SQL
Lecture 187 Schema Enforcement - Files required for Understanding Schema Enforcement
Lecture 188 What is schema enforcement - Demo
Lecture 189 Schema Enforcement - Practical
Lecture 190 Schema Evolution - Practical
Lecture 191 Versioning and Time Travel
Lecture 192 Vacuum command
Lecture 193 Convert to Delta command
Lecture 194 Checkpoints in delta log
Lecture 195 Optimize command - Demo
Lecture 196 Optimize command - Practical
Lecture 197 Applying UPSERT using MERGE Command
Section 23: Conclusion
Lecture 198 Course Conclusion
Lecture 199 Bonus Lecture
Beginners who want to step into the world of Data Engineers, Professional Data Engineers who want to advance their data analysis skills, Students who are keen to learn Data Analytics, Data Engineers who want to learn data warehousing in Cloud using Azure Synapse Analytics