Basics To Advanced: Azure Synapse Analytics Hands-On Project - AD-TEAM - 06-30-2024
Published 8/2023
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 kHz
Language: English | Size: 6.86 GB | Duration: 18h 40m
Build a complete project using only Azure Synapse Analytics, focused on PySpark and including Delta Lake and Spark optimizations.
[b]What you'll learn[/b]
Understand Azure Synapse Analytics services practically
Gain a complete basic-to-advanced understanding of Azure Synapse Analytics
Gain hands-on experience in applying Spark optimization techniques to real-world scenarios, achieving faster insights
Understand 50+ of the most commonly used PySpark transformations
Acquire a comprehensive library of 45+ PySpark notebooks for data cleansing, enrichment, and transformation
Get hands-on practice building a modern data warehouse using Azure Synapse
Explore the capabilities of Spark Pools and their role in processing large-scale data workloads
Understand how Python is used in data engineering
Understand and transform data with Serverless SQL pool
Understand the principles and advantages of Delta Lake as a reliable data storage and management solution
Learn how Spark evolved and grew
Gain insight into the services needed to clear DP-203
Create and configure a Serverless SQL pool
Create External Data Sources, External File Formats, and External Tables in Serverless SQL pool
Configure Spark Pools and understand how they work
Understand the integration of Power BI with Azure Synapse Analytics
Create and work with Dedicated SQL pool at a high level
Optimize your PySpark code with Spark optimization techniques
Learn the history of data processing before Spark
Implement incremental UPSERT using Delta Lake
Understand and implement versioning in Delta Lake
Use MSSparkUtils and its utilities
Mount a data lake to Synapse notebooks
[b]Requirements[/b]
No Azure Synapse Analytics experience needed.
You will learn everything you need.
Basics of Python programming
Basics of SQL language
[b]Description[/b]
Are you ready to revolutionize your data analytics skills? Look no further. Welcome to our comprehensive course, where you'll delve deep into the world of Azure Synapse Analytics with PySpark and emerge equipped with the tools to excel in modern data analysis.
Unlock the Power of Azure Synapse Analytics!
18.5+ HOURS OF IN-DEPTH LEARNING CONTENT!
In this course we will be learning about:
Serverless SQL Pool - Perform flexible querying of structured data and initial data exploration.
Spark Pools - Dive into advanced data processing and analytics with the power of Apache Spark.
Spark SQL - Seamlessly query structured data using Spark's SQL capabilities.
MSSpark Utils - Leverage MSSpark Utilities for enhanced Spark functionality in Synapse.
50+ PySpark Transformations - Harness over 50 PySpark transformations to manipulate and refine your data.
Dedicated SQL Pool - Report data efficiently to Power BI.
Integrating Power BI with Azure Synapse Analytics - Seamlessly connect Power BI for enriched data visualization and insights.
Delta Lake and its features - Integrate Delta Lake for reliable, ACID-compliant data.
Spark Optimization Techniques - Employ optimization techniques to enhance Spark processing speed and efficiency.
You will also learn how Python is helpful in data analysis. Our project-based approach ensures hands-on learning, giving you the practical experience needed to conquer real-world data challenges.
While this course does not focus entirely on certification, it also gives you the practical understanding of the Azure Synapse Analytics service that is needed to pass DP-203 - "Microsoft Certified Azure Data Engineer" and DP-500 - "Designing and Implementing Enterprise-Scale Analytics Solutions Using Microsoft Azure and Microsoft Power BI".
Join me in mastering Azure Synapse Analytics!
Overview

Section 1: Introduction
Lecture 1 Introduction
Lecture 2 Project Architecture
Lecture 3 Course Slides

Section 2: Origin of Azure Synapse Analytics
Lecture 4 Section Introduction
Lecture 5 Need of separate Analytical system
Lecture 6 OLAP vs OLTP
Lecture 7 A typical Datawarehouse
Lecture 8 Datalake Introduction
Lecture 9 Modern datawarehouse and its problem
Lecture 10 The solution - Azure Synapse Analytics and its Components
Lecture 11 Azure Synapse Analytics - A Single stop solution
Lecture 12 Section Summary

Section 3: Environment Setup
Lecture 13 Section Introduction
Lecture 14 Creating a resource group in Azure
Lecture 15 Create Azure Synapse Analytics Service
Lecture 16 Exploring Azure Synapse Analytics
Lecture 17 Understanding the dataset

Section 4: Serverless SQL Pool
Lecture 18 Section Introduction
Lecture 19 Serverless SQL Pool - Introduction
Lecture 20 Serverless SQL Pool - Architecture
Lecture 21 Serverless SQL Pool - Benefits and Pricing
Lecture 22 Uploading files into Azure Datalake Storage
Lecture 23 Initial Data Exploration
Lecture 24 How to import SQL scripts or ipynb notebooks to Azure Synapse
Lecture 25 Fixing the Collation warning
Lecture 26 Creating External datasource
Lecture 27 Creating database scoped credential Using SAS
Lecture 28 Creating Database scoped cred using MI
Lecture 29 Deleting existing data sources for cleanup
Lecture 30 Creating an external file format - Demo
Lecture 31 Creating an External File Format - Practical
Lecture 32 Creating External DataSource for Refined container
Lecture 33 Creating an External Table
Lecture 34 End of section

Section 5: History and Data processing before Spark
Lecture 35 Section Introduction
Lecture 36 Big Data Approach
Lecture 37 Understanding Hadoop YARN - Cluster Manager
Lecture 38 Understanding Hadoop - HDFS
Lecture 39 Understanding Hadoop - MapReduce Distributed Computing

Section 6: Emergence of Spark
Lecture 40 Section Introduction
Lecture 41 Drawbacks of MapReduce Framework
Lecture 42 Emergence of Spark

Section 7: Spark Core Concepts
Lecture 43 Section Introduction
Lecture 44 Spark EcoSystem
Lecture 45 Difference between Hadoop & Spark
Lecture 46 Spark Architecture
Lecture 47 Creating a Spark Pool & its benefits
Lecture 48 RDD Overview
Lecture 49 Functions Lambda, Map and Filter - Overview
Lecture 50 Understanding RDD in practical
Lecture 51 RDD - Lazy loading - Transformations and Actions
Lecture 52 What is RDD Lineage
Lecture 53 RDD - Word count program - Demo
Lecture 54 RDD - Word count - PySpark Program - Practical
Lecture 55 Optimization - ReduceByKey vs GroupByKey Explanation
Lecture 56 RDD - Understanding about Jobs in spark Practical
Lecture 57 RDD - Understanding Narrow and Wide Transformations
Lecture 58 RDD - Understanding Stages - Practical
Lecture 59 RDD - Understanding Tasks Practical
Lecture 60 Understand DAG, RDD Lineage and Differences
Lecture 61 Spark Higher level APIs Intro
Lecture 62 Synapse Notebook - Creating dataframes practical

Section 8: PySpark Transformation 1 - Select and Filter functions
Lecture 63 Introduction for PySpark Transformations
Lecture 64 Walkthrough on Notebook, Markdown cells
Lecture 65 Using Free Databricks Community Edition to practise and Save Costs
Lecture 66 Display and show Functions
Lecture 67 Stop Spark Session when not in use
Lecture 68 Select and SelectExpr
Lecture 69 Filter Function
Lecture 70 Organizing notebooks into a folder

Section 9: PySpark Transformation 2 - Handling Nulls, Duplicates and aggregation
Lecture 71 Understanding fillna and na.fill
Lecture 72 Identifying duplicates using Aggregations
Lecture 73 Handling Duplicates using dropna
Lecture 74 Organising notebooks into a folder
Lecture 75 Transformations summary of this section

Section 10: PySpark Transformation 3 - Data Transformation and Manipulation
Lecture 76 withColumn to create and update columns
Lecture 77 Transforming and updating column withColumnRenamed

Section 11: PySpark 4 - Synapse Spark - MSSparkUtils
Lecture 78 What is MSSpark Utilities
Lecture 79 MSSpark Utils - Env utils
Lecture 80 What is mount point
Lecture 81 Creating and accessing mount point in Notebook
Lecture 82 All File System Utils
Lecture 83 Notebook Utils - Exit command
Lecture 84 Creating another spark pool
Lecture 85 Procedure to increase vCores request (optional)
Lecture 86 Calling notebook from another notebook
Lecture 87 Calling notebook from another using runtime parameters
Lecture 88 Magic commands
Lecture 89 Attaching two notebooks to a single spark pool
Lecture 90 Accessing Mount points from another notebook

Section 12: PySpark 5 - Synapse - Spark SQL
Lecture 91 Accessing data using Temporary Views - Practical
Lecture 92 Lake Database - Overview
Lecture 93 Understanding and creating database in Lake Database
Lecture 94 Using Spark SQL in notebook
Lecture 95 Managed vs External tables in Spark
Lecture 96 Metadata sharing between Spark pool and Serverless SQL Pool
Lecture 97 Deleting unwanted folders

Section 13: PySpark Transformation 6 - Join Transformations
Lecture 98 Uploading required files for Joins
Lecture 99 Python notebooks till Union
Lecture 100 Inner join
Lecture 101 Left Join
Lecture 102 Right Join
Lecture 103 Full outer join
Lecture 104 Left Semi Join
Lecture 105 Left anti and Cross Join
Lecture 106 Union Operation
Lecture 107 Performing Join Transformation on Project Dataset
Lecture 108 Summary of Transformations performed

Section 14: PySpark Transformation 7 - String Manipulation and sorting
Lecture 109 Replace function to change spaces
Lecture 110 PySpark Notebook for this section
Lecture 111 Split and concat functions
Lecture 112 Order by and sort
Lecture 113 Section Summary

Section 15: PySpark Transformation 8 - Window Functions
Lecture 114 Row number function
Lecture 115 PySpark Notebook used in this section
Lecture 116 Rank Function
Lecture 117 Dense Rank function

Section 16: PySpark Transformation 9 - Conversions and Pivoting
Lecture 118 Conversion using cast function
Lecture 119 PySpark Notebook needed for casting and pivoting lectures
Lecture 120 Pivot function
Lecture 121 Unpivot using stack function
Lecture 122 Using to_date to convert date column

Section 17: PySpark Transformation 10 - Schema definition and Management
Lecture 123 PySpark Notebook used in this lecture
Lecture 124 StructType and StructField - Demo
Lecture 125 Implementing explicit schema with StructType and StructField

Section 18: PySpark Transformation 11 - UDFs
Lecture 126 User Defined Functions - Demo
Lecture 127 Implementing UDFs in Notebook
Lecture 128 Writing transformed data to Processed container

Section 19: Dedicated SQL Pool
Lecture 129 Dedicated SQL pool - Demo
Lecture 130 Dedicated SQL Pool Architecture
Lecture 131 How distribution takes place based on DWU
Lecture 132 Factors to consider when choosing dedicated SQL pool
Lecture 133 Creating Dedicated SQL pool in Synapse
Lecture 134 Ways to copy data into Dedicated SQL Pool
Lecture 135 Copy command to copy to dedicated SQL pool
Lecture 136 Clustered Column Store index (optional)
Lecture 137 Types of Distributions or Sharding patterns
Lecture 138 Using Pipeline to Copy to dedicated SQL Pool

Section 20: Reporting data to Power BI
Lecture 139 Section Introduction
Lecture 140 Installing Power BI Desktop
Lecture 141 Creating report from Power BI Desktop
Lecture 142 Creating new user in Azure AD for creating workspace (if using personal account)
Lecture 143 Creating a shared workspace in Power BI
Lecture 144 Publishing report to Shared Workspace
Lecture 145 Accessing Power BI from Azure Synapse Analytics
Lecture 146 Download Power BI .pbix file from here
Lecture 147 Creating Dataset and report from Synapse Analytics
Lecture 148 Concluding the Power BI Section
Lecture 149 Summary and end of project implementation

Section 21: Spark - Optimisation Techniques
Lecture 150 Optimisation Section Intro
Lecture 151 Uploading required files for Optimisation
Lecture 152 Spark Optimisation levels
Lecture 153 Avoid using Collect function
Lecture 154 Making notebook into particular folder
Lecture 155 Avoid InferSchema
Lecture 156 Use Cache Persist 1 - Understanding Serialization and DeSerialization
Lecture 157 Use Cache Persist 2 - How cache or persist will work - Demo
Lecture 158 Use Cache Persist 3 - Understanding cache practically
Lecture 159 Use Cache Persist 4 - Persist - What is persist and different storage levels
Lecture 160 Use Cache Persist - Notebook for persist with all storage levels
Lecture 161 Use Cache Persist 5 - Persist - MEMORY_ONLY
Lecture 162 Use Cache Persist 6 - Persist - MEMORY_AND_DISK
Lecture 163 Use Cache Persist 7 - Persist - MEMORY_ONLY_SER (Scala Only)
Lecture 164 Use Cache Persist 8 - Persist - MEMORY_AND_DISK_SER (Scala Only)
Lecture 165 Use Cache Persist 9 - Persist - DISK_ONLY
Lecture 166 Use Cache Persist 10 - Persist - OFF_HEAP (Scala Only)
Lecture 167 Use Cache Persist 11 - Persist - MEMORY_ONLY_2 (PySpark only)
Lecture 168 Use Partitioning 1 - Understanding partitioning - Demo
Lecture 169 Use Partitioning 2 - Understand partitioning - Practical
Lecture 170 Repartition and coalesce 1 - Understanding repartition and coalesce - Demo
Lecture 171 Repartition and coalesce 2 - Understanding repartition and coalesce - Practical
Lecture 172 Broadcast variables 1 - Understanding broadcast variables - Demo
Lecture 173 Broadcast variables 2 - Implementing broadcast variables in notebook
Lecture 174 Use Kryo Serializer

Section 22: Delta Lake
Lecture 175 Section Introduction
Lecture 176 Drawbacks of ADLS
Lecture 177 What is Delta lake
Lecture 178 Lakehouse Architecture
Lecture 179 Uploading required file for Delta lake
Lecture 180 Problems with Azure Datalake - Practical
Lecture 181 Creating a Delta lake
Lecture 182 Understanding Delta format
Lecture 183 Contents of Transaction Log or Delta log file - Practical
Lecture 184 Contents of a transaction log demo
Lecture 185 Creating delta table by Path using SQL
Lecture 186 Creating delta table in Metastore using PySpark and SQL
Lecture 187 Schema Enforcement - Files required for Understanding Schema Enforcement
Lecture 188 What is schema enforcement - Demo
Lecture 189 Schema Enforcement - Practical
Lecture 190 Schema Evolution - Practical
Lecture 191 Versioning and Time Travel
Lecture 192 Vacuum command
Lecture 193 Convert to Delta command
Lecture 194 Checkpoints in delta log
Lecture 195 Optimize command - Demo
Lecture 196 Optimize command - Practical
Lecture 197 Applying UPSERT using MERGE Command

Section 23: Conclusion
Lecture 198 Course Conclusion
Lecture 199 Bonus Lecture

Who this course is for:
Beginners who want to step into the world of Data Engineers
Professional Data Engineers who want to advance their data analysis skills
Students who are keen to learn Data Analytics
Data Engineers who want to learn data warehousing in Cloud using Azure Synapse Analytics