Softwarez.Info - Software's World!
Pyspark For Data Scientists - Printable Version


Pyspark For Data Scientists - AD-TEAM - 10-26-2024

PySpark for Data Scientists
Published 10/2024
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 kHz
Language: English | Size: 2.23 GB | Duration: 4h 43m

[b]What you'll learn[/b]

Foundations of PySpark: Gain a solid understanding of fundamental PySpark concepts and principles.

Data Manipulation Techniques: Work with DataFrames, RDDs, and SQL queries to manipulate data in PySpark.

Distributed Data Processing: Learn techniques for distributed data processing and optimisation.

Data Preparation: Understand and implement strategies for data cleaning and transformation.

[b]Requirements[/b]

Basic Understanding of Python Programming: This includes familiarity with libraries such as NumPy and Pandas.

Knowledge of Data Science Fundamentals: Understanding of data manipulation, exploratory data analysis, and basic machine learning concepts.

Familiarity with Big Data Concepts: Basic knowledge of big data concepts and distributed computing is beneficial but not required.

[b]Description[/b]

Welcome to the "PySpark for Data Scientists" course! This comprehensive program is designed to equip you with essential knowledge and skills to harness PySpark for big data analytics. Whether you are new to data science or looking to enhance your expertise, this course covers everything required to build, optimize, and analyze large-scale datasets effectively.

Throughout the course, you will explore a wide range of PySpark concepts and practical applications, focusing on distributed data processing and large-scale data analysis. You'll begin with the fundamental principles of PySpark and its ecosystem, covering crucial topics such as data manipulation techniques, including DataFrames and RDDs, as well as SQL queries for data transformation. Practical applications of distributed computing will help optimize your data processing workflows. In addition to foundational concepts, the course delves into advanced topics, including data preparation strategies for cleaning and transforming datasets and utilizing PySpark's capabilities for real-time data processing.

By the end of this course, you will be proficient in implementing PySpark techniques to tackle complex data challenges. You will be able to extract meaningful insights from large datasets and apply your skills to real-world scenarios across various data-driven fields. Get ready to unlock limitless opportunities in big data analytics!

Overview

Section 1: Introduction to Big Data

Lecture 1 Big Data History - Part 1

Lecture 2 Big Data History - Part 2

Section 2: Introduction to RDD and Spark

Lecture 3 RDD Introduction

Lecture 4 Spark Ecosystem

Lecture 5 Spark Lazy Evaluation

Lecture 6 Spark RDD Setup On Google Colab

Lecture 7 SparkContext & SparkSession

Lecture 8 Spark RDD Transformation - Part 1

Lecture 9 Spark RDD Transformation - Part 2

Lecture 10 Spark RDD Transformation - Part 3

Lecture 11 RDD Action

Section 3: DataFrame & Spark Shell

Lecture 12 DataFrame - Part 1

Lecture 13 DataFrame - Part 2

Lecture 14 spark-shell, spark-submit & Running Spark Locally

Section 4: Quiz

[b]Who this course is for[/b]

Aspiring Data Scientists, Data Engineers and Analysts, Business Analysts, Students looking to enter the field of big data, Professionals seeking to enhance their data processing skills
