06-07-2023, 09:26 AM
Data Ingestion with Python Cookbook | 616 | Gláucia Esppenchutz | 2023 | Packt Publishing Pvt Ltd | B0C3CQDYHW
Deploy, orchestrate, and efficiently monitor your own data ingestion pipeline to prevent loss of data and quality.
Key Features
- ✔ Implement best practices for creating a data ingestion pipeline using Python and PySpark
- ✔ Automate and orchestrate your data pipelines using Apache Airflow
- ✔ Build a monitoring framework while applying the concept of data observability to your pipelines
Book Description
Data Ingestion with Python Cookbook offers a practical approach to designing and implementing data ingestion pipelines, with real-world examples built on well-regarded open-source tools and answers to common questions and obstacles.
You will be introduced to designing and working with or without data schemas, and to creating monitored pipelines with Airflow and data observability principles, all while following best practices. The book also addresses the challenges of reading different data sources and data formats. You will then gain a broad understanding of best practices for logging errors, identifying and solving them, data orchestration, monitoring, and where to store logs for later consultation.
By the end of the book, you will have a complete automated setup for ingesting data and monitoring the pipeline, making it easier to plug into later steps of the ETL process.
What you will learn
- ✔ Apply data observability using monitoring tools
- ✔ Automate your data ingestion pipeline
- ✔ Read analytical and partitioned data, with or without a schema
- ✔ Debug and prevent data loss using efficient monitoring and logging
- ✔ Apply data access policies using a data governance framework
- ✔ Create a data orchestration framework to improve data quality
Who This Book Is For
This book is for data engineers and data enthusiasts who want a better understanding of the process of ingesting data using the most popular tools in the open-source community.
For more advanced learners, this book covers the theoretical pillars of data governance and gives practical examples of real scenarios that data engineers encounter daily.
Table of Contents
- ✔ Introduction to Data Ingestion
- ✔ Data Access Principals - Accessing your data
- ✔ Data Discovery - Understanding our data before ingesting
- ✔ Reading CSV and JSON files and solving problems
- ✔ Ingesting Data from Structured and Unstructured Databases
- ✔ Using PySpark with defined and non-defined schemas
- ✔ Ingesting Analytical Data
- ✔ Designing Monitored Data Workflows
- ✔ Putting everything together with Airflow
- ✔ Logging and Monitoring your Data Ingest in Airflow
- ✔ Automate your Data Pipelines
- ✔ Using Data Observability for Debugging, Error Handling, and Preventing Downtime
Files:
Data Ingestion with Python Cookbook A practical guide to ingesting, monitoring.epub (31.04 MB)