Evaluating Large Language Models (LLMs) - BaDshaH - 02-05-2025

Released 2/2025
By Sinan Ozdemir
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 kHz, 2 Ch
Genre: eLearning | Language: English | Duration: 7h 56m | Size: 2.14 GB

Table of contents

Introduction
  Evaluating Large Language Models (LLMs): Introduction

Lesson 1: Foundations of LLM Evaluation
  Learning objectives
  1.1 Introduction to Evaluation: Why It Matters
  1.2 Generative versus Understanding Tasks
  1.3 Key Metrics for Common Tasks

Lesson 2: Evaluating Generative Tasks
  Learning objectives
  2.1 Evaluating Multiple-Choice Tasks
  2.2 Evaluating Free Text Response Tasks
  2.3 AIs Supervising AIs: LLM as a Judge

Lesson 3: Evaluating Understanding Tasks
  Learning objectives
  3.1 Evaluating Embedding Tasks
  3.2 Evaluating Classification Tasks
  3.3 Building an LLM Classifier with BERT and GPT

Lesson 4: Using Benchmarks Effectively
  Learning objectives
  4.1 The Role of Benchmarks
  4.2 Interrogating Common Benchmarks
  4.3 Evaluating LLMs with Benchmarks

Lesson 5: Probing LLMs for a World Model
  Learning objectives
  5.1 Probing LLMs for Knowledge
  5.2 Probing LLMs to Play Games

Lesson 6: Evaluating LLM Fine-Tuning
  Learning objectives
  6.1 Fine-Tuning Objectives
  6.2 Metrics for Fine-Tuning Success
  6.3 Practical Demonstration: Evaluating Fine-Tuning
  6.4 Evaluating and Cleaning Data

Lesson 7: Case Studies
  Learning objectives
  7.1 Evaluating AI Agents: Task Automation and Tool Integration
  7.2 Measuring Retrieval-Augmented Generation (RAG) Systems
  7.3 Building and Evaluating a Recommendation Engine Using LLMs
  7.4 Using Evaluation to Combat AI Drift
  7.5 Time-Series Regression

Lesson 8: Summary of Evaluation and Looking Ahead
  Learning objectives
  8.1 When and How to Evaluate
  8.2 Looking Ahead: Trends in LLM Evaluation

Summary
  Evaluating Large Language Models (LLMs): Summary