Evaluating Large Language Models (LLMs)

Evaluating Large Language Models (LLMs) - BaDshaH - 02-05-2025

Released 2/2025
By Sinan Ozdemir
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 kHz, 2 Ch
Genre: eLearning | Language: English | Duration: 7h 56m | Size: 2.14 GB

Table of contents
Introduction
Evaluating Large Language Models (LLMs): Introduction
Lesson 1: Foundations of LLM Evaluation
Learning objectives
1.1 Introduction to Evaluation: Why It Matters
1.2 Generative versus Understanding Tasks
1.3 Key Metrics for Common Tasks
Lesson 2: Evaluating Generative Tasks
Learning objectives
2.1 Evaluating Multiple-Choice Tasks
2.2 Evaluating Free Text Response Tasks
2.3 AIs Supervising AIs: LLM as a Judge
Lesson 3: Evaluating Understanding Tasks
Learning objectives
3.1 Evaluating Embedding Tasks
3.2 Evaluating Classification Tasks
3.3 Building an LLM Classifier with BERT and GPT
Lesson 4: Using Benchmarks Effectively
Learning objectives
4.1 The Role of Benchmarks
4.2 Interrogating Common Benchmarks
4.3 Evaluating LLMs with Benchmarks
Lesson 5: Probing LLMs for a World Model
Learning objectives
5.1 Probing LLMs for Knowledge
5.2 Probing LLMs to Play Games
Lesson 6: Evaluating LLM Fine-Tuning
Learning objectives
6.1 Fine-Tuning Objectives
6.2 Metrics for Fine-Tuning Success
6.3 Practical Demonstration: Evaluating Fine-Tuning
6.4 Evaluating and Cleaning Data
Lesson 7: Case Studies
Learning objectives
7.1 Evaluating AI Agents: Task Automation and Tool Integration
7.2 Measuring Retrieval-Augmented Generation (RAG) Systems
7.3 Building and Evaluating a Recommendation Engine Using LLMs
7.4 Using Evaluation to Combat AI Drift
7.5 Time-Series Regression
Lesson 8: Summary of Evaluation and Looking Ahead
Learning objectives
8.1 When and How to Evaluate
8.2 Looking Ahead: Trends in LLM Evaluation
Summary
Evaluating Large Language Models (LLMs): Summary
