Evaluating Large Language Models (LLMs)

Released 2/2025
By Sinan Ozdemir
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 kHz, 2 Ch
Genre: eLearning | Language: English | Duration: 7h 56m | Size: 2.14 GB

Table of contents
Introduction
Evaluating Large Language Models (LLMs): Introduction
Lesson 1: Foundations of LLM Evaluation
Learning objectives
1.1 Introduction to Evaluation: Why It Matters
1.2 Generative versus Understanding Tasks
1.3 Key Metrics for Common Tasks
Lesson 2: Evaluating Generative Tasks
Learning objectives
2.1 Evaluating Multiple-Choice Tasks
2.2 Evaluating Free Text Response Tasks
2.3 AIs Supervising AIs: LLM as a Judge
Lesson 3: Evaluating Understanding Tasks
Learning objectives
3.1 Evaluating Embedding Tasks
3.2 Evaluating Classification Tasks
3.3 Building an LLM Classifier with BERT and GPT
Lesson 4: Using Benchmarks Effectively
Learning objectives
4.1 The Role of Benchmarks
4.2 Interrogating Common Benchmarks
4.3 Evaluating LLMs with Benchmarks
Lesson 5: Probing LLMs for a World Model
Learning objectives
5.1 Probing LLMs for Knowledge
5.2 Probing LLMs to Play Games
Lesson 6: Evaluating LLM Fine-Tuning
Learning objectives
6.1 Fine-Tuning Objectives
6.2 Metrics for Fine-Tuning Success
6.3 Practical Demonstration: Evaluating Fine-Tuning
6.4 Evaluating and Cleaning Data
Lesson 7: Case Studies
Learning objectives
7.1 Evaluating AI Agents: Task Automation and Tool Integration
7.2 Measuring Retrieval-Augmented Generation (RAG) Systems
7.3 Building and Evaluating a Recommendation Engine Using LLMs
7.4 Using Evaluation to Combat AI Drift
7.5 Time-Series Regression
Lesson 8: Summary of Evaluation and Looking Ahead
Learning objectives
8.1 When and How to Evaluate
8.2 Looking Ahead: Trends in LLM Evaluation
Summary
Evaluating Large Language Models (LLMs): Summary
