Evaluation
Creating Evals: How to build a "Golden Dataset."
Common Metrics: Precision, Recall, MRR (Mean Reciprocal Rank), NDCG.
LLM as a Judge: Using GPT-4 to grade RAG outputs (RAGAS framework).
Unit Testing for RAG: CI/CD for knowledge pipelines.
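
The retrieval metrics named in the outline above (Precision, Recall, MRR, NDCG) can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names, the document IDs, and the convention of dividing Precision@k by k are assumptions made for the example, and the gains passed to NDCG are assumed to be non-negative relevance grades keyed by document ID.

```python
import math

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved IDs that are relevant (divides by k)."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (retrieved, relevant) pairs."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)

def ndcg_at_k(retrieved, gains, k):
    """DCG of the top-k results divided by the DCG of an ideal ordering."""
    dcg = sum(
        gains.get(doc_id, 0.0) / math.log2(i + 2)
        for i, doc_id in enumerate(retrieved[:k])
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical golden-dataset entry: ranked retriever output and labeled relevant IDs.
retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

p3 = precision_at_k(retrieved, relevant, 3)
r3 = recall_at_k(retrieved, relevant, 3)
mrr = mean_reciprocal_rank([(retrieved, relevant), (["a", "b"], {"a"})])
ndcg4 = ndcg_at_k(retrieved, {"d1": 1.0, "d2": 1.0}, 4)
```

Functions like these can be run over every entry in a golden dataset and compared against a threshold in a CI job, so that a retriever change that lowers a score fails the build.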