Eval Driven Dev

About this skill

Instrument Python LLM apps, build golden datasets, write eval-based tests, run them, and root-cause failures — covering the full eval-driven development cycle. Make sure to use this skill whenever a user is developing, testing, QA-ing, evaluating, or benchmarking a Python project that calls an LLM, even if they don't say "evals" explicitly. Use for making sure an AI app works correctly, catching regressions after prompt changes, debugging why an agent started behaving differently, or validating
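As a rough illustration of the "golden dataset plus eval-based test" part of that cycle, here is a minimal sketch in plain Python. Everything in it is hypothetical and not taken from the skill itself: `GoldenCase`, `GOLDEN`, `fake_llm_app`, and `run_evals` are illustrative names, the canned answers stand in for a real LLM call, and the substring check is just one simple scoring rule among many.

```python
from dataclasses import dataclass


@dataclass
class GoldenCase:
    """One golden example: an input prompt and a substring the answer must contain."""
    prompt: str
    must_contain: str


# Hypothetical golden dataset; a real one would be curated from actual app traffic.
GOLDEN = [
    GoldenCase("What is the capital of France?", "paris"),
    GoldenCase("What is 2 + 2?", "4"),
]


def fake_llm_app(prompt: str) -> str:
    """Stand-in for the application under test (assumption: the real app calls an LLM)."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is 2 + 2?": "2 + 2 equals 4.",
    }
    return canned.get(prompt, "I don't know.")


def run_evals(app, cases):
    """Run every golden case through the app and return (pass_rate, failing_cases)."""
    failures = [c for c in cases if c.must_contain not in app(c.prompt).lower()]
    pass_rate = 1 - len(failures) / len(cases)
    return pass_rate, failures


pass_rate, failures = run_evals(fake_llm_app, GOLDEN)
print(f"pass rate: {pass_rate:.0%}, failures: {len(failures)}")
```

Running the same `run_evals` call before and after a prompt change gives a simple regression signal, and the returned failing cases are the starting point for root-causing what changed.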

License: Other
Version: 1.0.0
