Eval Driven Dev

Name: Eval Driven Dev
Author: eval-driven-dev

About this skill

Instrument Python LLM apps, build golden datasets, write eval-based tests, run them, and root-cause failures — covering the full eval-driven development cycle. Make sure to use this skill whenever a user is developing, testing, QA-ing, evaluating, or benchmarking a Python project that calls an LLM, even if they don't say "evals" explicitly. Use for making sure an AI app works correctly, catching regressions after prompt changes, debugging why an agent started behaving differently, or validating
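
As a rough illustration of the cycle described above, here is a minimal sketch of an eval-based test run against a small golden dataset. Everything in it is hypothetical and is not taken from the skill itself: `fake_llm_app` stands in for a real LLM call so the example runs offline, and `GoldenCase`, `GOLDEN_DATASET`, and `run_evals` are illustrative names. The check used (case-insensitive substring match) is only one simple choice of grader.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class GoldenCase:
    """One golden-dataset entry: an input and a string the answer must contain."""
    prompt: str
    must_contain: str


def fake_llm_app(prompt: str) -> str:
    """Hypothetical stand-in for the app under test (no real LLM is called)."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is 2 + 2?": "2 + 2 equals 4.",
    }
    return canned.get(prompt, "I don't know.")


# Assumed golden dataset; a real one would be built from reviewed app traces.
GOLDEN_DATASET = [
    GoldenCase("What is the capital of France?", "Paris"),
    GoldenCase("What is 2 + 2?", "4"),
    GoldenCase("Who wrote Hamlet?", "Shakespeare"),
]


def run_evals(
    app: Callable[[str], str], cases: List[GoldenCase]
) -> Tuple[float, List[GoldenCase]]:
    """Run every case through the app; return the pass rate and the failing cases."""
    failures = [
        case
        for case in cases
        if case.must_contain.lower() not in app(case.prompt).lower()
    ]
    pass_rate = (len(cases) - len(failures)) / len(cases)
    return pass_rate, failures


pass_rate, failures = run_evals(fake_llm_app, GOLDEN_DATASET)
print(f"pass rate: {pass_rate:.2f}")
for case in failures:
    # Failing cases are the starting point for root-cause analysis.
    print("FAILED:", case.prompt)
```

Re-running the same dataset after a prompt change gives a simple regression signal: a drop in the pass rate, or a new entry in the failure list, points at the cases to investigate first.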

License

Other

Version

1.0.0
