Agent Evaluation Framework
Comprehensive Claude Code agent evaluation framework with multi-dimensional scoring, an LLM-as-Judge mode, and research-backed analysis of run-to-run performance variance.
433NeoLabHQ (AI & ML, Developer Tools)
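The scoring-and-variance combination described above can be sketched as follows; the dimension names, weights, and 0-10 scale are illustrative assumptions, not the framework's actual rubric:

```python
import statistics

# Hypothetical rubric weights; dimensions are illustrative, not the
# framework's real schema.
WEIGHTS = {"correctness": 0.5, "efficiency": 0.3, "style": 0.2}

def overall_score(dimension_scores: dict) -> float:
    """Weighted sum of per-dimension scores (each on a 0-10 scale)."""
    return sum(WEIGHTS[d] * s for d, s in dimension_scores.items())

def variance_report(runs: list) -> dict:
    """Per-dimension mean and sample standard deviation across repeated
    runs, capturing the run-to-run variance the framework analyzes."""
    report = {}
    for dim in WEIGHTS:
        scores = [run[dim] for run in runs]
        report[dim] = {
            "mean": statistics.mean(scores),
            "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        }
    return report

# Three repeated runs of the same agent on the same task.
runs = [
    {"correctness": 8, "efficiency": 6, "style": 7},
    {"correctness": 9, "efficiency": 5, "style": 7},
    {"correctness": 7, "efficiency": 6, "style": 8},
]
print(round(overall_score(runs[0]), 2))              # 7.2
print(variance_report(runs)["correctness"]["mean"])  # 8
```

Reporting per-dimension variance rather than a single averaged number makes it visible when an agent is reliable on correctness but unstable on, say, efficiency.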
Multi-Perspective Critique
Multi-perspective review system using the Multi-Agent Debate and LLM-as-Judge patterns, with three specialized judges, debate rounds, and consensus building.
433NeoLabHQ (AI & ML, Developer Tools)
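A minimal sketch of the consensus step across several judges; the judge names, verdict schema, and tie-breaking rule are assumptions, and real judges would each be a separate LLM call rather than a hard-coded verdict:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Verdict:
    judge: str      # which specialized perspective produced this
    verdict: str    # "approve" or "revise"
    rationale: str  # shared with other judges in later debate rounds

def consensus(verdicts: list) -> str:
    """Majority vote across judges; ties fall back to 'revise',
    the conservative outcome."""
    tally = Counter(v.verdict for v in verdicts)
    top, count = tally.most_common(1)[0]
    return top if count > len(verdicts) // 2 else "revise"

# One debate round from three hypothetical specialized judges.
round1 = [
    Verdict("security", "revise", "input is not sanitized"),
    Verdict("performance", "approve", "no hot-path regressions"),
    Verdict("readability", "approve", "clear naming and structure"),
]
print(consensus(round1))  # approve
```

In a full debate loop, the rationales would be fed back to the judges for another round before the final vote, so dissenting evidence (here, the security concern) can sway the majority.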
Execute and Judge Loop
Single-task execution with LLM-as-Judge verification in an iterative loop, supporting automatic retry and strict separation of the orchestrator role.
433NeoLabHQ (AI & ML, Developer Tools)
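The execute-then-judge loop with auto-retry can be sketched as below. The executor and judge are stubs standing in for separate LLM agents, and the retry limit is an assumed parameter; the orchestrator function only routes results and feedback, never executing or judging itself:

```python
MAX_RETRIES = 3  # assumed cap on automatic retries

def run_task(execute, judge, task):
    """Orchestrator: alternate executor and judge until the judge
    passes the result or the retry budget is exhausted."""
    feedback = None
    for attempt in range(1, MAX_RETRIES + 1):
        result = execute(task, feedback)        # executor agent
        passed, feedback = judge(task, result)  # judge agent verifies
        if passed:
            return result, attempt
    raise RuntimeError(f"task failed after {MAX_RETRIES} attempts: {feedback}")

# Stub agents for illustration: the executor only succeeds once it
# has received the judge's feedback.
def fake_execute(task, feedback):
    return f"{task} (fixed)" if feedback else task

def fake_judge(task, result):
    ok = result.endswith("(fixed)")
    return ok, None if ok else "missing fix"

result, attempts = run_task(fake_execute, fake_judge, "add null check")
print(result, attempts)  # add null check (fixed) 2
```

Keeping the orchestrator free of execution and judging is what the "strict role separation" refers to: each role can then be swapped, rate-limited, or context-isolated independently.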
Spec-Driven Implement
Implementation system driven by task specs, with LLM-as-Judge auto-verification, iterative repair, breakpoint resume, and human-in-the-loop checkpoints.
433NeoLabHQ (Developer Tools, Productivity)
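Breakpoint resume can be sketched as checkpointing completed task IDs to disk so an interrupted run skips already-verified work. The file name and JSON schema here are illustrative assumptions, not the tool's actual format:

```python
import json
import os
import tempfile

def load_checkpoint(path: str) -> set:
    """Return the set of task IDs recorded as complete, if any."""
    if os.path.exists(path):
        with open(path) as f:
            return set(json.load(f))
    return set()

def run_spec(tasks: list, path: str, implement) -> None:
    """Run each spec task once, checkpointing after each completion
    so a rerun resumes where the previous run stopped."""
    done = load_checkpoint(path)
    for task_id in tasks:
        if task_id in done:
            continue  # resume: skip work already completed
        implement(task_id)
        done.add(task_id)
        with open(path, "w") as f:  # checkpoint after each task
            json.dump(sorted(done), f)

log = []
ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
run_spec(["t1", "t2"], ckpt, log.append)
run_spec(["t1", "t2", "t3"], ckpt, log.append)  # resumes: only t3 runs
print(log)  # ['t1', 't2', 't3']
```

A human-in-the-loop checkpoint fits naturally at the same point: pause before marking a task done and wait for approval instead of writing the checkpoint automatically.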
LLM-as-Judge Evaluator
Standalone LLM-as-Judge evaluation tool with context isolation, Chain-of-Thought scoring, a multi-dimensional weighted rubric, and evidence-backed assessments.
433NeoLabHQ (AI & ML, Developer Tools)
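One plausible shape for an evidence-backed, Chain-of-Thought judge response and its weighted scoring; the JSON schema, rubric weights, and sample output are assumptions, and a real judge would be a context-isolated LLM call that sees only the artifact under review:

```python
import json

# Hypothetical rubric; weights must sum to 1.0.
RUBRIC = {"accuracy": 0.6, "completeness": 0.4}

# Example judge output: chain-of-thought reasoning first, then a score
# plus a concrete piece of evidence for every rubric dimension.
judge_output = json.dumps({
    "reasoning": "The answer cites the spec but omits the error path.",
    "dimensions": {
        "accuracy": {"score": 9, "evidence": "matches spec section 2.1"},
        "completeness": {"score": 6, "evidence": "error handling not covered"},
    },
})

def score(raw: str) -> float:
    """Parse a judge response and return the weighted rubric score,
    rejecting any dimension that lacks supporting evidence."""
    dims = json.loads(raw)["dimensions"]
    for name, d in dims.items():
        if not d.get("evidence"):
            raise ValueError(f"dimension {name!r} lacks evidence")
    return sum(RUBRIC[n] * d["score"] for n, d in dims.items())

print(round(score(judge_output), 2))  # 7.8
```

Requiring evidence per dimension is what distinguishes an evidence-backed assessment from a bare number: a score without a quotable justification is rejected rather than averaged in.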