Task-spec-driven implementation system with LLM-as-Judge auto-verification, iterative repair, resume from breakpoints, and human-in-the-loop checkpoints