
LLM Judge Standard

Core Rule

No agent verifies its own work. The judge must be a different model from the one that produced the work.

Roster

  • gemini-2.5-pro: deep/security/holes; default go-to (use gemini-2.5-pro in CLI, NOT gemini-3.1-pro)
  • gemini-2.5-flash: fast holistic QA
  • Ollama local (deepseek-r1): adversarial red-team
  • Delegate subagent (fresh context): spec compliance
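
A minimal sketch of how the roster and the Core Rule could be expressed as a lookup. The task labels, the `pick_judge` name, and the identifier strings for the Ollama and subagent entries are illustrative assumptions, not part of this standard.

```python
# Hypothetical mapping from review task to judge, following the roster above.
# The keys and the two non-Gemini identifier strings are illustrative labels.
ROSTER = {
    "deep": "gemini-2.5-pro",
    "security": "gemini-2.5-pro",
    "holes": "gemini-2.5-pro",
    "holistic-qa": "gemini-2.5-flash",
    "red-team": "ollama/deepseek-r1",
    "spec-compliance": "delegate-subagent",
}

# The roster names gemini-2.5-pro as the default go-to.
DEFAULT_JUDGE = "gemini-2.5-pro"


def pick_judge(task: str, author_model: str) -> str:
    """Return the judge for a task, refusing self-verification."""
    judge = ROSTER.get(task, DEFAULT_JUDGE)
    if judge == author_model:
        # Core Rule: no agent verifies its own work.
        raise ValueError(f"{author_model} cannot judge its own work")
    return judge
```

In this sketch a clash between author and judge raises an error instead of silently picking another model, so the caller has to choose the replacement judge explicitly.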

Escalation

Always escalate security → gemini-2.5-pro (deepest audit model available in CLI)
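
The escalation rule could be applied as a final override on whatever judge was first proposed. This is a sketch only: `resolve_judge` and the `touches_security` flag are hypothetical names, and how a change gets flagged as security-relevant is left open here.

```python
# Model named by the escalation rule above.
SECURITY_JUDGE = "gemini-2.5-pro"


def resolve_judge(proposed_judge: str, touches_security: bool) -> str:
    """Override the proposed judge whenever the review touches security."""
    if touches_security:
        return SECURITY_JUDGE
    return proposed_judge
```

For example, `resolve_judge("gemini-2.5-flash", touches_security=True)` yields `gemini-2.5-pro`, while a non-security review keeps the proposed judge.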

Integration Points

Patched into: requesting-code-review, gemini-cli-qa, subagent-driven-development

  • [[Hermes Infrastructure]]
  • [[Hermes Vision]]