Agentic Engineering Patterns¶
Practical patterns for getting reliable output from coding agents — from Simon Willison's reference-codebase approach to the self-validation loop.
Pattern 1: Reference Codebase > Detailed Spec¶
Problem: Describing what you want in natural language is lossy. The agent guesses the gaps.
Solution: Give it an existing codebase that does something similar and say "imitate how X works."
Example (Simon Willison):
"Update blog-to-newsletter.html to include beats that have descriptions — similar to how the Atom everything feed on the blog works"
The agent read the existing Atom feed code, inferred the filtering logic (beats with non-empty note column, non-draft), and produced exactly the right SQL UNION clause.
AE Application: "Build this staging model like the stg_orders model — same grain, same key conventions, same test coverage" instead of describing grain, keys, and tests in prose.
Pattern 2: Self-Validation Loop¶
Problem: An agent's confidence is not evidence that its code is correct. You need a test harness.
Solution: Build verification into the prompt. Give the agent:
1. A way to run the output (python -m http.server, dbt compile)
2. A way to compare against expected output (uvx rodney browser automation, diff against prod)
3. Authority to iterate if validation fails
Example (Simon Willison):
"Run it with python -m http.server and use uvx rodney --help to test it — compare what shows up in the newsletter with what's on the homepage of simonwillison.net"
AE Application: "Run dbt compile and dbt test on the generated model. If any tests fail, read the error and fix the model."
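The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `validate_with_retries` and `first_failure` are hypothetical names, and `fix` stands in for whatever mechanism hands the error text back to the agent.

```python
import subprocess
from typing import Callable, Optional, Sequence


def first_failure(checks: Sequence[Sequence[str]]) -> Optional[str]:
    """Run each check in order; return the output of the first one that fails."""
    for cmd in checks:
        proc = subprocess.run(list(cmd), capture_output=True, text=True)
        if proc.returncode != 0:
            return proc.stdout + proc.stderr
    return None


def validate_with_retries(
    checks: Sequence[Sequence[str]],
    fix: Callable[[str], None],
    max_attempts: int = 3,
) -> bool:
    """Validate, and on failure pass the error text to `fix` before retrying."""
    for attempt in range(1, max_attempts + 1):
        failure = first_failure(checks)
        if failure is None:
            return True  # every check passed
        if attempt < max_attempts:
            fix(failure)  # hypothetical hook: the agent reads the error and edits the model
    return False
```

For the dbt case this would be called as `validate_with_retries([["dbt", "compile"], ["dbt", "test"]], fix=ask_agent_to_fix)`, where `ask_agent_to_fix` is an assumed callback. Capping the attempts keeps a confused agent from looping forever.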
Pattern 3: The /tmp Clone¶
Problem: If you give an agent access to your codebase AND a reference codebase, it may accidentally mix code from both.
Solution: Clone the reference repo to /tmp. The agent reads it for patterns but can't contaminate your project.
Example: "Clone simonw/simonwillisonblog from GitHub to /tmp for reference"
AE Application: Clone a reference dbt project (or your own project as a snapshot) to /tmp so the agent can read model patterns, macros, and conventions without modifying production code.
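One way to set this up from a script is sketched below. The helper names are hypothetical, and the read-only step is an extra assumption on top of the pattern: the source only asks for the clone to live outside the project, while clearing write bits makes accidental edits to the reference fail loudly.

```python
import stat
import subprocess
import tempfile
from pathlib import Path
from typing import List


def clone_command(repo_url: str, dest: Path) -> List[str]:
    """Build a shallow clone command; history is not needed to read patterns."""
    return ["git", "clone", "--depth", "1", repo_url, str(dest)]


def make_read_only(root: Path) -> None:
    """Clear the write bits on every regular file under `root`."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    for path in root.rglob("*"):
        if path.is_file() and not path.is_symlink():
            path.chmod(path.stat().st_mode & ~write_bits)


def clone_reference(repo_url: str) -> Path:
    """Clone into a fresh temp directory (under /tmp on most Unix systems)."""
    dest = Path(tempfile.mkdtemp(prefix="reference-")) / "repo"
    subprocess.run(clone_command(repo_url, dest), check=True)
    make_read_only(dest)
    return dest
```

The returned path is what you hand to the agent as the reference location; your working project stays in its own directory.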
Pattern 4: Specialised Swarms > Generalist Agent¶
Problem: One agent trying to do everything produces mediocre everything.
Solution: Task decomposition with specialised subagents:
- One agent per concern (modeling, documentation, testing, lineage)
- Narrower scope → deeper reliability
- Orchestrator coordinates, specialists execute
Source: Meta's 50-agent pipeline mapper (see [[AI-Agents-in-Data-Engineering]])
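The orchestrator/specialist split can be sketched as a small dispatch table. This is an illustrative shape only, not Meta's pipeline: `Orchestrator` is a hypothetical class, and each specialist is a plain function standing in for a subagent call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Each specialist takes a task description and returns its result.
# In a real system this would be a call to a narrowly scoped subagent.
Specialist = Callable[[str], str]


@dataclass
class Orchestrator:
    specialists: Dict[str, Specialist]

    def run(self, task: str, concerns: List[str]) -> Dict[str, str]:
        """Send the task to one specialist per concern and collect the results."""
        results: Dict[str, str] = {}
        for concern in concerns:
            if concern not in self.specialists:
                # Fail loudly instead of falling back to a generalist.
                raise KeyError(f"no specialist registered for {concern!r}")
            results[concern] = self.specialists[concern](task)
        return results
```

Refusing unknown concerns, rather than routing them to a catch-all agent, is the design choice that keeps each specialist's scope narrow.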
Pattern 5: Short, Context-Rich Prompts¶
Problem: Long specifications are ignored or partially followed. Short prompts lack direction.
Solution: Three sentences, each carrying dense context:
1. What exists (reference codebase, existing patterns)
2. What to build (target file, pattern to imitate)
3. How to verify (run command, comparison target)
Simon Willison got a complex feature implemented with 3 sentences because each carried the weight of a full codebase's worth of context.
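If you assemble prompts programmatically, the three-part structure can be enforced with a small template. The class below is a hypothetical sketch; the field names simply mirror the three sentences listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreeSentencePrompt:
    what_exists: str    # reference codebase or existing pattern
    what_to_build: str  # target file and the pattern to imitate
    how_to_verify: str  # run command and comparison target

    def render(self) -> str:
        """Join the three parts into one prompt, refusing to drop any of them."""
        parts = (self.what_exists, self.what_to_build, self.how_to_verify)
        if any(not part.strip() for part in parts):
            raise ValueError("all three sentences are required")
        return " ".join(part.strip().rstrip(".") + "." for part in parts)
```

For example, `ThreeSentencePrompt("Clone our dbt project to /tmp as reference", "Build stg_returns like stg_orders", "Run dbt compile and dbt test").render()` yields a single three-sentence prompt, and an empty verification step raises instead of silently producing an unverifiable request.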
Pattern 6: Context as Code¶
Problem: Agents fail when critical context lives in dashboards, docs, Slack, Notion, warehouse schemas, and human heads rather than in a reviewable runtime surface.
Solution: Maintain a context layer as versioned files:
- hard semantics: schemas, joins, metrics, grains, executable YAML/SQL
- soft semantics: business docs, rules, exceptions, methodology notes
- validation: compile/query preview before execution
- governance: git review, owners, freshness assumptions, correction capture
Example: Kaelio's ktx pattern: status → semantic-layer search → validate → wiki search → serve to agent via MCP/CLI.
AE Application: Before asking an agent to answer a revenue question, route it through approved metric definitions and business methodology notes. The agent should discover “ARR methodology” and compile the governed SQL, not guess at revenue columns.
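The lookup step can be sketched as follows. Everything here is illustrative: the `Metric` shape, the `CONTEXT_LAYER` contents, the SQL, and the owner name are placeholders, and this is not how ktx itself is implemented. In practice the definitions would be loaded from versioned YAML/SQL files under git review rather than inlined.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Metric:
    name: str
    sql: str          # hard semantics: governed, reviewable SQL
    methodology: str  # soft semantics: business notes and exceptions
    owner: str        # governance: who approves changes


# Placeholder content; a real context layer would be read from files in git.
CONTEXT_LAYER: Dict[str, Metric] = {
    "arr": Metric(
        name="arr",
        sql="select sum(mrr) * 12 as arr from fct_subscriptions where is_active",
        methodology="ARR methodology: placeholder note describing what counts as recurring.",
        owner="finance-analytics",
    ),
}


def resolve_metric(name: str) -> Metric:
    """Return the governed definition, or refuse rather than let the agent guess."""
    key = name.strip().lower()
    if key not in CONTEXT_LAYER:
        raise LookupError(f"{name!r} is not a governed metric; ask for a definition")
    return CONTEXT_LAYER[key]
```

The important behaviour is the failure path: an unknown metric produces an explicit error the agent can surface, instead of a guessed revenue column.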
Anti-Patterns¶
| Anti-Pattern | Why It Fails | Fix |
|---|---|---|
| Writing long natural-language specs | Lossy, agent guesses gaps | Reference codebase |
| No validation step | Agent can't tell if output is correct | Self-testing loop |
| Agent writes directly to production | Can't undo bad output | /tmp clone, then review |
| One generalist agent | Mediocre at everything | Specialised subagents |
| "Just trust the output" | Agent-generated SQL can be subtly wrong | dbt compile + dbt test before merge |
| Raw warehouse connection with no context layer | Agent guesses field names, joins, grain, and business meaning | Context-as-code with hard + soft semantics |
dbt-Specific Pattern Combinations¶
Combining these patterns for analytics engineering workflows:
1. "Clone our dbt project to /tmp as reference" ← /tmp clone
2. "Build stg_returns like stg_orders — same grain, same keys, same testing conventions" ← Reference codebase + imitation
3. "Run dbt compile and dbt test. Fix any failures and re-test until all tests pass" ← Self-validation
This three-sentence prompt gives the agent everything it needs: the existing codebase for patterns, the specific target, and a validation loop.
Related¶
- [[AI-Agents-in-Data-Engineering]] — enterprise patterns and governance
- [[Agentic-Analytics-Engineering]] — the career transition
- Source: Simon Willison — Agentic Engineering Patterns
- Source: SSP — Beyond the Semantic Layer: Building a Context Layer for the Agentic Era
Pattern 7: Skill Guardrails as Reliability Contract¶
Source: 4 Lines You Should Include in Your Claude Skill (2026-06-15).
Problem: Skills and prompts often sound authoritative even when context is missing, thresholds are undefined, or the analysis is outside the source material.
Solution: Treat each skill as a reliability contract. For analytics/reporting skills, require the agent to (1) state missing context, (2) define significant thresholds before interpreting results, (3) use confidence qualifiers, and (4) name analysis limits instead of papering over them.
Hermes application: Review existing Hermes skills for explicit missing-context, threshold, confidence, and limitation language, especially skills that produce analysis, recommendations, or user-facing reports.
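A first-pass review of existing skills could be automated with a simple lint. The keyword patterns below are assumptions chosen for illustration; they only detect whether a skill mentions each guardrail, not whether the wording is adequate, so a human review is still needed.

```python
import re
from typing import Dict, List

# Hypothetical patterns, one per guardrail; tune these to your own skill wording.
GUARDRAILS: Dict[str, str] = {
    "missing_context": r"missing (context|information|data)",
    "thresholds": r"threshold",
    "confidence": r"confidence",
    "limits": r"\blimit(s|ations?)?\b",
}


def missing_guardrails(skill_text: str) -> List[str]:
    """Return the names of guardrails the skill text never mentions."""
    return [
        name
        for name, pattern in GUARDRAILS.items()
        if not re.search(pattern, skill_text, flags=re.IGNORECASE)
    ]
```

Running this over every skill file gives a short list of skills to review by hand, starting with those that produce analysis or user-facing reports.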
Pattern 8: Question Parser before Agent Execution¶
Source: "What the Question Parser Extracts from a User String" (2026-06-17).
Problem: Agents execute too early when the user request has ambiguous entities, scope, output shape, or decomposition boundaries.
Solution: Insert a deterministic question-parser step before expensive agent work. Extract:
- keywords/entities
- scope and exclusions
- expected answer shape
- decomposition into sub-questions
- whether one clarification question is required
Hermes application: This maps directly to the LocalStack context goal: convert vague product/work questions into scoped briefs using LocalStack glossary, architecture maps, command/test cheat sheets, sharp edges, and customer-language context before asking Claude/Codex to act.
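A deterministic parser for the fields listed above might look like the sketch below. The stopword list, shape cues, and the "fewer than two keywords" clarification rule are all assumptions made for illustration; the source article's actual extraction logic is not shown here.

```python
import re
from dataclasses import dataclass
from typing import List

STOPWORDS = {"the", "a", "an", "of", "for", "in", "on", "to", "and", "also", "is",
             "are", "it", "what", "how", "many", "why", "which", "them", "our"}
EXCLUSION_WORDS = ("excluding", "except", "without")
# Hypothetical cue -> answer shape mapping, checked in order.
SHAPE_CUES = [("how many", "number"), ("compare", "comparison"),
              ("list", "list"), ("why", "explanation")]


@dataclass
class ParsedQuestion:
    keywords: List[str]
    exclusions: List[str]
    answer_shape: str
    sub_questions: List[str]
    needs_clarification: bool


def parse_question(text: str) -> ParsedQuestion:
    lowered = text.lower()
    # Scope: the word following "excluding", "except", or "without".
    exclusions = re.findall(r"\b(?:excluding|except|without)\s+([\w-]+)", lowered)
    words = re.findall(r"[a-z][\w-]*", lowered)
    skip = STOPWORDS | set(exclusions) | set(EXCLUSION_WORDS)
    keywords = [w for w in dict.fromkeys(words) if w not in skip]  # de-duplicated, ordered
    answer_shape = next(
        (shape for cue, shape in SHAPE_CUES
         if re.search(rf"\b{re.escape(cue)}\b", lowered)),
        "free text",
    )
    # Decomposition: split on question marks, semicolons, and "and also".
    pieces = re.split(r"\?|;|\band also\b", text, flags=re.IGNORECASE)
    sub_questions = [p.strip() for p in pieces if p.strip()]
    return ParsedQuestion(keywords, exclusions, answer_shape, sub_questions,
                          needs_clarification=len(keywords) < 2)
```

Because the parser is plain string handling, it is cheap to run on every request and its output can be logged and reviewed before any agent call is made.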
Pattern 9: Security Boundaries for Imperfectly Aligned Agents¶
Source: Google DeepMind — How we're securing internal systems against increasingly capable and imperfectly aligned AI (2026-06-22).
As agents become more capable, internal systems need controls that assume imperfect alignment and imperfect instruction following. The durable pattern is defense in depth around the agent, not trust in the prompt:
- least-privilege tool access
- environment and secret isolation
- audit trails for tool calls and side effects
- explicit approval gates for sensitive operations
- evals/red-team probes for misuse paths
- separation between maker and verifier
Hermes implication: treat ANDON, QA records, tool allowlists, and independent review as part of the safety architecture, not bureaucracy. Keep this as internal agent-safety context; do not export private Hermes-specific details into external packs.
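Three of the controls above (least-privilege tool access, approval gates, and audit trails) can be combined in one wrapper around tool calls. This is a generic sketch with hypothetical names; it does not describe DeepMind's or Hermes's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class ToolGate:
    allowlist: Set[str]                    # least privilege: everything else is denied
    sensitive: Set[str]                    # tools that need explicit approval
    approve: Callable[[str, dict], bool]   # hypothetical human or policy approval hook
    audit_log: List[dict] = field(default_factory=list)

    def call(self, tool: str, args: dict,
             registry: Dict[str, Callable[..., object]]) -> object:
        """Check policy, record the decision, then run the tool if allowed."""
        if tool not in self.allowlist:
            decision = "denied"
        elif tool in self.sensitive and not self.approve(tool, args):
            decision = "rejected"
        else:
            decision = "allowed"
        # Every attempt is logged, including the blocked ones.
        self.audit_log.append({"tool": tool, "args": args, "decision": decision})
        if decision != "allowed":
            raise PermissionError(f"{tool}: {decision}")
        return registry[tool](**args)
```

The gate sits outside the agent, so it holds even when the agent ignores or misreads its instructions, which is the point of defense in depth.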
2026-06-13 — AI OSS tool repo goes archived over night after raising $7.3M Seed¶
- Source: unknown
- Domains: agentic_engineering, analytics_engineering
- Why it was promoted: High-signal durable technical story suitable for synthesis into the context library.
Summary: AI tool project archived immediately post-funding.
Reusable context:
- What happened: The TensorZero AI OSS tool repository was unexpectedly archived on June 12, 2026, shortly after raising $7.3M in seed funding. This sudden move is speculated to be a strategic pivot to a closed-source model or an acquisition.
- Why it matters: This incident underscores the inherent risks for analytics engineers and data platforms relying on venture-backed open-source AI/ML tooling. It highlights the potential for rapid discontinuation or privatization of critical LLMOps infrastructure, which can disrupt development and impact long-term operational stability.
- What to do: Prioritize evaluation of AI/ML and LLMOps tools based on their long-term sustainability, community governance, and clear support commitments, rather than solely on recent funding or initial hype.
2026-06-13 — https://www.ssp.sh/blog/how-to-use-ai-with-de-wes-mckinney/¶
- Source: slack-intake
- Domains: agentic_engineering, analytics_engineering
- Why it was promoted: High-signal durable technical story suitable for synthesis into the context library.
- Raw source: /home/adam/.hermes/context-inbox/raw/intake/2026-06-13/https-www-ssp-sh-blog-how-to-use-ai-with-de-wes-mckinney-18658954d088.md
Summary: Second interview in the ‘How to use AI with DE’ series, in which Wes McKinney (creator of Pandas, co-creator of Apache Arrow, creator of Ibis) describes how he uses AI in his data work.
Reusable context: This series interviews real practitioners to extract the patterns behind how they actually use AI in their data work today. This is the second interview in ‘How to use AI with DE’, and this time we have none other than Wes McKinney.
Creator of Pandas, probably the most widely used data analysis library for Python, Wes has shaped the era of data and is co-creator of Apache Arrow. He also created Ibis to address these issues with a different approach to Python dataframe libraries, by decoupling the dataframe API from the backend implementation.
The article is structured in four parts: (1) how to trust the outcome, (2) knowing what not to build, factoring in cost-per-token among others, (3) accountability of agents and the code they generate, and (4) philosophizing about the future of agentic engineering.
Besides creating the most popular dataframe libraries used by most data people, Wes McKinney now focuses full time on agentic engineering with his newly founded company Kenn Software, which focuses on the promise of building a new stack of development and knowledge systems for the agentic era. He’s also doing AI and Python at Posit, where they work on a data science IDE. He’s a part-time investor in various startups.
Wes has been running Claude Code, Codex, and Gemini CLI for months. Thousands of sessions, hundreds of thousands of messages. He has released multiple tools that help the agentic work (more on this later), and he is at the forefront of what’s going on with his recent blog posts about “Why he uses programming languages built for agents, not humans” and Mythical Agent Month, with his recent insights into how to work with agents. Find all his takes at Wes McKinney.com.
I had the pleasure of asking Wes more about these topics, and we’ll go into more details, plus many other things. Let’s get started.
We started the interview with a critical question that stands above all others in the current AI landscape, and I asked him: “Can we trust the outcome?”. What if we need
2026-06-13 — Larger Context Windows Don’t Fix RAG — So I Built a System That Does¶
- Source: unknown
- Domains: agentic_engineering, analytics_engineering, hermes_system
- Why it was promoted: High-signal durable technical story suitable for synthesis into the context library.
Summary: Critique of context windows vs improved RAG systems with proposed alternatives.
Reusable context:
- What happened: The article highlights a key failure in current RAG systems: larger context windows do not resolve accuracy issues, particularly for analytical queries. It proposes a "QueryRouter" system that intelligently routes queries based on intent ("Computation" or "Retrieval") to address this "Error Observability Collapse."
- Why it matters: This is critical for analytics engineers and those working with AI/ML tooling, as it underscores that LLMs are not reliable computational engines for aggregations. Relying solely on RAG for analytical questions leads to polished but incorrect results.
- What to do: Evaluate implementing a query classification layer (like the proposed QueryRouter) in your AI/analytics stack to direct computational queries to deterministic engines (e.g., dbt, Snowflake) and factual retrieval queries to RAG.
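A query classification layer of this kind can start as a keyword heuristic. The sketch below is a stand-in, not the article's QueryRouter: the cue list is an assumption, and `compute` and `retrieve` are hypothetical handlers for the deterministic engine and the RAG pipeline.

```python
import re
from typing import Callable

# Assumed cues for questions that need arithmetic or aggregation.
COMPUTATION_CUES = re.compile(
    r"\b(sum|total|average|avg|count|how many|median|percent|percentage|growth|top \d+)\b",
    re.IGNORECASE,
)


def classify_intent(query: str) -> str:
    """Label a query as 'computation' or 'retrieval' from surface cues."""
    return "computation" if COMPUTATION_CUES.search(query) else "retrieval"


def route(query: str,
          compute: Callable[[str], str],
          retrieve: Callable[[str], str]) -> str:
    """Send aggregations to a deterministic engine and lookups to RAG."""
    handler = compute if classify_intent(query) == "computation" else retrieve
    return handler(query)
```

A keyword router will misclassify some queries, so logging the chosen route alongside the answer makes wrong turns visible instead of hiding them behind a polished response.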
2026-06-13 — Megathread Summary: I Asked Multiple Reddit Communities How to Build a Living Memory /Context Engine for Business. Here's what everyone had to say.¶
- Source: unknown
- Domains: agentic_engineering, analytics_engineering, hermes_system
- Why it was promoted: High-signal durable technical story suitable for synthesis into the context library.
Reusable context:
- What happened: A Reddit megathread summarized community discussions on building a "living memory" or context engine for businesses, focusing on design philosophies like "Query-First Design," architectural choices such as append-only event logs and hybrid search, and memory management strategies including significance scoring.
- Why it matters: This research directly informs the development of advanced AI tooling and agent frameworks by providing practical insights into managing and synthesizing enterprise knowledge, which is critical for analytics engineers integrating AI with data platforms and orchestration tools.
- What to do: Evaluate hybrid search (vector + relational/graph) solutions and append-only event log architectures for future knowledge management systems within your data stack.
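To make the append-only idea concrete, here is a minimal sketch of an event log stored as JSON Lines with a significance filter on read. The class and field names are hypothetical, the significance score is assumed to be assigned upstream, and hybrid search is out of scope for this example.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class MemoryEvent:
    timestamp: float
    text: str
    significance: float  # assumed 0.0 to 1.0, scored before the event is logged


class EventLog:
    """Append-only log: events are added and read, never updated or deleted."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def append(self, event: MemoryEvent) -> None:
        # Mode "a" only ever adds to the end of the file.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(asdict(event)) + "\n")

    def read(self, min_significance: float = 0.0) -> List[MemoryEvent]:
        if not self.path.exists():
            return []
        lines = self.path.read_text(encoding="utf-8").splitlines()
        events = [MemoryEvent(**json.loads(line)) for line in lines if line.strip()]
        return [e for e in events if e.significance >= min_significance]
```

Keeping low-significance events in the log and filtering at read time means the scoring rule can change later without losing history.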
2026-06-13 — Parse PDFs for RAG Locally with Docling: Rich Tables, No Cloud Upload¶
- Source: unknown
- Domains: agentic_engineering, hermes_system
- Why it was promoted: High-signal durable story with actionable implications for the context library.
Summary: Local-first RAG parser (Docling) for complex table data.
Reusable context:
- What happened: IBM Research released Docling, an open-source tool for local PDF parsing, offering high-fidelity extraction of text, tables, and images, particularly beneficial for Retrieval Augmented Generation (RAG) pipelines without relying on cloud services.
- Why it matters: Docling addresses data privacy and compliance concerns for analytics engineers by enabling local processing of sensitive documents. It enhances developer productivity through a unified API that consistently handles various parsing engines.
- What to do: Evaluate Docling for your RAG pipelines, especially for scenarios requiring on-premise PDF processing and complex table extraction, to maintain data sovereignty and improve parsing quality.
2026-06-14 — Anthropic Offers Mythos Upgrade for Cyber Partners and a ‘Safe’ Version for the Rest of You - WIRED¶
- Source: WIRED
- Domains: agentic_engineering, analytics_engineering
- Why it was promoted: High-signal durable story with actionable implications for the context library.
Summary: Anthropic releases a 'Mythos' model upgrade for partners alongside a safety-focused version for general users.
Reusable context:
- What happened: Anthropic released new "Mythos-class" AI models, offering an unrestricted "Mythos 5" to cyber security partners and a "Fable 5" with aggressive safety guardrails for general public and developer use. Fable 5 routes high-risk queries to a less capable model, though both show strong performance in coding and analytical tasks.
- Why it matters: This bifurcated release demonstrates a growing trend of specialized AI models and controlled access based on use-case, which impacts the capabilities available for AI/ML tooling and developer productivity within data platforms. The strong analytical and coding benchmarks of Fable 5 suggest immediate utility for analytics engineers.
- What to do: Evaluate Claude Fable 5 for its potential to automate or enhance complex analytical tasks and coding within existing data workflows.
2026-06-14 — Claude Fable Blocked - 11 Quiet Details on What’s Next¶
- Source: unknown
- Domains: agentic_engineering, analytics_engineering
- Why it was promoted: High-signal durable technical story suitable for synthesis into the context library.
Summary: Report on the blocking of 'Claude Fable' and details on future model development.
Reusable context:
- What happened: Anthropic's "Fable 5" model was blocked, reportedly due to its advanced capabilities raising "difficulty" and government influence, rather than purely technical flaws. This highlights growing regulatory scrutiny in LLM development.
- Why it matters: Increased regulatory intervention and safety concerns will directly impact the availability, capabilities, and ethical considerations of integrating LLMs like Claude into data platforms and AI/ML tooling. This influences adoption timelines and feature sets for analytics engineers.
- What to do: Evaluate new Claude releases with a focus on their compliance posture and capabilities for enterprise use, especially for sensitive data processing or automated decisioning.
2026-06-14 — I indexed 669 GB of my GoPro videos using my M1 Max computer and local ML models¶
- Source: unknown
- Domains: agentic_engineering, analytics_engineering
- Why it was promoted: High-signal durable story with actionable implications for the context library.
Summary: Developer indexes 669 GB of media using local ML on Apple Silicon.
Reusable context:
- What happened: A developer used local ML models on an M1 Max to index 669 GB of GoPro video, allowing for semantic search and automated clip extraction for video editing.
- Why it matters: This showcases the growing power of local AI/ML tooling and personal compute for processing large, unstructured datasets, offering a cost-effective alternative to cloud-based solutions for data platforms and enhancing developer productivity by automating complex tasks.
- What to do: Evaluate local ML frameworks (e.g., MLX, ONNX Runtime) for personal data processing workflows and integrate them into existing data pipelines where cost or privacy are concerns.
2026-06-14 — Linux 7.1¶
- Source: unknown
- Domains: agentic_engineering, analytics_engineering
- Why it was promoted: High-signal durable story with actionable implications for the context library.
Summary: Announcement of Linux 7.1 kernel.
Reusable context:
- What happened: Linux Kernel 7.1 was officially released on June 14, 2026, introducing a rewritten in-kernel NTFS driver for improved performance, initial hardware enablement for AMD Zen 6 and Intel Panther Lake processors, and a new policy to manage AI-generated bug reports.
- Why it matters: The improved NTFS driver can enhance data processing efficiency on Linux for analytics engineers working with Windows filesystems. Hardware support for upcoming CPUs directly benefits the performance of data platforms and AI/ML workloads. The AI bug report policy signifies AI's increasing role in developer workflows, impacting AI tooling and productivity.
- What to do: Evaluate the new in-kernel NTFS driver in Linux Kernel 7.1 for potential performance improvements in data pipeline operations involving Windows filesystems.
2026-06-14 — [NEW FAMILY OF MODELS] Supra1.5 family just released!¶
- Source: Reddit r/LocalLLaMA
- Domains: agentic_engineering, analytics_engineering
- Why it was promoted: High-signal durable technical story suitable for synthesis into the context library.
Reusable context:
- What happened: SupraLabs, under Project Chimera, released the Supra1.5 family of "Small Language Models" (SLMs). These models boast approximately 50 million parameters and are designed for high performance and efficient local inference, representing a significant advancement in ultra-compact AI.
- Why it matters: The Supra1.5 models are relevant to analytics engineering and data platforms due to their capability for "edge" data tasks like local SQL generation and privacy-preserving analytics. Their instant inference and improved tool-calling features can enhance developer productivity for agentic CLI assistants and offline AI applications, seamlessly integrating with existing AI/ML tooling.
- What to do: Evaluate Supra1.5 models for embedded AI applications within data pipelines or CLI tools, particularly for privacy-sensitive data processing and real-time developer assistance.
2026-06-14 — The Verifier Tax: Horizon-Dependent Safety–Success Tradeoffs in Tool-Using LLM Agents [R]¶
- Source: Reddit r/MachineLearning
- Domains: agentic_engineering, analytics_engineering
- Why it was promoted: High-signal durable story with actionable implications for the context library.
Reusable context:
- What happened: New research introduces the "Verifier Tax," demonstrating that runtime safety checks in tool-using LLM agents consistently reduce task success rates and rarely lead to genuinely "Safe Success," especially over interaction horizons of 15-30 turns. Agents struggle significantly with recovery after a blocked action due to safety interventions.
- Why it matters: This is critical for AI/ML tooling and data platforms, revealing a fundamental safety-performance tradeoff in LLM agents. The "Verifier Tax" implies that current safety mechanisms often break agent reasoning, impacting the reliability and efficiency of LLM-powered automation in data workflows.
- What to do: When designing or evaluating LLM agent systems for analytics, prioritize frameworks that enable grounded identity verification and robust post-intervention reasoning to effectively recover from safety blocks.
2026-06-15 — Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?¶
- Source: unknown
- Domains: agentic_engineering, analytics_engineering
- Why it was promoted: High-signal durable technical story suitable for synthesis into the context library.
Summary: Community discussion on the feasibility of replacing cloud-based coding LLMs with local models.
Reusable context:
- What happened: Hacker News discussion reveals that while some developers are experimenting, local models like Code Llama or Phind-70B are generally not yet replacing cloud models (Claude, GPT) for daily coding tasks due to significantly lower inference speeds (e.g., 0.7 tokens/sec) and inferior performance on complex optimizations.
- Why it matters: This directly impacts the immediate adoption strategy for integrating LLMs into analytics engineering workflows, suggesting that cloud-based solutions remain dominant for productivity-critical tasks. It also highlights the current limitations of local inference on commodity hardware for computationally intensive coding assistance.
- What to do: Continue to prioritize cloud-based LLM integrations for developer tooling, while monitoring local model performance advancements and hardware capabilities for future on-premise deployment considerations.