ai-agents

Production-grade AI agent patterns with MCP integration, agentic RAG, handoff orchestration, multi-layer guardrails, observability, token economics, ROI frameworks, and build-vs-not decision guidance (modern best practices)

vasilyu1983 66 15 Updated 4mo ago


Install

npx skillscat add vasilyu1983/ai-agents-public/ai-agents

Install via the SkillsCat registry.

SKILL.md

AI Agents Development — Production Skill Hub

Modern Best Practices (January 2026): deterministic control flow, bounded tools, auditable state, MCP-based tool integration, handoff-first orchestration, multi-layer guardrails, OpenTelemetry tracing, and human-in-the-loop controls (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).

This skill provides production-ready operational patterns for designing, building, evaluating, and deploying AI agents.
It centralizes procedures, checklists, decision rules, and templates used across RAG agents, tool-using agents, OS agents, and multi-agent systems.

No theory. No narrative. Only operational steps and templates.

When to Use This Skill

Codex should activate this skill whenever the user asks for:

Designing an agent (LLM-based, tool-based, OS-based, or multi-agent).
Scoping capability maturity and rollout risk for new agent behaviors.
Creating action loops, plans, workflows, or delegation logic.
Writing tool definitions, MCP tools, schemas, or validation logic.
Generating RAG pipelines, retrieval modules, or context injection.
Building memory systems (session, long-term, episodic, task).
Creating evaluation harnesses, observability plans, or safety gates.
Preparing CI/CD, rollout, deployment, or production operational specs.
Producing any template in /references/ or /assets/.
Implementing MCP servers or integrating Model Context Protocol.
Setting up agent handoffs and orchestration patterns.
Configuring multi-layer guardrails and safety controls.
Evaluating whether to build an agent (build vs not decision).
Calculating agent ROI, token costs, or cost/benefit analysis.
Assessing hallucination risk and mitigation strategies.
Deciding when to kill an agent project (kill triggers).
For prompt scaffolds, retrieval tuning, or security depth, see Scope Boundaries below.

Scope Boundaries (Use These Skills for Depth)

Prompt scaffolds & structured outputs → ai-prompt-engineering
RAG retrieval & chunking → ai-rag
Search tuning (BM25/HNSW/hybrid) → ai-rag
Security/guardrails → ai-mlops
Inference optimization → ai-llm-inference

Default Workflow (Production)

Pick an architecture with the Decision Tree (below); default to workflow/FSM/DAG for production.
Draft an agent spec with `assets/core/agent-template-standard.md` (or `assets/core/agent-template-quick.md`).
Specify tools and handoffs with JSON Schema using `assets/tools/tool-definition.md` and `references/api-contracts-for-agents.md`.
Add retrieval only when needed; start with `assets/rag/rag-basic.md` and scale via `assets/rag/rag-advanced.md` + `references/rag-patterns.md`.
Add eval + telemetry early via `references/evaluation-and-observability.md`.
Run the go/no-go gate with `assets/checklists/agent-safety-checklist.md`.
Plan deploy/rollback and safety controls via `references/deployment-ci-cd-and-safety.md`.
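
The tool-specification step above can be illustrated with a minimal sketch. It uses only the standard library and covers a small subset of JSON Schema (required fields, flat property types, no extra fields). The tool name `lookup_order`, its fields, and the `validate_args` helper are hypothetical and are not taken from `assets/tools/tool-definition.md`; a production setup would likely use a full JSON Schema validator instead.

```python
from typing import Any

# Hypothetical tool definition; the shape loosely follows JSON Schema.
LOOKUP_ORDER_TOOL: dict[str, Any] = {
    "name": "lookup_order",
    "description": "Fetch one order by ID (read-only).",
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string"},
            "include_items": {"type": "boolean"},
        },
        "required": ["order_id"],
        "additionalProperties": False,
    },
}

# Only the primitive types this sketch needs.
_PY_TYPES: dict[str, type] = {"string": str, "boolean": bool}


def validate_args(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return validation errors; an empty list means the call is acceptable."""
    errors: list[str] = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required field: {key}")
    for key, value in args.items():
        if key not in props:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field: {key}")
            continue
        type_name = props[key]["type"]
        if not isinstance(value, _PY_TYPES[type_name]):
            errors.append(f"wrong type for {key}: expected {type_name}")
    return errors
```

Rejecting the call before the tool runs keeps malformed model output away from side effects; the error list can be fed back to the model as a structured retry hint.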

Quick Reference

| Agent Type | Core Control Flow | Interfaces | MCP/A2A | When to Use |
|---|---|---|---|---|
| Workflow Agent (FSM/DAG) | Explicit state transitions | State store, tool allowlist | MCP | Deterministic, auditable flows |
| Tool-Using Agent | Route → call tool → observe | Tool schemas, retries/timeouts | MCP | External actions (APIs, DB, files) |
| RAG Agent | Retrieve → answer → cite | Retriever, citations, ACLs | MCP | Knowledge-grounded responses |
| Planner/Executor | Plan → execute steps with caps | Planner prompts, step budget | MCP (+A2A) | Multi-step problems with bounded autonomy |
| Multi-Agent (Orchestrated) | Delegate → merge → validate | Handoff contracts, eval gates | A2A | Specialization with explicit handoffs |
| OS Agent | Observe UI → act → verify | Sandbox, UI grounding | MCP | Desktop/browser control under strict guardrails |
| Code/SWE Agent | Branch → edit → test → PR | Repo access, CI gates | MCP | Coding tasks with review/merge controls |

Framework Selection (2026)

| Framework | Architecture | Best For | Ease |
|---|---|---|---|
| LangGraph | Graph-based, stateful | Enterprise, compliance, auditability | Medium |
| OpenAI Agents SDK | Tool-centric, lightweight | Fast prototyping, OpenAI ecosystem | Easy |
| Google ADK | Code-first, multi-language | Gemini/Vertex AI, polyglot teams | Medium |
| Pydantic AI | Type-safe, graph FSM | Production Python, type safety | Medium |
| CrewAI | Role-based crews | Team workflows, content generation | Easiest |
| AutoGen | Conversational | Code generation, research | Medium |
| AWS Bedrock Agents | Managed infrastructure | Enterprise AWS, knowledge bases | Easy |

See `references/modern-best-practices.md` for detailed framework comparison and selection guide.

Decision Tree: Choosing Agent Architecture

What does the agent need to do?
    ├─ Answer questions from knowledge base?
    │   ├─ Simple lookup? → RAG Agent (LangChain/LlamaIndex + vector DB)
    │   └─ Complex multi-step? → Agentic RAG (iterative retrieval + reasoning)
    │
    ├─ Perform external actions (APIs, tools, functions)?
    │   ├─ 1-3 tools, linear flow? → Tool-Using Agent (LangGraph + MCP)
    │   └─ Complex workflows, branching? → Planning Agent (ReAct/Plan-Execute)
    │
    ├─ Write/modify code autonomously?
    │   ├─ Single file edits? → Tool-Using Agent with code tools
    │   └─ Multi-file, issue resolution? → Code/SWE Agent (HyperAgent pattern)
    │
    ├─ Delegate tasks to specialists?
    │   ├─ Fixed workflow? → Multi-Agent Sequential (A → B → C)
    │   ├─ Manager-Worker? → Multi-Agent Hierarchical (Manager + Workers)
    │   └─ Dynamic routing? → Multi-Agent Group Chat (collaborative)
    │
    ├─ Control desktop/browser?
    │   └─ OS Agent (Anthropic Computer Use + MCP for system access)
    │
    └─ Hybrid (combination of above)?
        └─ Planning Agent that coordinates:
            - Tool-using for actions (MCP)
            - RAG for knowledge (MCP)
            - Multi-agent for delegation (A2A)
            - Code agents for implementation

Protocol Selection:

Use MCP for: Tool access, data retrieval, single-agent integration
Use A2A for: Agent-to-agent handoffs, multi-agent coordination, task delegation

Core Concepts (Vendor-Agnostic)

Control Flow Options

Reactive: direct tool routing per user request (fast, brittle if unbounded).
Workflow (FSM/DAG): explicit states and transitions (default for deterministic production).
Planner/Executor: plan with strict budgets, then execute step-by-step (use when branching is unavoidable).
Orchestrated multi-agent: separate roles with validated handoffs (use when specialization is required).
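
A minimal sketch of the Workflow (FSM) option, assuming a made-up five-step flow. The state names, the `TRANSITIONS` table, and the `Run` class are hypothetical; the point is only that every transition is checked against an explicit table and recorded, so a run can be audited and replayed.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(str, Enum):
    INTAKE = "intake"
    RETRIEVE = "retrieve"
    DRAFT = "draft"
    REVIEW = "review"
    DONE = "done"
    FAILED = "failed"


# Explicit transition table; anything not listed here is rejected.
TRANSITIONS: dict[State, set[State]] = {
    State.INTAKE: {State.RETRIEVE, State.FAILED},
    State.RETRIEVE: {State.DRAFT, State.FAILED},
    State.DRAFT: {State.REVIEW, State.FAILED},
    State.REVIEW: {State.DRAFT, State.DONE, State.FAILED},
    State.DONE: set(),
    State.FAILED: set(),
}


@dataclass
class Run:
    state: State = State.INTAKE
    history: list[str] = field(default_factory=list)  # serializable audit trail

    def advance(self, target: State) -> None:
        """Move to `target` only if the transition table allows it."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition: {self.state.value} -> {target.value}"
            )
        self.history.append(f"{self.state.value}->{target.value}")
        self.state = target
```

The model can propose the next state, but the table decides whether the move happens; that split is what keeps the flow deterministic.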

Memory Types (Tradeoffs)

Short-term (session): cheap, ephemeral; best for conversational continuity.
Episodic (task): scoped to a case/ticket; supports audit and replay.
Long-term (profile/knowledge): high risk; requires consent, retention limits, and provenance.

Failure Handling (Production Defaults)

Classify errors: retriable vs fatal vs needs-human.
Bound retries: max attempts, backoff, jitter; avoid retry storms.
Fallbacks: degraded mode, smaller model, cached answers, or safe refusal.
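
The three defaults above can be sketched together. The exception classes and `call_with_retries` are hypothetical names, and the delay values are placeholders rather than recommendations. Retriable errors get bounded attempts with exponential backoff and full jitter; fatal and needs-human errors propagate immediately; exhausted retries fall through to a caller-supplied fallback.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetriableError(Exception):
    """Transient failure (timeout, rate limit); safe to try again."""


class FatalError(Exception):
    """Permanent failure; retrying cannot help."""


class NeedsHumanError(Exception):
    """Ambiguous or risky situation; escalate instead of retrying."""


def call_with_retries(
    fn: Callable[[], T],
    *,
    fallback: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `fn` with bounded retries, then degrade to `fallback`."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RetriableError:
            if attempt == max_attempts:
                break
            # Exponential backoff capped at max_delay, with full jitter
            # so many clients do not retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    # FatalError and NeedsHumanError are never caught here; they propagate.
    return fallback()
```

Passing `sleep` as a parameter keeps the function testable without real delays.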

Do / Avoid

Do keep state explicit and serializable (replayable runs).
Do enforce tool allowlists, scopes, and idempotency for side effects.
Do log traces/metrics for model calls and tool calls (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-ai/).
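
One way to combine the allowlist and idempotency rules is a thin gateway in front of every tool call. This is a sketch under stated assumptions: `ToolGateway` is a hypothetical class, and the in-memory result cache stands in for the durable store a real deployment would need.

```python
from typing import Any, Callable


class ToolGateway:
    """Single entry point for tool calls: allowlist check plus idempotency."""

    def __init__(self, allowlist: dict[str, Callable[..., Any]]) -> None:
        self._tools = dict(allowlist)
        # In-memory only; assume a durable store in production.
        self._results: dict[str, Any] = {}

    def call(self, tool: str, idempotency_key: str, /, **kwargs: Any) -> Any:
        if tool not in self._tools:
            raise PermissionError(f"tool not allowlisted: {tool}")
        cache_key = f"{tool}:{idempotency_key}"
        if cache_key in self._results:
            # Same key seen before: return the stored result, no second side effect.
            return self._results[cache_key]
        result = self._tools[tool](**kwargs)
        self._results[cache_key] = result
        return result
```

With this shape, a retried step that reuses its idempotency key cannot send the same email or create the same ticket twice.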

Avoid

Avoid runaway autonomy (unbounded loops or step counts).
Avoid hidden state (implicit memory that cannot be audited).
Avoid untrusted tool outputs without validation/sanitization.

Navigation: Economics & Decision Framework

Should You Build an Agent?

Build vs Not Decision Framework - `references/build-vs-not-decision.md`
- 10-second test (volume, cost, error tolerance)
- Red flags and immediate disqualifiers
- Alternatives to agents (usually better)
- Full decision tree with stage gates
- Kill triggers during development and post-launch
- Pre-build validation checklist

Agent ROI & Token Economics

Agent Economics - `references/agent-economics.md`
- Token pricing by model (January 2026)
- Cost per task by agent type
- ROI calculation formula and tiers
- Hallucination cost framework and mitigation ROI
- Investment decision matrix
- Monthly tracking dashboard
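
As a rough illustration of the arithmetic behind cost per task and ROI, here is a small sketch. The prices in the usage note are placeholders, not the January 2026 figures from `references/agent-economics.md`, and the `roi` function uses the generic (benefit - cost) / cost form, which may differ from the formula in that reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_per_mtok: float   # USD per 1M input tokens (placeholder values)
    output_per_mtok: float  # USD per 1M output tokens (placeholder values)


def cost_per_task(
    price: ModelPrice, input_tokens: int, output_tokens: int, llm_calls: int = 1
) -> float:
    """Estimated USD cost of one task, assuming every call has the same size."""
    per_call = (
        input_tokens * price.input_per_mtok
        + output_tokens * price.output_per_mtok
    ) / 1_000_000
    return per_call * llm_calls


def roi(monthly_benefit: float, monthly_cost: float) -> float:
    """Generic ROI ratio: net gain divided by cost."""
    if monthly_cost <= 0:
        raise ValueError("monthly_cost must be positive")
    return (monthly_benefit - monthly_cost) / monthly_cost
```

For example, with a hypothetical price of 2.00 USD per million input tokens and 8.00 USD per million output tokens, a task that makes 5 calls of 4,000 input and 500 output tokens each comes to about 0.06 USD. Multi-step agents multiply the per-call cost, which is why the step count matters as much as the model price.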

Navigation: Core Concepts & Patterns

Governance & Maturity

Agent Maturity & Governance - `references/agent-maturity-governance.md`
- Capability maturity levels (L0-L4)
- Identity & policy enforcement
- Fleet control and registry management
- Deprecation rules and kill switches

Modern Best Practices

Modern Best Practices - `references/modern-best-practices.md`
- Model Context Protocol (MCP)
- Agent-to-Agent Protocol (A2A)
- Agentic RAG (Dynamic Retrieval)
- Multi-layer guardrails
- LangGraph over LangChain
- OpenTelemetry for agents

Context Management

Context Engineering - `references/context-engineering.md`
- Progressive disclosure
- Session management
- Memory provenance
- Retrieval timing
- Multimodal context

Core Operational Patterns

Operational Patterns - `references/operational-patterns.md`
- Agent loop pattern (PLAN → ACT → OBSERVE → UPDATE)
- OS agent action loop
- RAG pipeline pattern
- Tool specification
- Memory system pattern
- Multi-agent workflow
- Safety & guardrails
- Observability
- Evaluation patterns
- Deployment & CI/CD
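
The agent loop pattern (PLAN → ACT → OBSERVE → UPDATE) listed above can be sketched with a hard step budget. The names `LoopState` and `run_loop`, and the convention that the planner returns `None` to stop, are assumptions for illustration and are not taken from `references/operational-patterns.md`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

Action = dict[str, Any]


@dataclass
class LoopState:
    goal: str
    observations: list[Any] = field(default_factory=list)
    steps: int = 0
    done: bool = False


def run_loop(
    state: LoopState,
    plan: Callable[[LoopState], Optional[Action]],
    act: Callable[[Action], Any],
    max_steps: int = 5,
) -> LoopState:
    """Run PLAN -> ACT -> OBSERVE -> UPDATE until done or the budget is spent."""
    while not state.done and state.steps < max_steps:
        action = plan(state)               # PLAN: choose the next action
        if action is None:                 # planner signals completion
            state.done = True
            break
        observation = act(action)          # ACT: execute the tool call
        state.observations.append(observation)  # OBSERVE: record the result
        state.steps += 1                   # UPDATE: advance explicit state
    return state
```

If the loop exits with `done` still false, the budget ran out; treat that as a failure to classify, not as a reason to raise the cap silently.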

Navigation: Protocol Implementation

MCP Practical Guide - `references/mcp-practical-guide.md`
Building MCP servers, tool integration, and standardized data access
MCP Server Builder - `references/mcp-server-builder.md`
End-to-end checklist for workflow-focused MCP servers (design → build → test)
A2A Handoff Patterns - `references/a2a-handoff-patterns.md`
Agent-to-agent communication, task delegation, and coordination protocols
Protocol Decision Tree - `references/protocol-decision-tree.md`
When to use MCP vs A2A, decision framework, and selection criteria

Navigation: Agent Capabilities

Agent Operations - `references/agent-operations-best-practices.md`
Action loops, planning, observation, and execution patterns
RAG Patterns - `references/rag-patterns.md`
Contextual retrieval, agentic RAG, and hybrid search strategies
Memory Systems - `references/memory-systems.md`
Session, long-term, episodic, and task memory architectures
Tool Design & Validation - `references/tool-design-specs.md`
Tool schemas, validation, error handling, and MCP integration

Skill Packaging & Sharing

Skill Lifecycle - `references/skill-lifecycle.md`
Scaffold, validate, package, and share skills with teams (Slack-ready)
API Contracts for Agents - `references/api-contracts-for-agents.md`
Request/response envelopes, safety gates, streaming/async patterns, error taxonomy
Multi-Agent Patterns - `references/multi-agent-patterns.md`
Manager-worker, sequential, handoff, and group chat orchestration
OS Agent Capabilities - `references/os-agent-capabilities.md`
Desktop automation, UI grounding, and computer use patterns
Code/SWE Agents - `references/code-swe-agents.md`
SE 3.0 paradigm, autonomous coding patterns, SWE-Bench, HyperAgent architecture

Navigation: Production Operations

Evaluation & Observability - `references/evaluation-and-observability.md`
OpenTelemetry GenAI, metrics, LLM-as-judge, and monitoring
Deployment, CI/CD & Safety - `references/deployment-ci-cd-and-safety.md`
Multi-layer guardrails, HITL controls, NIST AI RMF, production checklists
Agent Debugging Patterns - `references/agent-debugging-patterns.md`
Systematic debugging for agentic systems: trace analysis, tool call failures, loop detection, state corruption
Voice & Multimodal Agents - `references/voice-multimodal-agents.md`
Voice-first and multimodal agent patterns: speech pipelines, vision grounding, cross-modal orchestration
Guardrails Implementation - `references/guardrails-implementation.md`
Multi-layer guardrail patterns: input/output validation, content filtering, PII detection, cost caps
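
A minimal sketch of the multi-layer guardrail idea: independent checks run in order, and the first failing layer blocks the request and reports why. The layer names, the email regex, and the limits are hypothetical and far simpler than real PII detection or content filtering; they are not taken from `references/guardrails-implementation.md`.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# A check returns None when the text passes, or a short reason when it fails.
Check = Callable[[str], Optional[str]]

# Deliberately simple pattern; real PII detection needs much more than this.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    layer: str = ""
    reason: str = ""


def max_length(limit: int) -> Check:
    def check(text: str) -> Optional[str]:
        return f"input exceeds {limit} chars" if len(text) > limit else None
    return check


def no_email_pii(text: str) -> Optional[str]:
    return "email address detected" if EMAIL_RE.search(text) else None


def cost_cap(spent_usd: float, cap_usd: float) -> Check:
    def check(_text: str) -> Optional[str]:
        return "cost cap reached" if spent_usd >= cap_usd else None
    return check


def run_guardrails(text: str, layers: list[tuple[str, Check]]) -> Verdict:
    """Apply layers in order; stop at the first one that rejects the text."""
    for name, check in layers:
        reason = check(text)
        if reason is not None:
            return Verdict(False, name, reason)
    return Verdict(True)
```

Returning the failing layer's name makes each block traceable in logs, which helps when tuning false positives per layer.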

Navigation: Templates (Copy-Paste Ready)

Checklists

Agent Design & Safety Checklist - `assets/checklists/agent-safety-checklist.md`
Go/No-Go safety gate: permissions, HITL triggers, eval gates, observability, rollback

Core Agent Templates

Standard Agent Template - `assets/core/agent-template-standard.md`
Full production spec: memory, tools, RAG, evaluation, observability, safety
Specialized Agent Template - `assets/core/agent-template-specialized.md`
Domain-specific agents with custom capabilities and constraints
Quick Agent Template - `assets/core/agent-template-quick.md`
Minimal viable agent for rapid prototyping

RAG Templates

Basic RAG - `assets/rag/rag-basic.md`
Simple retrieval-augmented generation pipeline
Advanced RAG - `assets/rag/rag-advanced.md`
Contextual retrieval, reranking, and agentic RAG patterns
Hybrid Retrieval - `assets/rag/hybrid-retrieval.md`
Semantic + keyword search with BM25 fusion

Tool Templates

Tool Definition - `assets/tools/tool-definition.md`
MCP-compatible tool schemas with validation and error handling
Tool Validation Checklist - `assets/tools/tool-validation-checklist.md`
Testing, security, and production readiness checks

Multi-Agent Templates

Manager-Worker Template - `assets/multi-agent/manager-worker-template.md`
Orchestration pattern with task delegation and result aggregation
Evaluator-Router Template - `assets/multi-agent/evaluator-router-template.md`
Dynamic routing with quality assessment and domain classification

Service Layer Templates

FastAPI Agent Service - `../dev-api-design/assets/fastapi/fastapi-complete-api.md`
Auth, pagination, validation, error handling; extend with model lifespan loads, SSE, background tasks

External Sources Metadata

Curated References - `data/sources.json`
Authoritative sources spanning standards, protocols, and production agent frameworks

Shared Utilities (Centralized patterns — extract, don't duplicate)

../software-clean-code-standard/utilities/llm-utilities.md — Token counting, streaming, cost estimation
../software-clean-code-standard/utilities/error-handling.md — Effect Result types, correlation IDs
../software-clean-code-standard/utilities/resilience-utilities.md — p-retry v6, circuit breaker for API calls
../software-clean-code-standard/utilities/logging-utilities.md — pino v9 + OpenTelemetry integration
../software-clean-code-standard/utilities/observability-utilities.md — OpenTelemetry SDK, tracing, metrics
../software-clean-code-standard/utilities/testing-utilities.md — Test factories, fixtures, mocks
../software-clean-code-standard/references/clean-code-standard.md — Canonical clean code rules (CC-*) for citation

Trend Awareness Protocol

IMPORTANT: When users ask recommendation questions about AI agents, you MUST use WebSearch to check current trends before answering.
If WebSearch is unavailable, use data/sources.json + any available web browsing tools, and explicitly state what you verified vs assumed.

Trigger Conditions

"What's the best agent framework for [use case]?"
"What should I use for [multi-agent/tool use/orchestration]?"
"What's the latest in AI agents?"
"Current best practices for [agent architecture/MCP/A2A]?"
"Is [LangGraph/CrewAI/AutoGen] still relevant in 2026?"
"[Agent framework A] vs [Agent framework B]?"
"Best way to build [coding agent/RAG agent/OS agent]?"
"What MCP servers are available?"

Required Searches

Search: "AI agent frameworks best practices 2026"
Search: "[LangGraph/CrewAI/AutoGen/Semantic Kernel] comparison 2026"
Search: "AI agent trends January 2026"
Search: "MCP servers available 2026"

What to Report

After searching, provide:

Current landscape: What agent frameworks are popular NOW
Emerging trends: New patterns gaining traction (MCP, A2A, agentic coding)
Deprecated/declining: Frameworks or patterns losing relevance
Recommendation: Based on fresh data, not just static knowledge

Example Topics (verify with fresh search)

Agent frameworks (LangGraph, CrewAI, AutoGen, Semantic Kernel, Pydantic AI)
MCP ecosystem (available servers, new integrations)
Agentic coding (Codex CLI, Claude Code, Cursor, Windsurf, Cline)
Multi-agent patterns (hierarchical, collaborative, competitive)
Tool use protocols (MCP, function calling)
Agent evaluation (SWE-Bench, AgentBench, GAIA)
OS/computer use agents (computer-use APIs, browser automation)

Related Skills

This skill integrates with complementary skills:

Core Dependencies

`../ai-llm/` - LLM patterns, prompt engineering, and model selection for agents
`../ai-rag/` - Deep RAG implementation: chunking, embedding, reranking
`../ai-prompt-engineering/` - System prompt design, few-shot patterns, reasoning strategies

Production & Operations

`../qa-observability/` - OpenTelemetry, metrics, distributed tracing
`../software-security-appsec/` - OWASP Top 10, input validation, secure tool design
`../ops-devops-platform/` - CI/CD pipelines, deployment strategies, infrastructure

Supporting Patterns

`../dev-api-design/` - REST/GraphQL design for agent APIs and tool interfaces
`../ai-mlops/` - Model deployment, monitoring, drift detection
`../qa-debugging/` - Agent debugging, error analysis, root cause investigation

Usage pattern: Start here for agent architecture, then reference specialized skills for deep implementation details.

Usage Notes

Modern Standards: Default to MCP for tools, agentic RAG for retrieval, handoff-first for multi-agent
Lightweight SKILL.md: Use this file for quick reference and navigation
Drill-down resources: Reference detailed resources for implementation guidance
Copy-paste templates: Use templates when the user asks for structured artifacts
External sources: Reference data/sources.json for authoritative documentation links
No theory: Never include theoretical explanations; only operational steps

Key Modern Migrations

Traditional → Modern:

Custom APIs → Model Context Protocol (MCP)
Static RAG → Agentic RAG with contextual retrieval
Ad-hoc handoffs → Versioned handoff APIs with JSON Schema
Single guardrail → Multi-layer defense (5+ layers)
LangChain agents → LangGraph stateful workflows
Custom observability → OpenTelemetry GenAI standards
Model-centric → Context engineering-centric
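
To make the "versioned handoff" migration concrete, here is a sketch of a handoff envelope that is rejected unless it carries a supported schema version and all required fields. The field names and version strings are hypothetical, and the hand-rolled check stands in for a real JSON Schema validation step.

```python
import json
from dataclasses import dataclass
from typing import Any

SUPPORTED_VERSIONS = {"1.0", "1.1"}  # hypothetical contract versions
REQUIRED_FIELDS = {"schema_version", "from_agent", "to_agent", "task", "context"}


@dataclass(frozen=True)
class Handoff:
    schema_version: str
    from_agent: str
    to_agent: str
    task: str
    context: dict[str, Any]


def parse_handoff(raw: str) -> Handoff:
    """Parse and validate a JSON handoff envelope; raise ValueError if invalid."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("handoff must be a JSON object")
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["schema_version"] not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported schema_version: {data['schema_version']}")
    if not isinstance(data["context"], dict):
        raise ValueError("context must be an object")
    # Unknown extra fields are dropped so they cannot leak into the receiver.
    return Handoff(**{name: data[name] for name in REQUIRED_FIELDS})
```

Pinning the version in the envelope lets the receiving agent refuse payloads it was never evaluated against, instead of guessing at their shape.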

AI-Native SDLC Template

Use `assets/agent-template-ainative-sdlc.md` for the Delegate → Review → Own runbook (guardrails + outputs checklist).
