Agent System

Foundry runs AI through three distinct services, each with a specific responsibility. The system uses a three-tier model deployment strategy and a structured context assembly pipeline to ensure agents reason about full project context.

Agent Service (Express — local development)

Directory: agent-service/

A stateless Express 5 sidecar with 7 structured output endpoints. Runs on localhost:3001 during local development. Uses the Claude Agent SDK streaming interface.

  • Auth: Three-priority detection — manual config, environment variable, Claude Code OAuth
  • Database access: None by design; results are returned to callers only.
  • Cost tracking: Per-request middleware logs model, tokens, and estimated cost.
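
The three-priority auth detection might look roughly like this. A sketch only: `resolveAuth`, `AuthConfig`, and the exact signature are assumptions, not the actual implementation.

```typescript
// Hypothetical sketch of the three-priority auth detection; names are assumed.
interface AuthConfig {
  apiKey: string;
  source: "manual" | "env" | "oauth";
}

function resolveAuth(
  manualKey: string | undefined,
  env: Record<string, string | undefined>,
  oauthToken: string | undefined,
): AuthConfig | null {
  // 1. Manual config takes priority.
  if (manualKey) return { apiKey: manualKey, source: "manual" };
  // 2. Then the environment variable.
  const envKey = env.ANTHROPIC_API_KEY;
  if (envKey) return { apiKey: envKey, source: "env" };
  // 3. Finally, a Claude Code OAuth token.
  if (oauthToken) return { apiKey: oauthToken, source: "oauth" };
  return null;
}
```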

Agent Worker (Cloudflare Worker — production)

Directory: agent-worker/

The production replacement for agent-service. A Cloudflare Worker built with Hono and the Anthropic SDK. Deployed at https://foundry-agent-worker.<account>.workers.dev.

  • Auth: Bearer token in the Authorization header, validated against AGENT_SERVICE_SECRET. Unauthenticated requests return 401.
  • Routes: Same structured analysis endpoints as agent-service (/analyze-requirement, /analyze-task-subtasks, /health, etc.)
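
The 401 behavior reduces to a small, dependency-free comparison. This is a sketch of the check the worker's middleware performs; `checkBearerAuth` is an invented name.

```typescript
// Sketch: validate the Authorization header against the shared secret.
// checkBearerAuth is an invented name; the real Hono middleware may differ.
function checkBearerAuth(
  authHeader: string | null,
  secret: string,
): { ok: boolean; status: number } {
  if (authHeader === `Bearer ${secret}`) {
    return { ok: true, status: 200 };
  }
  // Missing or wrong token: unauthenticated requests return 401.
  return { ok: false, status: 401 };
}
```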

Routing logic in convex/lib/agentServiceClient.ts switches between services based on environment:

```ts
// convex/lib/agentServiceClient.ts (simplified)
// Local: AGENT_SERVICE_URL = http://localhost:3001, no auth header.
// Production: AGENT_SERVICE_URL = https://foundry-agent-worker.<acct>.workers.dev, bearer auth.
const url = process.env.AGENT_SERVICE_URL;
const headers = process.env.AGENT_SERVICE_SECRET
  ? { Authorization: `Bearer ${process.env.AGENT_SERVICE_SECRET}` }
  : {};
```

Sandbox Worker (Cloudflare Worker + Durable Objects + Docker)

Directory: sandbox-worker/

Provisions ephemeral AI coding environments scoped to individual tasks. This is the flagship execution engine.

Architecture stack:

  • Cloudflare Worker — lightweight request router
  • SessionStore Durable Object — one per session, manages container lifecycle and SQLite storage
  • Docker Container — runs Claude Code SDK with bypassPermissions mode

The 10-stage provisioning pipeline:

| Stage | What happens |
| --- | --- |
| containerProvision | Docker container allocated on Cloudflare |
| systemSetup | Base packages and environment configured |
| authSetup | Git credentials and API keys injected |
| claudeConfig | Claude Code SDK configuration written |
| gitClone | Repository cloned into the container |
| depsInstall | Project dependencies installed |
| mcpInstall | MCP servers configured for project-specific tooling |
| workspaceCustomization | Dotfiles, hooks, custom scripts applied |
| healthCheck | Verify Claude Code SDK responds and git works |
| ready | Session available for execution |
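
A sequential runner over these stages might be sketched as follows. The stage names come from the table above; the runner shape and its abort-on-failure behavior are assumptions.

```typescript
// Hedged sketch: run the provisioning stages in order, recording status.
type StageName =
  | "containerProvision" | "systemSetup" | "authSetup" | "claudeConfig"
  | "gitClone" | "depsInstall" | "mcpInstall" | "workspaceCustomization"
  | "healthCheck" | "ready";

const PIPELINE: StageName[] = [
  "containerProvision", "systemSetup", "authSetup", "claudeConfig",
  "gitClone", "depsInstall", "mcpInstall", "workspaceCustomization",
  "healthCheck", "ready",
];

async function runPipeline(
  run: (stage: StageName) => Promise<void>,
): Promise<{ stage: StageName; status: "done" | "failed" }[]> {
  const log: { stage: StageName; status: "done" | "failed" }[] = [];
  for (const stage of PIPELINE) {
    try {
      await run(stage);
      log.push({ stage, status: "done" });
    } catch {
      // A failed stage aborts provisioning; later stages never run.
      log.push({ stage, status: "failed" });
      break;
    }
  }
  return log;
}
```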

Runtime modes: idle, executing, interactive (multi-turn chat), hibernating. TTL range: 5-60 minutes with Durable Object alarm-based expiration.
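
Assuming the Durable Object clamps requested TTLs into that 5-60 minute window before scheduling its alarm, the bounds check is simple. `clampTtl` is an invented helper; only the TTL range and alarm mechanism come from the text above.

```typescript
// Clamp a requested TTL into the documented 5-60 minute window.
const MIN_TTL_MS = 5 * 60 * 1000;
const MAX_TTL_MS = 60 * 60 * 1000;

function clampTtl(requestedMs: number): number {
  return Math.min(MAX_TTL_MS, Math.max(MIN_TTL_MS, requestedMs));
}

// Inside the SessionStore Durable Object this might look like:
//   await this.state.storage.setAlarm(Date.now() + clampTtl(ttlMs));
//   async alarm() { /* tear down or hibernate the container */ }
```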

Model Tiers

Foundry routes AI requests to different Claude models based on task complexity and cost requirements.

| Model | Code | Use Case | Rationale |
| --- | --- | --- | --- |
| Claude Opus 4.6 | claude-opus-4-6-20250219 | Document analysis | Highest capability for complex multi-type extraction from unstructured documents |
| Claude Sonnet 4.5 v2 | claude-sonnet-4-5-20250929 | Agent service routes, health scoring, subtask generation | Balanced cost/capability for structured output tasks |
| Claude Sonnet 4.5 | claude-sonnet-4-5-20250514 | Core skill execution (executeSkill) | Standard execution tier for sandbox operations |

refreshModelCache fetches the live Anthropic models list (/v1/models) and caches in Convex with 24-hour TTL. The UI model selector shows actual available models, not a hardcoded list.
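
The 24-hour TTL check might look like the following sketch. The cache record shape and the `isCacheStale` helper are assumptions; only the `/v1/models` endpoint and 24-hour TTL come from the text above.

```typescript
// Hedged sketch of the staleness check behind refreshModelCache.
const MODEL_CACHE_TTL_MS = 24 * 60 * 60 * 1000;

interface ModelCacheRecord {
  fetchedAt: number; // epoch ms of the last /v1/models fetch
  modelIds: string[];
}

function isCacheStale(record: ModelCacheRecord | null, now: number): boolean {
  if (!record) return true;
  return now - record.fetchedAt > MODEL_CACHE_TTL_MS;
}

// When stale, the worker would re-fetch roughly like:
//   const res = await fetch("https://api.anthropic.com/v1/models", {
//     headers: { "x-api-key": apiKey, "anthropic-version": "2023-06-01" },
//   });
```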

Context Assembly Pipeline

Every AI invocation assembles structured XML context from five layers before calling Claude. This pipeline is engagement-type agnostic — it works for migrations, greenfield builds, integrations, and product development.

  1. Program context (~200 tokens) — program type, phase, status, source/target platforms.

  2. Requirements (~500-2K tokens) — filtered by workstream, with dependency graph and status.

  3. Skill instructions (~2-10K tokens) — full skill content with domain, version metadata, and execution rules.

  4. Recent execution history (~300-800 tokens) — last 5 agent runs with review status. Enables feedback-loop-aware prompting.

  5. Task prompt — the specific task with XML tags, acceptance criteria, and repository structure.

```xml
<program_context>
Program: AcmeCorp Migration
Phase: build
Source: Magento
Target: Salesforce B2B Commerce
</program_context>
<requirements>
REQ-042: Product catalog sync (status: in_progress)
REQ-043: Price book mapping (status: draft, depends_on: REQ-042)
</requirements>
<skill_instructions>
Domain: integration
Version: 3
Content: [full skill text with implementation rules]
</skill_instructions>
<recent_executions>
Run 1: REQ-042 subtask 3 — accepted (2 days ago)
Run 2: REQ-042 subtask 4 — rejected, "missing error handling" (1 day ago)
</recent_executions>
<task>
Implement product catalog sync endpoint with error handling.
Acceptance criteria: [...]
Repository structure: [injected file tree]
</task>
```

Foundry uses Anthropic’s prompt caching (cache_control: { type: "ephemeral" }) on static context blocks to achieve ~90% cost reduction on repeated context.

```ts
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-5-20250929",
  system: [
    {
      type: "text",
      text: programContext, // static across calls
      cache_control: { type: "ephemeral" },
    },
    {
      type: "text",
      text: skillInstructions, // static within a session
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: taskPrompt }],
});
```

The first call pays full price for the system prompt. Subsequent calls within the cache window (~5 minutes) read from cache at 10% of the original cost.

Select AI routes use Claude’s extended thinking for complex reasoning tasks. The thinking budget controls how long the model reasons before generating output.

| Route | Thinking Budget | Use Case |
| --- | --- | --- |
| Task decomposition | 8,000 tokens | Breaking requirements into implementation tasks |
| Gate evaluation | 7,000 tokens | Assessing sprint gate readiness |
| Risk evaluation | 6,000 tokens | Analyzing risk factors and impacts |
| Video segment analysis | 6,000 tokens | Extracting structured findings from recordings |
```
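
Mapping those budgets onto Anthropic's extended-thinking request parameter could be sketched like this. The route keys and the builder function are illustrative; `thinking: { type: "enabled", budget_tokens }` is the actual API shape, and `max_tokens` must exceed the budget.

```typescript
// Per-route thinking budgets, from the table above.
const THINKING_BUDGETS: Record<string, number> = {
  taskDecomposition: 8000,
  gateEvaluation: 7000,
  riskEvaluation: 6000,
  videoSegmentAnalysis: 6000,
};

// Build the `thinking` parameter for an Anthropic messages.create() call.
function buildThinkingParams(route: string) {
  const budget = THINKING_BUDGETS[route];
  if (!budget) return undefined;
  return {
    type: "enabled" as const,
    budget_tokens: budget, // max_tokens on the request must exceed this
  };
}
```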

LLM output varies in casing and formatting. Foundry uses Zod helpers that normalize before validation:

ai/schemas.ts:

```ts
// Lowercase and snake_case the raw value, then validate it against the
// allowed enum members.
const lenientEnum = <T extends readonly [string, ...string[]]>(values: T) =>
  z.string()
    .transform((v) => v.toLowerCase().replace(/\s+/g, "_"))
    .pipe(z.enum(values));
```

This prevents crashes when Claude returns "High" instead of "high" or "In Progress" instead of "in_progress".

Subtask generation parses partial JSON via brace-depth tracking. Each completed subtask is inserted into Convex as it arrives in the stream. Users see subtasks appear one by one in real time rather than waiting for the full response.
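
An illustrative version of that brace-depth extraction, aware of strings and escape sequences; the real parser in Foundry may differ.

```typescript
// Extract complete top-level JSON objects from a streaming buffer by
// tracking brace depth, ignoring braces inside string literals.
function extractCompleteObjects(buffer: string): { objects: string[]; rest: string } {
  const objects: string[] = [];
  let depth = 0;
  let inString = false;
  let escaped = false;
  let start = -1;
  let consumed = 0;
  for (let i = 0; i < buffer.length; i++) {
    const ch = buffer[i];
    if (escaped) { escaped = false; continue; }
    if (ch === "\\" && inString) { escaped = true; continue; }
    if (ch === '"') { inString = !inString; continue; }
    if (inString) continue;
    if (ch === "{") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0 && start >= 0) {
        objects.push(buffer.slice(start, i + 1));
        consumed = i + 1;
        start = -1;
      }
    }
  }
  // `rest` keeps any trailing partial object for the next stream chunk.
  return { objects, rest: buffer.slice(consumed) };
}
```

Each completed object can be `JSON.parse`d and inserted as soon as it closes, which is what makes the one-by-one streaming UX possible.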

getRepoStructureForProgram() fetches the live GitHub file tree and injects it into task decomposition prompts. Agents know the actual codebase structure — file paths, directory organization — not abstract requirements alone.
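
Flattening a GitHub `git/trees` response (path + type entries) into injectable text might look like the sketch below. The output format is an assumption; `getRepoStructureForProgram()` may render the tree differently.

```typescript
// Minimal tree entry shape, matching GitHub's git/trees API fields.
interface TreeEntry { path: string; type: "blob" | "tree"; }

// Render sorted paths, marking directories with a trailing slash.
function renderTree(entries: TreeEntry[]): string {
  return entries
    .slice()
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0))
    .map((e) => (e.type === "tree" ? `${e.path}/` : e.path))
    .join("\n");
}
```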

During document analysis, existing requirement titles are injected as an <existing-requirements> XML block. Claude classifies each finding as new, update, or duplicate rather than extracting requirements that already exist.
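
Building that block from existing titles might look like this; the inner element name is an assumption, only the `<existing-requirements>` wrapper is from the text above.

```typescript
// Wrap existing requirement titles in the <existing-requirements> XML block
// injected into document-analysis prompts.
function buildExistingRequirementsBlock(titles: string[]): string {
  const items = titles
    .map((t) => `  <requirement>${t}</requirement>`)
    .join("\n");
  return `<existing-requirements>\n${items}\n</existing-requirements>`;
}
```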

| Module | Trigger | Model Tier | Output |
| --- | --- | --- | --- |
| Document analysis | Document upload | Opus 4.6 | Requirements, risks, integrations, decisions |
| Task decomposition | Requirement view | Sonnet 4.5 v2 | Tasks with acceptance criteria, story points |
| Subtask generation | Task execution | Sonnet 4.5 v2 | Scoped subtasks with complexity scores |
| Sprint planning | Sprint view | Sonnet 4.5 v2 | Capacity-aware task recommendations |
| Gate evaluation | Gate detail | Sonnet 4.5 v2 | Readiness score (0-100%), blockers |
| Risk assessment | On-demand | Sonnet 4.5 v2 | Risk identification, escalations, impacts |
| Health scoring | Daily cron | Sonnet 4.5 v2 | 5-factor workstream health scores |
| Skill execution | Sandbox | Sonnet 4.5 | Code generation in isolated containers |
| PR description | Sandbox push | Sonnet 4.5 v2 | AI-generated PR description from diff |
| Code review | GitHub comment | Sonnet 4.5 v2 | Context-aware review posted to PR |

Every AI call tracks token usage via extractTokenUsage() and calculates cost. Records are stored in aiUsageRecords for billing and observability. The Agent Activity dashboard groups operations by requirement, enabling tracing from intake through implementation.
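
The cost calculation reduces to token counts times per-model rates. A sketch under stated assumptions: the rates below are placeholders, not Foundry's actual pricing table, and `estimateCostUsd` is an invented name (the real code reads usage via `extractTokenUsage()`).

```typescript
interface TokenUsage { inputTokens: number; outputTokens: number; }

// Placeholder USD rates per million tokens, keyed by model ID.
const RATES: Record<string, { input: number; output: number }> = {
  "claude-sonnet-4-5-20250929": { input: 3, output: 15 },
};

// Estimate the cost of one call; null when the model has no known rate.
function estimateCostUsd(model: string, usage: TokenUsage): number | null {
  const rate = RATES[model];
  if (!rate) return null;
  return (
    (usage.inputTokens / 1_000_000) * rate.input +
    (usage.outputTokens / 1_000_000) * rate.output
  );
}
```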