A model, tool, executor router — architecture¶

Status: the original design vision (co-designed 2026-05-31). This document is the shape we set out to build and the reasoning behind it — kept for that rationale, not as a description of current behavior. Much has shipped (often in a simpler form); several sections below are still aspirational and are flagged inline as (FUTURE). The implemented project lives in this repo as woollama (naming settled — see naming.md).

For what actually exists today, read these instead: the Configuration reference, Conversations API, and the authoritative Roadmap scorecard (+ slice history in build-log.md). Treat code + roadmap.md as authoritative where they differ from this doc. The biggest ways the prototype realized things differently from the sketch below: - The recipe namespace shipped as woollama/<recipe> (the doc's cosmic/ examples are pre-naming). Of the "four model kinds," only raw passthrough (<provider>/<model>) and recipes (woollama/<recipe>) exist; the fabric/<pattern> and cosmic/variant kinds are aspirational, not built. - Config is three files — mcp.json (servers only: command/args/env), recipes.toml, and inferencers.toml — not the single-file layout / policy.toml sketched in §Configuration. There is no roles/features/[server] bind handling. - The bind override env var is WOOLLAMA_ADDRESS; the MCP HTTP surface is Streamable HTTP at /mcp (not /mcp/sse+/mcp/messages). - The bundled MCP servers are the hello + textops examples, not fabric-mcp/lackpy. - The built-in tool set (§Tools, "the current set") and bidirectional MCP roles are aspirational — woollama has no in-process tools today; all tools come from downstream MCP servers. - The executor axis now includes tool delegation to Claude Code (a claude-code recipe with tools → Claude runs the loop), in addition to the provider-prefix inferencer choice described below. - A stateful conversations surface (/v1/responses + /v1/conversations) was added after this doc was written and has its own design — conversations-api-design.md. woollama routes conversation handles; backends own the state (claude-resume and managed-agents today), never woollama's own store.

What it is¶

A small daemon that routes inference requests, tool calls, and executor choice between AI clients and AI backends, using two standard wire formats and inventing none of its own.

Three axes of routing, one daemon:

	what it routes	namespace	discovery
models	inference requests (raw / pattern / variant / recipe)	`<provider>/<name>`	`GET /v1/models` + MCP resources
tools	tool calls during a chat-loop	`<server>.<tool>`	MCP `tools/list`
executors	which backend handles a given model	implicit in the model name's `<provider>` prefix	per-provider config

What it is not¶

Not an inference engine. It uses other engines.
Not a UI. Cosmic clients (the panel, CLI, Claude Desktop) connect to it.
Not a tool host (except for a small built-in set). Tools live in MCP servers.
Not a fabric clone or extension. fabric is one possible pattern source.
Not opinionated about agent loops, scratchpads, or chain-of-thought — those belong in patterns or in the model.

Binding — local-only and ephemeral by default¶

Same pattern as the existing fabric subprocess wrapping: bind to a random free loopback port; persist the chosen address to $XDG_RUNTIME_DIR/<name>.addr so clients can discover it; never bind to 0.0.0.0 without explicit opt-in. The router holds API keys and routes to local resources; it must not be LAN-reachable by default.

Surfaces: - Unix socket (default for local MCP clients — the panel, the CLI): $XDG_RUNTIME_DIR/<name>.sock. No network at all. - HTTP loopback on a random free port (for OpenAI-compatible clients that need HTTP): 127.0.0.1:<random>, address persisted to $XDG_RUNTIME_DIR/<name>.addr. - LAN bind (0.0.0.0:<port>): only when explicitly configured AND with required api_key, mirroring how fabric upstream forces auth on LAN.

Override hierarchy (as built: env var, else default — the config-file bind level is part of the target design, not implemented): 1. $WOOLLAMA_ADDRESS=host:port env (explicit, highest precedence) 2. (planned) [server] bind = "0.0.0.0:8889" in config 3. Random free loopback port (default)

The persisted address file is the discovery mechanism — clients read it the same way cosmic-fabric fabric-url works today.

Two inbound surfaces (same port, path-routed)¶

GET  /v1/models                — OpenAI-compat list of all addressable models
POST /v1/chat/completions      — OpenAI-compat chat (the inference primitive)
POST /v1/embeddings            — OpenAI-compat embeddings (when needed)

POST /mcp                      — MCP over Streamable HTTP for network clients
stdio (subprocess)             — MCP over stdio for local clients

OpenAI surface: any tool that speaks OpenAI is a client without code changes. Cursor, Aider, Continue, the openai Python/JS SDKs, anything with OPENAI_API_BASE — they all just work.

MCP surface: rich clients that want tool discovery, prompts, resources, and bidirectional callbacks. The cosmic panel is one such client.

Two outbound protocols¶

MCP        ←→ tool servers (lackpy, fabric-mcp, filesystem, git, sqlite, …)
              prompts, tools, resources, callbacks
OpenAI     ←→ inference backends (Ollama native, Anthropic compat shim,
              vLLM, llama.cpp, Together, Groq, OpenRouter, …)
              chat completions with tools + streaming

These cover different concerns: - MCP is the discovery + control + tool primitive - OpenAI is the inference primitive

Composing them in the router gives orchestrated chat without inventing anything.

Four kinds of model — one namespace¶

(PARTLY BUILT) Today only two kinds exist: raw pass-through (<provider>/<model>) and recipes (woollama/<recipe>). The fabric/<pattern> and cosmic/<variant> kinds below are aspirational. (The addressing-scheme idea — one model field for all kinds — is the part that shipped.)

model: "ollama/qwen3:14b-iq4xs"     raw inferencer — pass-through
model: "anthropic/claude-opus-4-7"  raw inferencer — pass-through
model: "fabric/scribe-summarize"    pattern — fetch system prompt, route to
                                    pattern's configured inferencer
model: "cosmic/qwen3-spicy"         variant — model + sampling config bundle
model: "cosmic/deep-research"       recipe — pattern + tools + inferencer +
                                    sampling, fully orchestrated

Resolution table:

model field	client experience	router behavior
`provider/raw-model`, no tools	standard OpenAI	pass-through to backend
`provider/raw-model`, tools supplied	standard OpenAI	pass-through; tool_calls returned to client (client handles)
`fabric/pattern`, pattern is `tool_use = false`	one final answer	fetch pattern → prepend system → route to pattern's inferencer
`fabric/pattern`, pattern is `tool_use = true`	one final answer	fetch pattern → resolve tool allow-list → chat-loop with internal tool dispatch
`cosmic/variant`	standard OpenAI	resolve to underlying provider/model + apply sampling
`cosmic/recipe`	one final answer	resolve full composition → chat-loop

Mechanical dispatch on parse of the model field plus a policy lookup. No extension to OpenAI's wire format. The pattern-as-model and recipe-as-model concepts are how the router exposes "stored prompts" (which OpenAI doesn't have natively) through the OpenAI client surface.

Recipes — pre-packaged compositions¶

[recipes."deep-research"]
pattern = "fabric/scribe-look-it-up"           # system prompt source
prompt_arguments = { depth = "really" }        # template args
inferencer = "anthropic/claude-opus-4-7"       # override pattern's default
tools = ["http_get_wikipedia", "http_get_arxiv"]
sampling.temperature = 0.7
description = "Deep research with Opus + Wikipedia + arXiv. Long answers."

Recipes are first-class models. GET /v1/models lists them. Any OpenAI client can use cosmic/deep-research without knowing fabric, MCP, or tools exist.

Recipes also surface as MCP prompts for MCP clients that want the same composition through that surface.

Built-in tools (TL.0 — router-native)¶

(FUTURE) woollama has no in-process tools today — all tools come from downstream MCP servers. This router-native tool set is aspirational.

The router ships a small set of in-process tools, distinct from MCP-discovered tools. The current set:

tool	mode	notes
`http_get`	daemon	URL fetch via Jina; per-instance `allow_domains`
`read_file`	daemon	cwd-jailed UTF-8 read; per-instance `roots`
`run_shell_confirmed`	panel-confirm	argv array, no shell expansion, requires panel approval

(Plus the hygiene rules for any tool result: empty-result sentinel, schema validation, exception wrapping, truncation cap.)

These exist alongside MCP-discovered tools in the registry, namespaced the same way (cosmic.http_get_wikipedia if we want to be fully consistent — TBD).

Bidirectional MCP (TL.3)¶

(FUTURE) Not built — woollama is an MCP client to downstream servers and an MCP server to its clients, but the symmetric roles handling below is a sketch.

When a handler we're calling needs to invoke our tools (e.g., lackpy's program calls our read_file), the model is: two MCP connections, symmetric roles. The router is server-to-the-handler for callbacks, while also being client-to-the-handler for delegation. Per-handler roles = ["server", "client"] config declares the directionality. Validated end-to-end in examples/mcp-hello/probe_client.py's elicitation test.

Tool visibility (the per-client filter inversion — deferred)¶

Tools have descriptive properties (internal, conversational, research, etc.); each named client (panel, CLI, ollama-mcp, lackpy-mcp) declares what properties or explicit tools it consumes. The router applies the filter at exposure time.

This is premature until the named clients exist in config. Until then, the per-pattern/per-recipe tools = [...] allow-list is the working approximation. Real client filters land when the panel and at least one sub-inferencer are named entries.

Configuration¶

(SUPERSEDED) This is the original single-file sketch (with policy.toml, roles, features). The prototype shipped three separate files — mcp.json, recipes.toml, inferencers.toml — and there is no policy.toml or roles/features handling. See the Configuration reference for the actual shapes.

Configuration mirrors Claude Code's .mcp.json shape for familiarity, with extensions for our specific needs:

// ~/.config/<name>/mcp.json
{
  "mcpServers": {
    "fabric": {
      "command": "fabric-mcp",
      "args": ["--transport", "stdio"],
      "env": { "FABRIC_BASE_URL": "http://localhost:11434" },
      "roles": ["server"],
      "features": { "streaming": "progress-typed-events" }
    },
    "lackpy": {
      "command": "lackpy",
      "args": ["mcp"],
      "roles": ["server", "client"]            // bidirectional
    }
  },
  "inferencers": {
    "ollama":   { "url": "http://localhost:11434" },
    "anthropic":{ "url": "https://api.anthropic.com",
                  "api_key": "$ANTHROPIC_API_KEY",
                  "compat_path": "/v1" }
  }
}

Plus a separate policy.toml for recipes, variants, tool instances, and per-pattern metadata.

Vendored MCP servers (the bundle)¶

(FUTURE) Today the bundle is the hello + textops example servers, not the fabric-mcp/lackpy set described below.

The router ships with a curated bundle of MCP servers:

package	source	shape
`fabric-mcp`	fork of `ksylvan/fabric-mcp`	fixed streaming bug + variable substitution
`lackpy-mcp`	`teaguesterling/lackpy`	use as-is

Inference backends speak OpenAI-compat natively, so we don't ship wrappers for Ollama, Anthropic, etc. — the router talks to their OpenAI endpoints directly.

Forks live as their own repos (<namespace>/cosmic-mcp-fabric etc.) with clear README diffs from upstream. Sync periodically. Users can override any bundled server by setting command in mcp.json.

What this collapses from the prior cosmic-fabric prototype¶

About 60% of yesterday's tool-calling prototype:

in prototype	replaced by
`core.FabricClient` (fabric REST adapter)	the (forked) fabric-mcp server
`core.run_with_tools` against Ollama `/api/chat`	OpenAI chat-completions client + MCP tool dispatch
Bespoke socket protocol panel↔daemon	MCP over stdio
`core.assemble_prompt`	fabric-mcp's `prompts/get` (with vars in the fork)
meta.toml sidecars	`[patterns.X]` and `[recipes.X]` in cos-fab's policy
Pattern frontmatter parser	gone with meta.toml
`core.inst_to_options` (sampling-knob translation)	direct openai-compat fields in recipes

What survives: - Hygiene rules (sanitize, validate, exception-wrap) - Chat-loop shape (re-implemented against OpenAI + MCP) - Built-in tool callbacks (http_get, read_file, run_shell_confirmed) - Per-pattern tool allow-list semantics (moved to recipes)

What stays open¶

~~Naming.~~ Settled: woollama (see naming.md).
Tools as named tables instead of bare names. Will let tool entries carry per-tool metadata (description, version, deprecation) alongside the name. Deferred.
Recipe inheritance. One recipe extending another. Deferred until two recipes share enough config to motivate it.
Per-client visibility filters. Real client filters land when the panel and at least one sub-inferencer are named entries in mcp.json.

What the probe demonstrated (historical)¶

The validation probe (once at /tmp/router_probe/) is gone — its job was to prove the shape, which it did; the real implementation now lives in src/woollama/ and is far past this. Kept as a record of the original end-to-end validation.

The architecture compiled into ~200 LoC of Python:

FastAPI HTTP server with /v1/models and /v1/chat/completions
Per-request stdio connection to the hello MCP server
Two model namespaces: ollama/* (pass-through) and cosmic/* (recipe)
One hardcoded recipe (cosmic/streamer) that bundles a system prompt + the hello server's count_to tool + qwen3 as the inferencer
The full chat-loop: OpenAI client → router → Ollama with tools → MCP for tool execution → result back to Ollama → final answer back to OpenAI client

Tested end-to-end with the openai Python SDK as the client. Validates the entire architecture in code, not just in conversation.