Catalyst — A runtime for AI systems you can actually operate

01 — Thesis

Composition isn't the problem.
Operation is.

LangChain helps you wire components together. LangGraph helps you orchestrate them. Neither was designed to answer the question every enterprise eventually asks: how do we run this safely, observe it honestly, and change it without breaking everything? — The Catalyst position

Context

Every team building AI applications eventually invents the same scaffolding: prompt versioning, trace IDs, evaluation harnesses, PII redaction, cost accounting, a way to know which version of which prompt produced which answer for which user.

It gets built badly, three times per company, scattered across notebooks and side projects, glued onto whichever framework was fashionable that quarter.

Catalyst is the answer to that — built once, properly, as a runtime contract rather than a wrapper.

02 — Architecture

A small, opinionated core.
Everything else is an adapter.

Catalyst is built on hexagonal architecture. Your business logic stays in plain Python. Frameworks live at the edges. The runtime sits in the middle and enforces consistency.

i.

Your core logic is yours

Custom chunkers, retrieval strategies, planner logic, tool routing, decision rules — none of this should depend on a framework's base classes. Catalyst gives you stable contracts (Retriever, QueryFlow, AgentFlow) so your code outlives any specific dependency.
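What a stable contract can look like in plain Python, as a minimal sketch. The `retrieve` signature, the `Document` type, and `KeywordRetriever` are assumptions made for illustration; only the contract name `Retriever` comes from Catalyst, and its real shape may differ.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class Document:
    """A retrieved passage plus its relevance score (hypothetical type)."""
    text: str
    score: float


@runtime_checkable
class Retriever(Protocol):
    """Assumed contract shape: anything with this method is a retriever."""

    def retrieve(self, query: str, k: int) -> list[Document]: ...


class KeywordRetriever:
    """Plain-Python retriever: no framework base class involved."""

    def __init__(self, corpus: list[str]) -> None:
        self.corpus = corpus

    def retrieve(self, query: str, k: int) -> list[Document]:
        terms = set(query.lower().split())
        scored = [
            Document(text, float(len(terms & set(text.lower().split()))))
            for text in self.corpus
        ]
        # Highest term overlap first; drop passages that share no terms.
        ranked = sorted(scored, key=lambda d: d.score, reverse=True)
        return [d for d in ranked if d.score > 0][:k]


retriever = KeywordRetriever([
    "Unused leave carries forward for one year",
    "Expense reports are due monthly",
])
hits = retriever.retrieve("leave carries forward", k=1)
```

Because the contract is structural, `KeywordRetriever` satisfies it without inheriting from anything, which is the property that lets the code outlive a dependency.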

ii.

The runtime owns governance

Execution context, policy enforcement, telemetry emission, prompt resolution, evaluation hooks — these are runtime concerns, not framework concerns. Centralizing them means one place to fix bugs, one schema to query, one model your security team can review.
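The idea of a single governance choke point can be sketched in a few lines. `MiniRuntime`, its `blocked_terms` check, and the span fields below are all hypothetical stand-ins, not Catalyst's implementation; the point is only that policy and telemetry live in one function that every flow passes through.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class MiniRuntime:
    """Toy runtime: one place for policy checks and telemetry."""
    blocked_terms: tuple[str, ...] = ("ssn",)
    spans: list[dict[str, Any]] = field(default_factory=list)

    def run(self, flow: Callable[[str], str], text: str, tenant_id: str) -> str:
        trace_id = uuid.uuid4().hex
        # Policy runs before the flow, so every flow gets the same check.
        if any(term in text.lower() for term in self.blocked_terms):
            self.spans.append(
                {"trace_id": trace_id, "tenant_id": tenant_id, "status": "blocked"}
            )
            raise PermissionError("input rejected by policy")
        started = time.perf_counter()
        output = flow(text)
        # Every successful call emits a span with the same schema.
        self.spans.append({
            "trace_id": trace_id,
            "tenant_id": tenant_id,
            "status": "ok",
            "duration_s": time.perf_counter() - started,
        })
        return output


runtime = MiniRuntime()
answer = runtime.run(lambda q: q.upper(), "what is the leave policy?", tenant_id="acme")
```

A real policy would be far richer than a substring match; the structure is what matters here. Fixing a bug in `run` fixes it for every flow at once.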

iii.

Frameworks plug in through adapters

LangChain middleware, LangGraph runtime context, raw provider SDKs — all reachable through thin translators that satisfy Catalyst's contracts. Swap a framework without rewriting your application. Run two frameworks side-by-side without duplicating governance.
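A thin translator might look like the sketch below. It assumes a `QueryFlow` contract with a single `answer` method (the method name is a guess, not taken from Catalyst) and adapts any object that exposes `invoke(input)`, the calling convention LangChain runnables use. `FakeChain` stands in for a real chain so the example runs without installing anything.

```python
from typing import Any, Protocol


class QueryFlow(Protocol):
    """Assumed contract shape: take a question, return an answer string."""

    def answer(self, question: str) -> str: ...


class InvokeAdapter:
    """Adapts any object exposing invoke(dict) to the QueryFlow shape."""

    def __init__(self, runnable: Any, input_key: str = "question") -> None:
        self._runnable = runnable
        self._input_key = input_key

    def answer(self, question: str) -> str:
        result = self._runnable.invoke({self._input_key: question})
        return str(result)


class FakeChain:
    """Stand-in for a framework chain, so the sketch runs without installs."""

    def invoke(self, payload: dict[str, str]) -> str:
        return f"echo: {payload['question']}"


class DirectFlow:
    """A second implementation that uses no framework at all."""

    def answer(self, question: str) -> str:
        return f"direct: {question}"


def ask(flow: QueryFlow, question: str) -> str:
    # Application code depends only on the contract, never on the framework.
    return flow.answer(question)


via_adapter = ask(InvokeAdapter(FakeChain()), "carry-forward?")
via_direct = ask(DirectFlow(), "carry-forward?")
```

`ask` cannot tell which implementation it received. Swapping a framework then means swapping the adapter, not the caller.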

04 — Example

One invocation,
every concern accounted for.

A knowledge agent answering a policy question. The LangChain chain inside is unchanged. Catalyst wraps the invocation with everything you'd otherwise build by hand.

app/main.py python

from catalyst import Runtime, ExecutionContext, PromptRef
from catalyst.policy import EnterprisePolicy
from app.agents import knowledge_agent   # plain LangChain inside

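# otel_sink and production_evals are assumed to be configured elsewhere
# in the application; their setup is not shown in this listing.
runtime = Runtime(
    telemetry=otel_sink,
    policy=EnterprisePolicy(pii=True, injection=True),
    evaluator=production_evals,
)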

ctx = ExecutionContext(
    app_name="support-assistant",
    tenant_id="acme",
    user_id="u-1029",
)

result = runtime.run(
    flow=knowledge_agent,
    input={"question": "What is the carry-forward policy?"},
    prompt=PromptRef("hr_rag", version="2.3.1"),
    ctx=ctx,
)

result.output      # the grounded answer
result.evals       # {groundedness: 0.94, pii_leakage: 0.0, ...}
result.trace.id    # auditable, queryable, attributable

Listing 1 — A single runtime.run() handles prompt resolution, policy checks, span emission, and post-hoc evaluation.
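To make the post-hoc evaluation step concrete, here is a minimal sketch of how scores could travel with an output. `run_with_evals`, the `Result` shape, and the `token_overlap` metric are assumptions for illustration; token overlap in particular is a crude stand-in for groundedness, not the metric Catalyst uses.

```python
from dataclasses import dataclass
from typing import Callable

# An evaluator takes (output, context) and returns a score.
Evaluator = Callable[[str, str], float]


@dataclass
class Result:
    output: str
    evals: dict[str, float]


def token_overlap(output: str, context: str) -> float:
    """Crude groundedness stand-in: share of output tokens found in context."""
    out = output.lower().split()
    ctx = set(context.lower().split())
    return sum(t in ctx for t in out) / len(out) if out else 0.0


def run_with_evals(output: str, context: str, evaluators: dict[str, Evaluator]) -> Result:
    # Every evaluator runs on every call; scores are returned with the output.
    return Result(output, {name: fn(output, context) for name, fn in evaluators.items()})


result = run_with_evals(
    output="leave carries forward",
    context="unused leave carries forward for one year",
    evaluators={"groundedness": token_overlap},
)
```

Attaching scores to the result, instead of computing them in a separate job, is what makes per-invocation trends queryable later.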

05 — Contrast

What changes, in practice.

Before

Prompts in code, version controlled by git blame.

Logging bolted on per-app, every team writes its own format.

PII redaction added after the first incident.

Evaluation is a Jupyter notebook someone ran last quarter.

Swapping LangChain for raw SDKs means rewriting the app.

After

Prompts are registered artifacts. Versions, owners, diffs.

One telemetry schema. Query it once, answer everywhere.

Policy is declared at the runtime; enforced on every call.

Evaluation runs on every invocation. Trends are visible.

Frameworks become adapters. Swap them; your app doesn't notice.
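"Registered artifacts" with versions, owners, and diffs can be sketched with the standard library alone. `PromptRegistry`, `PromptVersion`, the owner name, and the template text are hypothetical; only the prompt name `hr_rag` and version `2.3.1` echo the listing above.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str
    owner: str
    template: str


class PromptRegistry:
    """Hypothetical in-memory registry keyed by (name, version)."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], PromptVersion] = {}

    def register(self, prompt: PromptVersion) -> None:
        key = (prompt.name, prompt.version)
        if key in self._store:
            # Published versions are immutable; changes need a new version.
            raise ValueError(f"{prompt.name}@{prompt.version} already registered")
        self._store[key] = prompt

    def resolve(self, name: str, version: str) -> PromptVersion:
        return self._store[(name, version)]

    def diff(self, name: str, old: str, new: str) -> list[str]:
        a = self.resolve(name, old).template.splitlines()
        b = self.resolve(name, new).template.splitlines()
        return list(difflib.unified_diff(a, b, fromfile=old, tofile=new, lineterm=""))


registry = PromptRegistry()
registry.register(PromptVersion("hr_rag", "2.3.0", "hr-platform", "Answer from context."))
registry.register(
    PromptVersion("hr_rag", "2.3.1", "hr-platform", "Answer from context.\nCite sources.")
)
changes = registry.diff("hr_rag", "2.3.0", "2.3.1")
```

Refusing to overwrite a registered version is the design choice that makes "which prompt produced this answer" answerable after the fact.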

06 — Scope

What people build with it.

Knowledge agents

Grounded answers over internal documentation, with citations, audit trails, and groundedness scoring on every response.

RAG · Retrieval · Eval

Planner agents

Multi-step decomposition with explicit tool routing, retry policy, and full execution traces from goal to outcome.

LangGraph · State

Tool-using agents

Agents that interact with internal APIs, databases, and services — under policy, with per-tool authorization and audit.

Tools · Policy · Audit

Custom retrieval pipelines

Bring your own chunking, hybrid search, reranking. Catalyst doesn't care how you retrieve — only that the runtime can see it.

Ingestion · Custom

Multi-framework fleets

Unify governance across teams using LangChain, LangGraph, raw SDKs, and custom code. One runtime, many implementations.

Platform · Governance
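For the tool-using case, per-tool authorization with audit reduces to a gate in front of every tool call. The sketch below is an assumption about the shape of such a gate; `ToolGate`, the allowlist layout, and the tool names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolGate:
    """Hypothetical gate: checks a per-tenant allowlist before any tool runs."""
    allowed: dict[str, set[str]]
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def call(self, tenant_id: str, tool_name: str, tool: Callable[[str], str], arg: str) -> str:
        if tool_name not in self.allowed.get(tenant_id, set()):
            # Denials are recorded too, so the audit trail shows attempts.
            self.audit_log.append((tenant_id, tool_name, "denied"))
            raise PermissionError(f"{tenant_id} may not call {tool_name}")
        self.audit_log.append((tenant_id, tool_name, "allowed"))
        return tool(arg)


gate = ToolGate(allowed={"acme": {"lookup_policy"}})
reply = gate.call("acme", "lookup_policy", lambda q: f"policy for {q}", "leave")
```

An agent that only reaches tools through `gate.call` cannot skip the check, and every attempt, allowed or denied, lands in the same log.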

07 — Begin

Built for the part of AI engineering
nobody markets.

Catalyst is early. The contracts are stable. The runtime works. The adapters cover LangChain and LangGraph today; provider SDKs and vector stores are next. If you're tired of rebuilding the same scaffolding on top of every new framework, this is for you.


Most AI applications aren't
under-engineered.
They're under-governed.
