Catalyst is a runtime and contract layer for retrieval, agents, and tool-using workflows. It assumes you'll write your own logic — and gives you tracing, evaluation, versioning, and policy around it, regardless of which framework sits underneath.
LangChain helps you wire components together. LangGraph helps you orchestrate them. Neither was designed to answer the question every enterprise eventually asks: how do we run this safely, observe it honestly, and change it without breaking everything? — The Catalyst position
Every team building AI applications eventually invents the same scaffolding: prompt versioning, trace IDs, evaluation harnesses, PII redaction, cost accounting, a way to know which version of which prompt produced which answer for which user.
It gets built badly, three times per company, scattered across notebooks and side projects, glued onto whichever framework was fashionable that quarter.
Catalyst is the answer to that — built once, properly, as a runtime contract rather than a wrapper.
Catalyst is built on hexagonal architecture. Your business logic stays in plain Python. Frameworks live at the edges. The runtime sits in the middle and enforces consistency.
Custom chunkers, retrieval strategies, planner logic, tool routing, decision rules — none of this should depend on a framework's base classes. Catalyst gives you stable contracts (Retriever, QueryFlow, AgentFlow) so your code outlives any specific dependency.
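As a rough illustration of what a framework-free contract can look like, the sketch below uses Python's `typing.Protocol`. The `Retriever` name comes from the text above, but its `retrieve(query, k)` signature, the `Document` type, and `KeywordRetriever` are assumptions made for this example, not Catalyst's actual definitions.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class Document:
    """Hypothetical result type for this sketch; not taken from Catalyst."""
    doc_id: str
    text: str
    score: float


@runtime_checkable
class Retriever(Protocol):
    """Assumed shape of a retrieval contract: one method, plain types."""

    def retrieve(self, query: str, k: int) -> list[Document]: ...


class KeywordRetriever:
    """Business logic in plain Python: no framework base class involved."""

    def __init__(self, corpus: dict[str, str]) -> None:
        self._corpus = corpus

    def retrieve(self, query: str, k: int) -> list[Document]:
        terms = set(query.lower().split())
        scored = []
        for doc_id, text in self._corpus.items():
            # Score by the fraction of query terms that appear in the document.
            overlap = len(terms & set(text.lower().split()))
            if overlap:
                scored.append(Document(doc_id, text, overlap / len(terms)))
        scored.sort(key=lambda d: d.score, reverse=True)
        return scored[:k]


corpus = {
    "hr-1": "unused leave may carry forward into the next year",
    "it-7": "reset your password from the login page",
}
retriever = KeywordRetriever(corpus)
hits = retriever.retrieve("carry forward leave", k=1)
```

Because the contract is structural, `KeywordRetriever` satisfies it without importing or inheriting from anything, which is the property that lets the code outlive a particular dependency.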
Execution context, policy enforcement, telemetry emission, prompt resolution, evaluation hooks — these are runtime concerns, not framework concerns. Centralizing them means one place to fix bugs, one schema to query, one model your security team can review.
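To make the idea of centralized runtime concerns concrete, here is a deliberately small sketch: a toy runtime that applies one redaction rule and records one span per call. `MiniRuntime`, its fields, and the email-only redaction are hypothetical stand-ins chosen for illustration; they do not describe how Catalyst implements policy or telemetry.

```python
import re
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable

# Simplistic email pattern, used only to demonstrate a policy step.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class MiniRuntime:
    """Toy runtime: a single place for redaction and span emission."""
    spans: list[dict[str, Any]] = field(default_factory=list)

    def run(self, flow: Callable[[str], str], text: str, tenant_id: str) -> str:
        trace_id = uuid.uuid4().hex
        # Policy step: redact before the flow ever sees the input.
        safe_input = EMAIL.sub("[REDACTED_EMAIL]", text)
        started = time.perf_counter()
        output = flow(safe_input)  # the user's own logic, untouched
        # Telemetry step: every call emits a span with the same schema.
        self.spans.append({
            "trace_id": trace_id,
            "tenant_id": tenant_id,
            "flow": getattr(flow, "__name__", type(flow).__name__),
            "duration_s": time.perf_counter() - started,
        })
        return output


def echo_flow(question: str) -> str:
    return f"You asked: {question}"


runtime = MiniRuntime()
answer = runtime.run(echo_flow, "Contact jo@example.com about leave", tenant_id="acme")
```

The flow itself contains no logging or redaction code, so a fix to either lands in one place and applies to every flow that goes through `run`.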
LangChain middleware, LangGraph runtime context, raw provider SDKs — all reachable through thin translators that satisfy Catalyst's contracts. Swap a framework without rewriting your application. Run two frameworks side-by-side without duplicating governance.
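One way to picture a thin translator is as a small adapter class per framework, all satisfying the same contract. In the sketch below, `QueryFlow` is named in the text above, but its `answer(question)` method is an assumption; `FakeChain` is a stand-in for a framework object called through an `invoke`-style method, and no real framework is imported.

```python
from typing import Any, Callable, Protocol


class QueryFlow(Protocol):
    """Assumed contract for this sketch: question in, answer out."""

    def answer(self, question: str) -> str: ...


class FakeChain:
    """Stand-in for a framework object exposing invoke(dict) -> dict."""

    def invoke(self, payload: dict[str, Any]) -> dict[str, Any]:
        return {"text": f"chain answer to: {payload['question']}"}


class InvokeAdapter:
    """Translates an invoke-style call into the QueryFlow contract."""

    def __init__(self, chain: Any) -> None:
        self._chain = chain

    def answer(self, question: str) -> str:
        return self._chain.invoke({"question": question})["text"]


class CallableAdapter:
    """Same contract over a bare function, such as a raw SDK call."""

    def __init__(self, fn: Callable[[str], str]) -> None:
        self._fn = fn

    def answer(self, question: str) -> str:
        return self._fn(question)


def ask(flow: QueryFlow, question: str) -> str:
    # Application code depends only on the contract, never on the framework.
    return flow.answer(question)


question = "What is the leave policy?"
via_chain = ask(InvokeAdapter(FakeChain()), question)
via_sdk = ask(CallableAdapter(lambda q: f"sdk answer to: {q}"), question)
```

Swapping the framework here means swapping the adapter passed to `ask`; the calling code stays the same, and both adapters can run side by side.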
A knowledge agent answering a policy question. The LangChain chain inside is unchanged. Catalyst wraps the invocation with everything you'd otherwise build by hand.
```python
from catalyst import Runtime, ExecutionContext, PromptRef
from catalyst.policy import EnterprisePolicy

from app.agents import knowledge_agent  # plain LangChain inside

# otel_sink and production_evals are assumed to be configured elsewhere in the app.
runtime = Runtime(
    telemetry=otel_sink,
    policy=EnterprisePolicy(pii=True, injection=True),
    evaluator=production_evals,
)

ctx = ExecutionContext(
    app_name="support-assistant",
    tenant_id="acme",
    user_id="u-1029",
)

result = runtime.run(
    flow=knowledge_agent,
    input={"question": "What is the carry-forward policy?"},
    prompt=PromptRef("hr_rag", version="2.3.1"),
    ctx=ctx,
)

result.output    # the grounded answer
result.evals     # {groundedness: 0.94, pii_leakage: 0.0, ...}
result.trace.id  # auditable, queryable, attributable
```
Listing 1 — A single runtime.run() handles prompt resolution, policy checks, span emission, and post-hoc evaluation.
Without Catalyst:
Prompts in code, version controlled by git blame.
Logging bolted on per app; every team writes its own format.
PII redaction added after the first incident.
Evaluation is a Jupyter notebook someone ran last quarter.
Swapping LangChain for raw SDKs means rewriting the app.

With Catalyst:
Prompts are registered artifacts. Versions, owners, diffs.
One telemetry schema. Query it once, answer everywhere.
Policy is declared at the runtime and enforced on every call.
Evaluation runs on every invocation. Trends are visible.
Frameworks become adapters. Swap them; your app doesn't notice.
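To illustrate what "registered artifacts" with versions, owners, and diffs could mean in practice, the following is a minimal in-memory sketch. `PromptRegistry`, `PromptVersion`, the owner name, the `2.3.0` version, and the template text are all hypothetical; only the `hr_rag` name and the `2.3.1` version appear in the listing above, and nothing here reflects Catalyst's real registry.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str
    owner: str
    template: str


class PromptRegistry:
    """Toy registry: versions are immutable once registered."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], PromptVersion] = {}

    def register(self, prompt: PromptVersion) -> None:
        key = (prompt.name, prompt.version)
        if key in self._store:
            # Changing a prompt requires a new version, never an overwrite.
            raise ValueError(f"{prompt.name}@{prompt.version} already registered")
        self._store[key] = prompt

    def resolve(self, name: str, version: str) -> PromptVersion:
        return self._store[(name, version)]

    def diff(self, name: str, old: str, new: str) -> list[str]:
        a = self.resolve(name, old).template.splitlines()
        b = self.resolve(name, new).template.splitlines()
        return list(difflib.unified_diff(a, b, fromfile=old, tofile=new, lineterm=""))


registry = PromptRegistry()
registry.register(PromptVersion(
    "hr_rag", "2.3.0", "hr-platform", "Answer from the context.\nBe brief."))
registry.register(PromptVersion(
    "hr_rag", "2.3.1", "hr-platform", "Answer from the context.\nCite the source document."))

changes = registry.diff("hr_rag", "2.3.0", "2.3.1")
```

Keying on `(name, version)` and refusing overwrites is what makes a trace that records `hr_rag` at `2.3.1` attributable to exactly one template.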
Catalyst is early. The contracts are stable. The runtime works. The adapters cover LangChain and LangGraph today; provider SDKs and vector stores are next. If you're tired of rebuilding the same scaffolding on top of every new framework, this is for you.