Define interaction protocols as session types. Monitor your agent's sends, receives, and tool calls at runtime. Catch violations before they cause harm. Based on peer-reviewed research.
!SearchFlights.?FlightResults.!PresentOptions.?UserApproval.!BookFlight.?BookingConfirmation.end
Two case studies applying llmcontract to real agent trajectories. Same monitor, different domains, two surprises.
Replayed Sierra's 2,624 published agent trajectories (4 frontier models, retail + airline) through llmcontract. Caught runs that tau2's outcome eval scored passing while the agent had skipped the explicit-confirmation step the policy requires. Failure mode invisible to outcome-based evaluation.
Generated 90 trajectories driving the Playwright MCP server (3 models × 15 tasks × 2 trials). Failure modes scale opposite directions with model size: Haiku 4.5 snapshots religiously then ignores the snapshot 57% of the time; Opus 4.7 sometimes skips the snapshot but follows through 97% of the time when it does.
Pick a protocol, then click actions to step through it. Try breaking the rules to see violations in real-time.
A concise notation for defining interaction protocols, inspired by session type theory.
| Syntax | Meaning |
|---|---|
| !label | Send action — the agent emits a message or calls an API |
| ?label | Receive action — the agent receives a response or event |
| !{a, b} | Internal choice — the sender selects one of several branches |
| ?{a, b} | External choice — the receiver selects one of several branches |
| A.B | Sequence — A must complete before B begins |
| rec X. ...X... | Recursion — defines a loop; X references the loop point |
| end | Terminal state — protocol is complete |
From raw monitor to full agent integration — use what fits your architecture.
from llmcontract import Monitor, Ok, Violation m = Monitor("!Search.?Results.!Book.?Confirm.end") m.send("Search") # Ok() m.receive("Results") # Ok() m.send("Book") # Ok() m.receive("Confirm") # Ok() assert m.is_terminal m.send("Book") # Violation — protocol already ended
from llmcontract import Monitor, MonitoredClient, LLMResponse monitor = Monitor("rec X.!Request.?{ToolCall.!Result.X, Answer.end}") client = MonitoredClient( llm_call=openai.chat.completions.create, response_adapter=my_adapter, monitor=monitor, send_label="Request", receive_label=lambda r: "ToolCall" if r.has_tool_calls else "Answer", ) response = client.call(model="gpt-4", messages=[...]) # Automatically fires !Request then ?ToolCall or ?Answer
from llmcontract import ToolMiddleware, ProtocolViolationError tools = ToolMiddleware( monitor=monitor, # same monitor as the client tools={"search": search_fn, "book": book_fn}, ) try: results = tools.process(response) except ProtocolViolationError as e: # Tool call violated the protocol — blocked automatically print(f"Violation: {e}")
Protocol strings compile through a clean pipeline into a runtime monitor.
llmcontract applies the formal foundations of session type monitorability to LLM agent interactions.
Christian Bartolo Burlò, Adrian Francalanza, Alceste Scalas.
"On the Monitorability of Session Types, in Theory and in Practice"
35th European Conference on Object-Oriented Programming (ECOOP 2021), pp. 20:1–20:30, Schloss Dagstuhl.