The TestAgent connects an LLM to your MCP tools, letting you test how models interact with your server. It handles the agentic loop (prompt → tool call → result → response) and gives you rich inspection capabilities.

Why Test with LLMs?

Unit tests verify your tools work correctly in isolation. But in production, an LLM decides which tool to call and what arguments to pass. Testing with real LLMs catches issues like:
  • Ambiguous tool descriptions that confuse the model
  • Missing or unclear parameter documentation
  • Tools that the model consistently misuses
  • Edge cases in natural language interpretation

Creating a TestAgent

import { MCPClientManager, TestAgent } from "@mcpjam/sdk";

// Connect to your MCP server
const manager = new MCPClientManager({
  myServer: {
    command: "npx",
    args: ["-y", "@modelcontextprotocol/server-everything"],
  },
});
await manager.connectToServer("myServer");

// Create an agent with your tools
const agent = new TestAgent({
  tools: await manager.getTools(),
  model: "anthropic/claude-sonnet-4-20250514",
  apiKey: process.env.ANTHROPIC_API_KEY,
});

Running Prompts

Send natural language prompts and inspect what happens:
const result = await agent.prompt("What is 15 plus 27?");

// What did the model say?
console.log(result.getText());
// "15 plus 27 equals 42."

// What tools did it call?
console.log(result.toolsCalled());
// ["add"]

// What arguments did it pass?
console.log(result.getToolArguments("add"));
// { a: 15, b: 27 }

The PromptResult Object

Every prompt returns a PromptResult with rich inspection methods:
const result = await agent.prompt("...");

// Tool inspection
result.toolsCalled();           // string[] - names of tools called
result.hasToolCall("add");      // boolean - was this tool called?
result.getToolCalls();          // detailed tool call info
result.getToolArguments("add"); // arguments for a specific tool

// Performance
result.e2eLatencyMs();          // total time
result.llmLatencyMs();          // time in LLM API
result.mcpLatencyMs();          // time executing tools
result.totalTokens();           // tokens used

// Error handling
result.hasError();              // did something go wrong?
result.getError();              // error message if so
TestAgent never throws exceptions. Errors are captured in the result, making it safe to run many tests without try/catch blocks.
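The never-throw contract can be pictured with a plain result wrapper. The sketch below is illustrative only, not the SDK's implementation; `SafeResult` and `runSafely` are hypothetical names:

```typescript
// Illustrative sketch of a result object that captures errors
// instead of throwing (SafeResult and runSafely are hypothetical).
interface SafeResult {
  hasError(): boolean;
  getError(): string | undefined;
  getText(): string;
}

async function runSafely(fn: () => Promise<string>): Promise<SafeResult> {
  try {
    const text = await fn();
    return { hasError: () => false, getError: () => undefined, getText: () => text };
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    return { hasError: () => true, getError: () => message, getText: () => "" };
  }
}

// Many prompts can run in a batch without any try/catch:
const results = await Promise.all([
  runSafely(async () => "ok"),
  runSafely(async () => { throw new Error("tool failed"); }),
]);
console.log(results.map((r) => (r.hasError() ? r.getError() : r.getText())));
// ["ok", "tool failed"]
```

Because failures become data on the result instead of thrown exceptions, one bad prompt never aborts a whole test run.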

Multi-Turn Conversations

Pass previous results as context to maintain conversation history:
// First turn
const r1 = await agent.prompt("Create a task called 'Buy groceries'");

// Second turn (model sees the first exchange)
const r2 = await agent.prompt("Mark it as high priority", {
  context: r1,
});

// Third turn (model sees both previous exchanges)
const r3 = await agent.prompt("Now show me all my tasks", {
  context: [r1, r2],
});
This is essential for testing workflows that span multiple interactions.
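Conceptually, passing `context` is like prepending the earlier exchanges to the chat history before the new prompt. This is a self-contained sketch of that idea under assumed internals; `Turn` and `buildMessages` are illustrative names, not SDK APIs:

```typescript
// Hypothetical sketch: flatten prior exchanges into a message history.
// Turn and buildMessages are illustrative, not part of @mcpjam/sdk.
interface Turn { prompt: string; response: string; }
type Message = { role: "user" | "assistant"; content: string };

function buildMessages(context: Turn[], nextPrompt: string): Message[] {
  const history = context.flatMap((t): Message[] => [
    { role: "user", content: t.prompt },
    { role: "assistant", content: t.response },
  ]);
  return [...history, { role: "user", content: nextPrompt }];
}

const r1 = { prompt: "Create a task called 'Buy groceries'", response: "Created." };
const r2 = { prompt: "Mark it as high priority", response: "Done." };
const messages = buildMessages([r1, r2], "Now show me all my tasks");
console.log(messages.length); // 5 — two full exchanges plus the new prompt
```

The model can resolve "it" in the second turn only because the first exchange is in its history.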

Tuning Agent Behavior

System Prompt

Guide the model’s behavior:
const agent = new TestAgent({
  tools,
  model: "anthropic/claude-sonnet-4-20250514",
  apiKey: process.env.ANTHROPIC_API_KEY,
  systemPrompt: "You are a task management assistant. Always confirm before deleting tasks.",
});

Temperature

Control randomness (lower = more deterministic):
const agent = new TestAgent({
  // ...
  temperature: 0.1, // More consistent for testing
});

Max Steps

Limit the agentic loop iterations:
const agent = new TestAgent({
  // ...
  maxSteps: 5, // Stop after 5 tool calls
});

Writing Assertions

Use validators to assert tool call behavior:
import { matchToolCalls, matchToolCallWithArgs } from "@mcpjam/sdk";

const result = await agent.prompt("Add 10 and 5");

// Check the right tool was called
expect(matchToolCalls(["add"], result.toolsCalled())).toBe(true);

// Check the arguments were correct
expect(
  matchToolCallWithArgs("add", { a: 10, b: 5 }, result.getToolCalls())
).toBe(true);
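The validators' semantics can be pictured as an ordered name match plus a deep-equality check on arguments. The sketch below shows that assumed behavior in plain TypeScript; it is not the SDK source, and the `...Sketch` helpers are illustrative:

```typescript
// Sketch of the assumed validator semantics (not the SDK source):
// exact ordered tool-name match, and deep-equal arguments for one call.
type ToolCall = { name: string; arguments: Record<string, unknown> };

function matchToolCallsSketch(expected: string[], actual: string[]): boolean {
  return expected.length === actual.length &&
    expected.every((name, i) => actual[i] === name);
}

function matchToolCallWithArgsSketch(
  name: string,
  args: Record<string, unknown>,
  calls: ToolCall[],
): boolean {
  // JSON.stringify comparison is a simplification: it is key-order sensitive.
  return calls.some(
    (c) => c.name === name && JSON.stringify(c.arguments) === JSON.stringify(args),
  );
}

const calls: ToolCall[] = [{ name: "add", arguments: { a: 10, b: 5 } }];
console.log(matchToolCallsSketch(["add"], ["add"]));                     // true
console.log(matchToolCallWithArgsSketch("add", { a: 10, b: 5 }, calls)); // true
console.log(matchToolCallWithArgsSketch("add", { a: 1, b: 2 }, calls));  // false
```

Asserting on both the tool name and its arguments catches the two most common failure modes: the model picking the wrong tool, and the model passing garbled parameters to the right one.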

Performance Insights

Understand where time is spent:
const result = await agent.prompt("Run a complex operation");

const latency = result.getLatency();
console.log(`Total: ${latency.e2eMs}ms`);
console.log(`  LLM: ${latency.llmMs}ms (${(latency.llmMs/latency.e2eMs*100).toFixed(0)}%)`);
console.log(`  Tools: ${latency.mcpMs}ms (${(latency.mcpMs/latency.e2eMs*100).toFixed(0)}%)`);

// Token usage for cost estimation
console.log(`Tokens: ${result.totalTokens()}`);

Next Steps