MCP Evals

Your users connect to your MCP server from different clients, such as Claude Desktop and Cursor, and with different LLMs. MCP evals check that your MCP server works across all of these environments.

E2E testing

We built a CLI that runs MCP evals and end-to-end (E2E) tests. The CLI simulates an end user’s environment and tests popular user flows. An example E2E test for the PayPal MCP server:

Connect the PayPal MCP server to the testing agent. To simulate Claude Desktop, configure the agent to use a Claude model with a default system prompt.
Give the agent a typical user query, such as “Create a refund for order ID 412”.
Let the testing agent run the query.
Check the testing agent’s trace to make sure that it called the create_refund tool and successfully created a refund.
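
The final check in the flow above can be sketched in a few lines of Python. This is only an illustration of the idea, not the CLI's implementation: the ToolCall shape, the evaluate_trace helper, and the get_order tool are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """One tool invocation recorded in the agent's trace (hypothetical shape)."""
    name: str
    is_error: bool = False


def evaluate_trace(calls: list[ToolCall], expected_tools: list[str]) -> dict:
    """Report which expected tools were never called successfully."""
    succeeded = {call.name for call in calls if not call.is_error}
    missing = [tool for tool in expected_tools if tool not in succeeded]
    return {"passed": not missing, "missing": missing}


# Assumed trace for "Create a refund for order ID 412":
# a lookup (hypothetical get_order tool) followed by the refund itself.
trace = [ToolCall("get_order"), ToolCall("create_refund")]
result = evaluate_trace(trace, ["create_refund"])
print(result)
```

A test passes here only when every expected tool appears in the trace without an error, so a failed create_refund call counts as a miss.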

Quick Start

Install

npm install -g @mcpjam/cli

Set up tests

To set up, create a new directory. In that directory, we’ll create a test file and a server connection file.

Test file

prompt is what a user would type in the chat to interact with your server.
expectedTools lists the tools you expect to be called for the prompt.
model and the optional advancedConfig customize the environment.

weather-tests.json

{
  "tests": [
    {
      "title": "Test weather tool",
      "prompt": "What's the weather in San Francisco?",
      "expectedTools": ["get_weather"],
      "model": { "id": "claude-3-5-sonnet-20241022", "provider": "anthropic" },
      "selectedServers": ["weather-server"],
      "advancedConfig": {
        "instructions": "You are a helpful weather assistant",
        "temperature": 0.1,
        "maxSteps": 5,
        "toolChoice": "auto"
      }
    }
  ]
}
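
Because the goal is to cover several LLMs, it can help to generate the test file rather than write each entry by hand. The sketch below repeats one prompt across a list of models, using only the fields shown in the example above. The build_tests helper is hypothetical, and the second model id is an assumption that may not match what the CLI accepts.

```python
import json

# The first model comes from the example above; the second is an assumed id.
MODELS = [
    {"id": "claude-3-5-sonnet-20241022", "provider": "anthropic"},
    {"id": "gpt-4o", "provider": "openai"},
]


def build_tests(title, prompt, expected_tools, servers, models=MODELS):
    """Return a tests document with one entry per model for the same prompt."""
    return {
        "tests": [
            {
                "title": f"{title} ({model['id']})",
                "prompt": prompt,
                "expectedTools": expected_tools,
                "model": model,
                "selectedServers": servers,
            }
            for model in models
        ]
    }


suite = build_tests(
    title="Test weather tool",
    prompt="What's the weather in San Francisco?",
    expected_tools=["get_weather"],
    servers=["weather-server"],
)
# Print the JSON; redirect it into weather-tests.json to use it with the CLI.
print(json.dumps(suite, indent=2))
```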

Server connection file

This file is configured much like an mcp.json file. You must provide at least one key in providerApiKeys.

local-dev.json

{
  "mcpServers": {
    "weather-server": {
      "command": "python",
      "args": ["weather_server.py"],
      "env": {
        "WEATHER_API_KEY": "${WEATHER_API_KEY}"
      }
    },
    "api-server": {
      "url": "https://api.example.com/mcp",
      "headers": {
        "Authorization": "Bearer ${API_TOKEN}"
      }
    }
  },
  "providerApiKeys": {
    "anthropic": "${ANTHROPIC_API_KEY}",
    "openai": "${OPENAI_API_KEY}",
    "deepseek": "${DEEPSEEK_API_KEY}"
  }
}
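
The file above refers to secrets through ${NAME} placeholders. This page does not describe how the CLI resolves them; assuming they are read from environment variables, a small pre-flight check like the following can list the ones that are not set before a run. The unset_placeholders helper is hypothetical.

```python
import os
import re

# Matches ${NAME} placeholders such as ${ANTHROPIC_API_KEY}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def unset_placeholders(config_text, env=os.environ):
    """Return the placeholder names in config_text that are missing or empty in env."""
    names = sorted(set(PLACEHOLDER.findall(config_text)))
    return [name for name in names if not env.get(name)]


# Inline fragment of the providerApiKeys section, checked against a fake environment.
config_text = """
{
  "providerApiKeys": {
    "anthropic": "${ANTHROPIC_API_KEY}",
    "openai": "${OPENAI_API_KEY}"
  }
}
"""
missing = unset_placeholders(config_text, env={"ANTHROPIC_API_KEY": "sk-test"})
print(missing)
```

To check a real file, pass the contents of local-dev.json as config_text and leave env at its default.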

Run MCP Eval

mcpjam evals run --tests weather-tests.json --environment local-dev.json

Short flags

mcpjam evals run -t weather-tests.json -e local-dev.json

CLI Options

--tests, -t <file>: Path to the tests configuration file (required)
--environment, -e <file>: Path to the environment configuration file (required)
--help, -h: Show help information
--version, -V: Display version number
