Key Features
The Test Cases feature provides everything you need to evaluate MCP server reliability:
- AI-Powered Test Generation - Automatically generate comprehensive test cases from your tool definitions
- Negative Test Cases - Test edge cases where tools should NOT be triggered
- Multi-Model Evaluation - Run tests across different LLM providers and models
- Accuracy Metrics - View test statistics, see how results change across runs, and compare performance between different LLMs
- Detailed Run Analysis - View test duration, token consumption, and model performance breakdowns
- Batch Operations - Run entire test cases or individual tests with a single click
Getting Started
To start testing your MCP server:
- Connect your MCP server - Use the Servers tab to establish the connection (a minimal example server is sketched below)
- Navigate to Test Cases - Click the Test Cases tab in MCPJam Inspector
- Create tests - Either:
  - Click the plus icon to manually create a test case and configure the scenario, user prompt, expected tools, and expected output
  - Click the magic wand icon to auto-generate tests from your tools
- Configure models - Click Results & Runs in the sidebar, then use the Models dropdown to select which models to test against
- Run Tests - Click Run to execute all test cases, or run individual tests from the test case view
When using auto-generate, Claude Haiku creates realistic test scenarios from your tool definitions, including both positive cases (tools that should be called) and negative cases (tools that should NOT be called).
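If you don't yet have a server to connect, the sketch below shows a minimal MCP server built with the TypeScript SDK. The get_weather tool and its canned response are hypothetical, included only to give the test cases in this guide something to exercise.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Hypothetical demo server exposing a single get_weather tool.
const server = new McpServer({ name: "weather-demo", version: "1.0.0" });

server.tool(
  "get_weather",
  "Get the current weather for a city",
  { city: z.string().describe("City name, e.g. Tokyo") },
  async ({ city }) => ({
    // A real implementation would call a weather API; this returns a canned reply.
    content: [{ type: "text", text: `Weather in ${city}: 22°C, clear skies` }],
  })
);

// Serve over stdio so an inspector or client can launch and connect to it directly.
await server.connect(new StdioServerTransport());
```

Once the server is connected in the Servers tab, its tool definitions are what both manually written and auto-generated test cases are built against.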
Test Case Structure
Each test case contains the following fields (see the sketch after this list):
- Scenario - A description of the use case to test
- User Prompt - The exact prompt or interaction that begins the test
- Tool Triggered - Which tools should be called
- Expected Output - The output or experience you expect back from the MCP server
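As a rough sketch, a test case could be modeled like this. The field names are illustrative rather than MCPJam's internal schema, and get_weather is a hypothetical tool.

```typescript
// Illustrative shape only - not MCPJam's actual data model.
interface TestCase {
  scenario: string;        // description of the use case under test
  userPrompt: string;      // exact prompt that starts the test
  expectedTools: string[]; // tools that should be triggered (empty for negative tests)
  expectedOutput: string;  // what you expect back from the MCP server
}

const weatherTest: TestCase = {
  scenario: "User asks for the current weather in a specific city",
  userPrompt: "Get me the weather in Tokyo",
  expectedTools: ["get_weather"],
  expectedOutput: "A weather summary for Tokyo with temperature and conditions",
};
```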
Positive vs Negative Tests
Positive Tests verify that your tools are correctly triggered when they should be (one of each kind is sketched after this list). Examples include:
- Single tool usage (“Get me the weather in Tokyo”)
- Multiple tools in one request (“Find flights to Paris and check the weather there”)

Negative Tests verify that your tools are NOT triggered when they shouldn't be. Examples include:
- Meta questions about tools (“What parameters does search accept?”)
- Similar keywords without action intent (“I was reading about file systems”)
- Ambiguous or conversational prompts
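Following the illustrative TestCase sketch above, a positive and a negative case might look like this; the tool names search_flights and get_weather are hypothetical.

```typescript
// Positive: the prompt should trigger both hypothetical tools.
const multiToolTest = {
  scenario: "User plans a trip and needs flights plus weather",
  userPrompt: "Find flights to Paris and check the weather there",
  expectedTools: ["search_flights", "get_weather"],
  expectedOutput: "Flight options to Paris and a Paris weather summary",
};

// Negative: similar keywords, but no action intent, so no tool should be called.
const noActionTest = {
  scenario: "User mentions file systems conversationally without requesting anything",
  userPrompt: "I was reading about file systems",
  expectedTools: [],
  expectedOutput: "A conversational reply with no tool calls",
};
```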
Managing Test Cases
Use the sidebar to manage your test cases:
- Create new tests - Click the plus icon to add a test case manually
- Generate tests with AI - Click the magic wand to auto-generate tests from your tools
- Duplicate tests - Use the dropdown menu on any test to create a copy
- Delete tests - Remove tests you no longer need
Running Tests
You can run tests in two ways:
- Run a single test - Choose which of your configured models to use and click Run
- Run all tests - Click Results & Runs, configure your models, then click the Run button
Running tests requires connected MCP servers. If the Run button is disabled, check that your servers are connected in the Servers tab.
Analyzing Results
Results & Runs View
Click Results & Runs in the sidebar to see overall analytics:
- Accuracy Donut - Overall accuracy percentage
- Accuracy Chart - Shows pass rates across runs (a line connects multiple runs)
- Performance by Model - Bar chart comparing models
Use the dropdown to switch between “Runs” and “Test Cases” views:
Runs view:
- Run History - Shows all your runs with their metrics (Run ID, Start time, Duration, Passed, Failed, Accuracy, Tokens); a sketch of how these metrics fit together follows below
Test Cases view:
- Test Cases Table - List of all tests with Test Case Name, Iterations, Avg Accuracy, Avg Duration
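As a rough sketch of what the run-level metrics represent, accuracy is the share of passed iterations, and duration and tokens accumulate across them. The Iteration shape below is illustrative, not MCPJam's actual data model.

```typescript
// Illustrative only - shows how per-run metrics like those in the Runs view
// could be derived from individual test iterations.
interface Iteration {
  testName: string;
  model: string;
  passed: boolean;
  durationMs: number;
  tokens: number;
}

function summarizeRun(iterations: Iteration[]) {
  const passed = iterations.filter((it) => it.passed).length;
  const failed = iterations.length - passed;
  return {
    passed,
    failed,
    // Displayed as a percentage in the accuracy donut and chart.
    accuracy: iterations.length > 0 ? passed / iterations.length : 0,
    totalDurationMs: iterations.reduce((sum, it) => sum + it.durationMs, 0),
    totalTokens: iterations.reduce((sum, it) => sum + it.tokens, 0),
  };
}
```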
Run Detail View
When you click on a run:
- Metrics Summary - Accuracy, Passed, Failed, Total, Duration
- All Iterations Table - List of all test executions with:
  - Test name
  - Model used
  - Tools called
  - Tokens consumed
  - Duration
- Run Summary Sidebar - Click “View run summary” to see:
  - Duration per Test (averaged across models)
  - Tokens per Test (averaged across models)
  - Performance by Model
Test Case Detail View
When you click on a test case:
- Performance Across Runs - Line chart showing how this test performs over time
- Performance by Model - Bar chart comparing pass rates across models
- Iterations List - All executions of this test with:
  - Pass/fail status
  - Model used
  - Tool calls, tokens, duration
  - Run ID
- Expanded Details - Click an iteration to see:
  - Expected vs Actual tool calls
  - Full trace showing the conversation, model reasoning, and tool execution details
Debugging Tests
Use the visual status indicators to quickly identify issues:
- Green border - Test passed (expected tools were called)
- Red border - Test failed (tools were not called as expected)
- Yellow border - Test pending
- Gray border - Test cancelled

When a test fails, expand its iteration details to review:
- Expected vs Actual tool calls (see what’s missing or unexpected)
- Full conversation trace (understand why the model made different decisions)
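At its core, the expected-vs-actual comparison is a set difference between the tools a test expects and the tools the model actually called. A minimal sketch, with hypothetical function and tool names:

```typescript
// Illustrative helper: which expected tools were never called,
// and which calls were unexpected.
function diffToolCalls(expected: string[], actual: string[]) {
  const missing = expected.filter((tool) => !actual.includes(tool));
  const unexpected = actual.filter((tool) => !expected.includes(tool));
  return { missing, unexpected };
}

// Example: the test expected get_weather, but the model called search_web instead.
console.log(diffToolCalls(["get_weather"], ["search_web"]));
// => { missing: ["get_weather"], unexpected: ["search_web"] }
```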
Best Practices
Writing Effective Test Cases
- Be specific - Include concrete values (dates, IDs, names) in prompts (see the example after this list)
- Test realistic scenarios - Write prompts as real users would
- Cover edge cases - Include negative tests for boundary conditions
- Review the trace - When tests fail, view the full conversation to understand LLM reasoning
- Iterate before submission - Ensure your test cases pass locally before submitting to app stores
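To illustrate the "Be specific" guideline, compare a vague prompt with one that pins down concrete values; the booking scenario and values here are made up for illustration.

```typescript
// Vague: the model has to guess the route, date, and passenger,
// so the expected tool arguments are hard to assert on.
const vaguePrompt = "Book me a flight sometime soon";

// Specific: concrete values make the expected tool call unambiguous.
const specificPrompt =
  "Book me a flight from SFO to JFK on 2025-03-14 for passenger ID 48812";
```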

