The SDK provides APIs to save eval results to MCPJam for visualization in the CI Evals dashboard. Results can be saved automatically via EvalTest/EvalSuite, or manually using the APIs below.

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| MCPJAM_API_KEY | Yes | - | Your MCPJam workspace API key |
| MCPJAM_BASE_URL | No | https://sdk.mcpjam.com | MCPJam API base URL override |
Use MCPJAM_BASE_URL only when you need to override the default ingest host, such as internal development against a non-production backend.
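
If MCPJAM_API_KEY is absent, reporting will fail at upload time, so a CI job can check for it up front. A minimal sketch; the helper name and error message are ours, not part of the SDK:

```typescript
// Return the MCPJam API key or throw early with a clear message.
function requireMcpjamApiKey(
  env: Record<string, string | undefined> = process.env,
): string {
  const key = env.MCPJAM_API_KEY;
  if (!key) {
    throw new Error("MCPJAM_API_KEY is not set; eval reporting will fail.");
  }
  return key;
}
```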

reportEvalResults()

One-shot reporting. Sends all results in a single call. Throws on failure.

import { reportEvalResults } from "@mcpjam/sdk";

Signature

reportEvalResults(input: ReportEvalResultsInput): Promise<ReportEvalResultsOutput>

Example

const output = await reportEvalResults({
  suiteName: "Nightly",
  results: [
    { caseTitle: "healthcheck", passed: true },
    { caseTitle: "tool-selection", passed: true, durationMs: 1200 },
    { caseTitle: "edge-case", passed: false, error: "Wrong tool called" },
  ],
  passCriteria: { minimumPassRate: 90 },
  ci: {
    branch: "main",
    commitSha: "abc123",
  },
});

console.log(`Run ${output.runId}: ${output.result}`);
// e.g. "Run <generated-run-id>: passed"
console.log(`${output.summary.passed}/${output.summary.total} passed`);

reportEvalResultsSafely()

Same as reportEvalResults(), but returns null instead of throwing on failure. Warnings are logged to the console.

import { reportEvalResultsSafely } from "@mcpjam/sdk";

Signature

reportEvalResultsSafely(input: ReportEvalResultsInput): Promise<ReportEvalResultsOutput | null>

Example

const output = await reportEvalResultsSafely({
  suiteName: "Nightly",
  results: [{ caseTitle: "healthcheck", passed: true }],
});

if (output) {
  console.log(`Reported: ${output.summary.passRate * 100}% pass rate`);
} else {
  console.log("Reporting failed (non-blocking)");
}
Use reportEvalResultsSafely() when you don’t want eval reporting failures to break your CI pipeline. Use reportEvalResults() (strict) when reporting is critical.

createEvalRunReporter()

Creates an incremental reporter for long-running processes. Results are buffered and flushed in batches (up to 200 results or 1MB per batch).

import { createEvalRunReporter } from "@mcpjam/sdk";

Signature

createEvalRunReporter(input: CreateEvalRunReporterInput): EvalRunReporter

EvalRunReporter Methods

| Method | Description |
| --- | --- |
| add(result) | Buffer a result (no network call) |
| record(result) | Buffer a result and auto-flush when the buffer is large |
| flush() | Upload all buffered results |
| finalize() | Flush remaining results and finalize the run |
| getBufferedCount() | Number of results in the buffer |
| getAddedCount() | Total results added (including flushed) |
| setExpectedIterations(count) | Set expected iteration count for progress tracking |
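
The difference between add() and record() is who decides when to upload: add() only buffers, leaving flushing to you, while record() uploads automatically as the buffer fills. A sketch of the manual path (case titles and error text are illustrative):

```typescript
import { createEvalRunReporter } from "@mcpjam/sdk";

const reporter = createEvalRunReporter({ suiteName: "Manual Batching" });

// add() only buffers; no network call happens yet.
reporter.add({ caseTitle: "case-1", passed: true });
reporter.add({ caseTitle: "case-2", passed: false, error: "timeout" });
console.log(reporter.getBufferedCount()); // results buffered so far

// flush() uploads the buffer in one batch; finalize() flushes anything
// remaining and closes the run.
await reporter.flush();
const output = await reporter.finalize();
```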

PromptResult Helpers

| Method | Description |
| --- | --- |
| addFromPrompt(promptResult, overrides?) | Convert a PromptResult and buffer it |
| recordFromPrompt(promptResult, overrides?) | Convert a PromptResult, buffer it, and auto-flush |

EvalTest/EvalSuite Run Helpers

| Method | Description |
| --- | --- |
| addFromRun(run, options) | Convert all iterations from an EvalTest run |
| recordFromRun(run, options) | Convert and auto-flush from an EvalTest run |
| addFromSuiteRun(suiteRun, options) | Convert all iterations from an EvalSuite run |
| recordFromSuiteRun(suiteRun, options) | Convert and auto-flush from an EvalSuite run |

Example

const reporter = createEvalRunReporter({
  suiteName: "Integration Tests",
  passCriteria: { minimumPassRate: 85 },
  ci: {
    branch: process.env.GITHUB_REF_NAME,
    commitSha: process.env.GITHUB_SHA,
  },
});

// Add results as tests complete
await reporter.record({ caseTitle: "test-1", passed: true, durationMs: 500 });
await reporter.record({ caseTitle: "test-2", passed: false, error: "timeout" });
await reporter.record({ caseTitle: "test-3", passed: true });

// Finalize the run
const output = await reporter.finalize();
console.log(`${output.summary.passed}/${output.summary.total} passed`);

Using with PromptResult

const reporter = createEvalRunReporter({ suiteName: "Prompt Tests" });

const result = await agent.prompt("Add 2 and 3");
reporter.addFromPrompt(result, {
  caseTitle: "addition",
  passed: result.hasToolCall("add"),
});

const output = await reporter.finalize();

Using with EvalTest Runs

const reporter = createEvalRunReporter({ suiteName: "Full Suite" });

const test = new EvalTest({
  name: "addition",
  test: async (agent) => (await agent.prompt("Add 2+3")).hasToolCall("add"),
});

const run = await test.run(agent, { iterations: 10 });
await reporter.recordFromRun(run, { casePrefix: "addition" });

const output = await reporter.finalize();

uploadEvalArtifact()

Parses test artifacts (JUnit XML, Jest JSON, Vitest JSON) and reports the results to MCPJam.

import { uploadEvalArtifact } from "@mcpjam/sdk";

Signature

uploadEvalArtifact(input: UploadEvalArtifactInput): Promise<ReportEvalResultsOutput>

Supported Formats

| Format | Description |
| --- | --- |
| "junit-xml" | JUnit XML test reports |
| "jest-json" | Jest JSON output (--json flag) |
| "vitest-json" | Vitest JSON reporter output |
| "custom" | Custom parser via customParser option |

Example

import { readFileSync } from "fs";

// Upload JUnit XML
await uploadEvalArtifact({
  suiteName: "CI Results",
  format: "junit-xml",
  artifact: readFileSync("test-results.xml", "utf-8"),
});

// Upload Jest JSON
await uploadEvalArtifact({
  suiteName: "Jest Results",
  format: "jest-json",
  artifact: readFileSync("jest-results.json", "utf-8"),
});

// Custom parser
await uploadEvalArtifact({
  suiteName: "Custom",
  format: "custom",
  artifact: myData,
  customParser: (data) => [
    { caseTitle: "test-1", passed: true },
    { caseTitle: "test-2", passed: false, error: "failed" },
  ],
});

Types

ReportEvalResultsInput

type ReportEvalResultsInput = MCPJamReportingConfig & {
  suiteName: string;
  results: EvalResultInput[];
};

MCPJamReportingConfig

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| enabled | boolean | No | Enable/disable reporting (default: true) |
| apiKey | string | No | MCPJam API key (falls back to MCPJAM_API_KEY env var) |
| baseUrl | string | No | MCPJam API base URL override (useful for internal development or tests) |
| suiteName | string | No | Suite name for the run |
| suiteDescription | string | No | Description of the suite |
| serverNames | string[] | No | MCP server names being tested |
| notes | string | No | Free-form notes |
| passCriteria | { minimumPassRate: number } | No | Pass threshold (0-100) |
| strict | boolean | No | Throw on upload errors (false = warn only) |
| externalRunId | string | No | Custom run ID (auto-generated if omitted) |
| framework | string | No | Test framework name (e.g., "jest", "vitest") |
| ci | EvalCiMetadata | No | CI/CD pipeline context |
| expectedIterations | number | No | Expected total iterations for progress tracking |
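
Putting several of these options together in one config object (every field shown is optional, and all values are illustrative):

```typescript
// Illustrative reporting config; see the table above for each field.
const reportingConfig = {
  suiteName: "Nightly",
  suiteDescription: "Nightly regression evals",
  serverNames: ["weather-server"],       // illustrative server name
  passCriteria: { minimumPassRate: 90 }, // percentage, 0-100
  strict: false,                         // warn instead of throwing on upload errors
  framework: "vitest",
  expectedIterations: 30,
};
```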

EvalCiMetadata

| Property | Type | Description |
| --- | --- | --- |
| provider | string | CI provider (e.g., "github", "gitlab") |
| pipelineId | string | Pipeline/workflow identifier |
| jobId | string | Job identifier |
| runUrl | string | URL to the CI run |
| branch | string | Git branch name |
| commitSha | string | Git commit SHA |
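
On GitHub Actions, these fields can typically be filled from the runner's built-in environment variables. A sketch; the helper function is ours, not part of the SDK:

```typescript
// Build an EvalCiMetadata-shaped object from GitHub Actions' default
// environment variables (GITHUB_RUN_ID, GITHUB_SHA, etc.).
function ciFromGitHubActions(
  env: Record<string, string | undefined> = process.env,
) {
  const { GITHUB_SERVER_URL, GITHUB_REPOSITORY, GITHUB_RUN_ID } = env;
  return {
    provider: "github",
    pipelineId: GITHUB_RUN_ID,
    jobId: env.GITHUB_JOB,
    runUrl:
      GITHUB_SERVER_URL && GITHUB_REPOSITORY && GITHUB_RUN_ID
        ? `${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}`
        : undefined,
    branch: env.GITHUB_REF_NAME,
    commitSha: env.GITHUB_SHA,
  };
}
```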

EvalResultInput

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| caseTitle | string | Yes | Test case title |
| passed | boolean | Yes | Whether the test passed |
| query | string | No | The prompt/query sent |
| durationMs | number | No | Test duration in ms |
| provider | string | No | LLM provider name |
| model | string | No | Model identifier |
| expectedToolCalls | EvalExpectedToolCall[] | No | Expected tool calls |
| actualToolCalls | EvalExpectedToolCall[] | No | Actual tool calls made |
| tokens | { input?, output?, total? } | No | Token usage |
| error | string | No | Error message |
| errorDetails | string | No | Detailed error info |
| trace | EvalTraceInput | No | Conversation trace |
| externalIterationId | string | No | Custom iteration ID |
| externalCaseId | string | No | Custom case ID |
| metadata | Record<string, string \| number \| boolean> | No | Custom metadata |
| isNegativeTest | boolean | No | Whether this is a negative test |
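
A result with the common optional fields filled in (all values are illustrative, including the provider and model names):

```typescript
// An EvalResultInput-shaped object with common optional fields populated.
const result = {
  caseTitle: "tool-selection",
  passed: true,
  query: "What's the weather in Paris?",
  durationMs: 1200,
  provider: "example-provider",
  model: "example-model",
  tokens: { input: 350, output: 120, total: 470 },
  metadata: { region: "eu", retries: 0 },
};
```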

ReportEvalResultsOutput

| Property | Type | Description |
| --- | --- | --- |
| suiteId | string | Created/matched suite ID |
| runId | string | Created run ID |
| status | "completed" \| "failed" | Run status |
| result | "passed" \| "failed" | Pass/fail based on criteria |
| summary.total | number | Total iterations |
| summary.passed | number | Passed iterations |
| summary.failed | number | Failed iterations |
| summary.passRate | number | Pass rate (0.0 - 1.0) |
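
Because result already reflects the configured passCriteria, a CI step can gate on it directly. A minimal sketch; the exit-code convention and helper name are ours:

```typescript
// Map a ReportEvalResultsOutput's result to a process exit code,
// so CI fails when the run did not meet its pass criteria.
function exitCodeFor(output: { result: "passed" | "failed" }): number {
  return output.result === "passed" ? 0 : 1;
}

// e.g. process.exit(exitCodeFor(output));
```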