Select a model configured inside Maximo AI, then decide whether the run is a realtime smoke test, a last-24-hour health check, or a longer leaderboard sample.
Scenario tracks and acceptance rules for real MyTabulon agent evaluation.
Run RPB
Evaluate a model where business-agent behavior actually happens.
RPB runs inside the MyTabulon agent environment so model quality is measured through tool outcomes, safety, latency, usage, and verified task completion rather than a static worksheet alone.
Exercise CRM, accounting, operations, documents, inventory, memory, and integrations. Each scenario should include realistic context, ambiguity, and permission boundaries.
The benchmark service reads assistant turns, tool events, action-ledger outcomes, feedback, usage, latency, and safe trace metadata.
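As a rough illustration of the signals listed above, one run's evidence could be grouped into a single record and checked for gaps before scoring. This is only a sketch: `RunEvidence`, its field names, and `missing_signals` are hypothetical and are not the benchmark service's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RunEvidence:
    """Hypothetical container for one run's signals (not the real schema)."""
    assistant_turns: list[str] = field(default_factory=list)
    tool_events: list[dict[str, Any]] = field(default_factory=list)
    ledger_outcomes: list[dict[str, Any]] = field(default_factory=list)
    feedback: list[str] = field(default_factory=list)
    usage_tokens: int = 0
    latency_ms: float = 0.0
    trace_metadata: dict[str, str] = field(default_factory=dict)


def missing_signals(evidence: RunEvidence) -> list[str]:
    """Return the names of core signals that are empty for this run."""
    # Assumption: feedback, usage, and latency may legitimately be empty or zero,
    # so only these four signals are treated as required here.
    checks = {
        "assistant_turns": bool(evidence.assistant_turns),
        "tool_events": bool(evidence.tool_events),
        "ledger_outcomes": bool(evidence.ledger_outcomes),
        "trace_metadata": bool(evidence.trace_metadata),
    }
    return [name for name, present in checks.items() if not present]
```

A run whose record comes back with missing signals could be held out of scoring until the gap is explained, which keeps the evidence auditable.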
Use the leaderboard, model page, data export, and methodology page to review score, confidence, domain evidence, and sanitized traces.
Run batches across multiple work surfaces so a model cannot overfit one task type.
CRM (required track): Lead qualification, client updates, deal movement, duplicate avoidance.
Accounting (required track): Draft invoices, payments, expenses, totals, currencies, and approval boundaries.
Operations (required track): Tasks, projects, appointments, approvals, notes, and rescheduling.
Documents (required track): Files, PDFs, AI documents, extraction, generated assets, and redaction.
Integrations (required track): Google Workspace, MCP, Zapier, WhatsApp, Telegram, Mono, and external tools.
Memory (required track): Preference storage, retrieval, retention, conflict handling, and secret avoidance.
Runs are useful only when evidence is safe, realistic, and auditable.
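One way to picture batching across work surfaces is to interleave scenarios from the six required tracks and to flag any track a batch leaves uncovered. The sketch below assumes scenarios are plain string identifiers grouped by track. The helper names and the scenario names in the usage note are hypothetical, not part of RPB itself.

```python
from itertools import zip_longest

# Track names taken from the list of required tracks above.
REQUIRED_TRACKS = (
    "CRM", "Accounting", "Operations", "Documents", "Integrations", "Memory",
)


def interleave(scenarios_by_track: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Round-robin across tracks so a batch does not cluster on one task type."""
    columns = [
        [(track, scenario) for scenario in scenarios_by_track.get(track, [])]
        for track in REQUIRED_TRACKS
    ]
    batch: list[tuple[str, str]] = []
    # zip_longest pads shorter tracks with None; those padding slots are skipped.
    for row in zip_longest(*columns):
        batch.extend(item for item in row if item is not None)
    return batch


def uncovered_tracks(batch: list[tuple[str, str]]) -> list[str]:
    """Return required tracks that have no scenario in the batch."""
    covered = {track for track, _ in batch}
    return [track for track in REQUIRED_TRACKS if track not in covered]
```

For example, a pool holding only two hypothetical CRM scenarios and one hypothetical Accounting scenario would interleave as CRM, Accounting, CRM, and `uncovered_tracks` would report Operations, Documents, Integrations, and Memory as still missing.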