AI product workspace

Turn model tests into product decisions.

Launch with confidence. Launch with aplomb. Run model tests, compare real providers, use full-file prompts, and move from raw outputs to launch decisions, evidence reports, and stronger next tests.

Aplum Decide, Aplum Analyst, Aplum Innovate

0 Batteries

0 Models

BYOK Provider billing

Aplum agent workflow

Product mode

Decide

Compare decision frames

See how context and stakeholder lenses change model answers.

Sample use cases:
Pricing change viewed by finance, legal, and support.
Policy exception reviewed from customer and compliance angles.
Provider choice checked for framing-sensitive advice.

Analyst

Explain the evidence

Convert queued results into reports, charts, and exports.

Sample use cases:
Explain failed prompts from a release battery.
Build charts for model pass rate, cost, and latency.
Package full-file results for a product review.

Innovate

Build the next test

Use gaps and goals to generate better batteries or questions.

Sample use cases:
Create edge cases for onboarding or support workflows.
Turn weak categories into next-release tests.
Preview and save a single high-value test question.

Ready for review

Launch recommendation, product summary, and next test plan in one workspace.

Designed for product teams

DEC

Make launch calls

Use Decide to compare options, risks, and decision frames before customers are affected.

ANA

Share product summaries

Use Analyst to explain model runs, file handling, and changes in language your team can act on.

INN

Improve the test plan

Use Innovate to turn goals and gaps into sharper tests for the next release cycle.

1. Frame the product question

Start with a manual battery, generated items, files, or a spreadsheet upload.

2. Run the product test

Send each case through selected models and keep every result tied to the right product question.

3. Act on the recommendation

Review quality, variance, latency, cost, and decision stability in one place.

Decide

Launch decision support

Measures trust score, variance, and recommendation flips before a business call depends on a model.

Sample use cases:
Approve a vendor or policy with stability evidence.
Find which decision frames change the answer.
Export a decision evidence workbook.
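To make "recommendation flips" concrete, here is a minimal Python sketch. It assumes a flip means that a decision frame produces a recommendation different from the most common one across frames; Aplum Decide's actual definition, the frame names, and the helper `recommendation_flips` are assumptions for illustration only.

```python
from collections import Counter


def recommendation_flips(baseline, by_frame):
    """Return the frames whose recommendation differs from the baseline."""
    return {frame: rec for frame, rec in by_frame.items() if rec != baseline}


# Hypothetical recommendations for one decision question under three frames.
answers = {"finance": "approve", "legal": "reject", "support": "approve"}

# Assumption: the baseline is the most common recommendation across frames.
baseline = Counter(answers.values()).most_common(1)[0][0]

flips = recommendation_flips(baseline, answers)
flip_rate = len(flips) / len(answers)

print(baseline, flips)
```

In this sketch a lower flip rate would suggest a more stable recommendation; how the product weighs flips against trust score and variance is not stated on this page.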

Analyst

Stakeholder-ready reports

Turns queued results, Decide runs, and full-file cases into charts, findings, and downloads.

Sample use cases:
Report why a model failed a product workflow.
Compare models by pass rate, cost, and latency.
Build a PDF, deck, spreadsheet, or markdown summary.

Innovate

Test pipeline growth

Reviews batteries and Decide history, then drafts batteries or questions you can accept into the library.

Sample use cases:
Generate a battery for a new product workflow.
Turn known gaps into targeted regression tests.
Draft and save one question into an existing battery.

Select Battery

Pick the prompt battery or file case Aplum AI should grade against.

Select Models

0/10

Use a focused set for quick checks or a wider panel for release decisions.

Run is ready

Aplum AI will run every selected case against every selected model.
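Running every selected case against every selected model amounts to a cross product of the two selections. The sketch below shows how such a run plan could be enumerated in Python; the case and model identifiers and the `plan_runs` helper are hypothetical and do not describe Aplum AI's internals.

```python
from itertools import product


def plan_runs(cases, models):
    """Pair every case with every model, one planned run per pair."""
    return [{"case": case, "model": model} for case, model in product(cases, models)]


# Hypothetical selections: three battery cases and two models.
cases = ["case-1", "case-2", "case-3"]
models = ["model-a", "model-b"]

runs = plan_runs(cases, models)
print(len(runs))  # number of cases times number of models
```

The run count grows multiplicatively, which is why a focused model set suits quick checks and a wider panel suits release decisions.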

Decision Question

Question

Model Under Test

Evaluator Model

Add Frame

Name

Description

Instructions

Recent Runs

Aplum Analyst Search

Search results, then drag, swipe, select, or add them to the Analyst queue.

Results

Grade: Model: Search:

Drop a result here for Analyst context • 0 queued

Model	Prompt	Rule	Response	Grade	Explanation	Cost	Latency	Analyst

Chat History

New Chat - Select Models

0 selected

Models with reasoning controls can be tuned per model.

Chat Name

Chat Session

$0.0000

Create Battery

Name

Description

Prompt	Rule	Input Files	Expected Files	Category	ID

Upload Battery

Upload Excel (.xlsx) or CSV with columns:

Required: prompt, rule
Optional: id, category, tags, difficulty, input_files, expected_files
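Before uploading, it can help to check a CSV header against the column list above. The following Python sketch does that with the standard library. It assumes exact, lower-case column names; how Aplum AI treats unrecognized columns is not stated here, so the sketch only reports them. The `check_battery_csv` helper and the sample row are illustrative.

```python
import csv
import io

REQUIRED = {"prompt", "rule"}
OPTIONAL = {"id", "category", "tags", "difficulty", "input_files", "expected_files"}


def check_battery_csv(text):
    """Return (missing required columns, unrecognized columns, parsed rows)."""
    reader = csv.DictReader(io.StringIO(text))
    header = set(reader.fieldnames or [])
    missing = sorted(REQUIRED - header)
    unknown = sorted(header - REQUIRED - OPTIONAL)
    return missing, unknown, list(reader)


# Hypothetical one-row battery with both required columns and one optional column.
sample = (
    "prompt,rule,category\n"
    "Summarize the refund policy.,Mentions the 30-day window.,support\n"
)

missing, unknown, rows = check_battery_csv(sample)
print(missing, unknown, len(rows))
```

An empty `missing` list means both required columns are present; anything in `unknown` is worth reviewing before upload.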

CSV

Drop file here or click to browse

Batteries

API Keys

These provider keys power runs, chat, Decide, Search, Analyst, and Innovate for your account only. Customer accounts use their own keys; Aplum AI does not charge provider usage to the app owner's keys.

Anthropic Not Set

OpenAI Not Set

Google Not Set

Models

Aplum AI checks provider catalogs and hides models that are no longer listed by the provider.
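The catalog check can be pictured as a simple filter: keep only the saved models that still appear in the provider's current listing. This Python sketch assumes the catalog is available as a set of model IDs; the model names and the `visible_models` helper are hypothetical.

```python
def visible_models(saved_models, provider_catalog):
    """Keep saved models that the provider still lists, preserving their order."""
    listed = set(provider_catalog)
    return [model for model in saved_models if model in listed]


# Hypothetical data: one saved model has been dropped from the provider catalog.
saved = ["model-a", "model-b", "model-c"]
catalog = {"model-a", "model-c", "model-d"}

print(visible_models(saved, catalog))
```

Models that are newly listed but not yet saved, such as `model-d` here, are outside the scope of this sketch.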

Provider catalog

No refresh run yet.

Auto-check hours

Grading

Grading Model

Instructions

Performance

Max Concurrent

Timeout (s)

Profile

Subscription

Research Use

Research Use is optional at the product level. It lets authorized Aplum AI personnel review prompts, files, responses, failed cases, scores, and metadata to develop measurements such as Interpretive Degrees of Freedom. Only opted-in accounts are included in research exports.

Allow Aplum AI to use this account's runs for internal measurement research.

Users

Research Studies

Only accounts with Research Use enabled are included in participant counts and exports.

Filters

Time Period

Provider

Model

Summary

$0.00

Total Cost

0

Total Tests

0

Total Tokens

$0.00

Avg Cost/Test
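The summary tiles can be read as simple aggregates over the filtered runs. The Python sketch below assumes that Avg Cost/Test is total cost divided by total tests and that each run record carries `cost` and `tokens` fields; both the formula and the field names are assumptions, not the product's documented schema.

```python
def summarize(runs):
    """Aggregate cost and token totals, guarding against an empty run list."""
    total_cost = sum(run["cost"] for run in runs)
    total_tokens = sum(run["tokens"] for run in runs)
    total_tests = len(runs)
    avg_cost = total_cost / total_tests if total_tests else 0.0
    return {
        "total_cost": total_cost,
        "total_tests": total_tests,
        "total_tokens": total_tokens,
        "avg_cost_per_test": avg_cost,
    }


# Hypothetical run records in US dollars and tokens.
example = [{"cost": 0.25, "tokens": 100}, {"cost": 0.75, "tokens": 300}]
summary = summarize(example)
print(f"${summary['avg_cost_per_test']:.2f}")
```

With no runs, every value falls back to zero, matching the empty state shown above.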

Model Signals

Plain-language model profiles from runs and Decide.

Prompt Dimensions

Uses prompt categories and tags as today's dimensions.

Decision Stability

Shows which Decide frames move answers the most.

Next Best Tests

Cost Over Time

Cost by Provider

Cost by Model

Recent Runs

Search Evidence

Analyst Queue

Drag, swipe, or add search results here as Analyst context.

New Analysis

Analyst Model

Analyst uses the selected model's provider API key.

Instructions

Report Builder

Click any chart, metric, bullet, or paragraph in the Analyst output to add it here.

Test the AI moments your product depends on.

Verify your email

Select Battery

Select Models

Running...

Aplum Decide

Decision Question

Evaluator Instructions

Decide Report

Add Frame

Recent Runs

Frame Admin

Results

Aplum Analyst Search

Results

Multi-Model Chat

Chat History

New Chat - Select Models

Chat Session

Test Item Draft

Batteries

Create Battery

Upload Battery

Batteries

Innovate Recommendations

Innovate Chat

Preview

Recent Batteries

Settings

API Keys

Models

Grading

Aplum Analyst

Aplum Innovate

Agent Knowledge

Performance

Account

Profile

Subscription

Research Use

Admin

Users

Research Studies

Analytics

Filters

Summary

Model Signals

Prompt Dimensions

Decision Stability

Next Best Tests

Cost Over Time

Cost by Provider

Cost by Model

Recent Runs

Aplum Analyst

Search Evidence

Analyst Report

Analyst Queue

New Analysis

Guided Build

Report Builder

Saved Reports

Database

Info

Ask a Question

Export

Raw SQL Query

Schema Reference