Wizard · Free
Pick the right LLM tier for your workload, frontier API, mid-tier, or self-hosted, based on data sensitivity, task complexity, volume, latency, fine-tuning needs, and operational appetite. Output: a recommendation with model candidates and explicit tradeoffs.
How it works
We don’t gate the tool behind a form. Take the assessment; share your email at the end if you want a written report.
Data sensitivity, task complexity, workload volume, latency requirement, fine-tuning needs, and operational appetite. Each option weights the three tiers.
Three-tier breakdown: frontier API / mid-tier / self-hosted. The headline recommendation is the highest-scoring tier; the breakdown shows how close the alternatives are.
Each tier has specific model candidates we'd evaluate in real engagements (Claude / GPT / Gemini frontier, Haiku / mini / Flash mid-tier, Llama / Mistral self-hosted). Tradeoffs flag what the tier doesn't solve.
Real model selection lands after running candidates against your actual workload on a curated eval dataset. The wizard gives you the directional starting point; we'll do the calibrated evaluation.
Common questions
Models change every quarter. Tier-level recommendations survive model upgrades; specific model recommendations don't. Within each tier we name candidates we'd evaluate, but the choice between Claude Haiku and GPT mini lands after running them against your real workload, not from a wizard.
Common, and usually the right answer is model tiering. Use mid-tier for routine work, frontier for the hard cases, with explicit routing logic. Most production AI deployments we ship use 2 or 3 tiers in tandem; the wizard's strongest signal is the tier you should anchor on.
No. Self-hosted economics start beating frontier-API pricing around 100M tokens/month and beating mid-tier API pricing around 1B tokens/month: assuming you have an existing GPU platform team. If you don't, the operational cost (engineering, monitoring, on-call) typically erases the per-token savings until volume gets very high.
Most enterprise AI doesn't benefit from fine-tuning. Frontier API + strong RAG covers ~85% of enterprise workloads. Fine-tuning is the right answer for narrow tasks with consistent input/output shape, latency-sensitive deployments where small models with fine-tuning beat frontier latency, or sovereignty cases where self-hosted is mandatory and fine-tuning compensates for smaller base capability.
BAA / DPA chain matters. AWS Bedrock under HIPAA-eligible accounts and Azure OpenAI under BAA are production options for ePHI workloads. For data sovereignty mandates (regulated jurisdictions, classified workloads, air-gapped environments), self-hosted is typically the only path. The wizard accounts for these constraints; bring your specific requirements to a discovery call for verification.
Curated eval datasets (built with your domain experts), faithfulness and refusal scoring, latency and cost-per-task tracking, and equity-aware subgroup evaluation where applicable. We run candidates head-to-head on your real workload before recommending a final choice, not against generic benchmarks. Most engagements end with a model decision that surprised at least one stakeholder.
More tools
15-question scored report across data, infrastructure, talent, governance, and use cases.
Open the toolProject the financial return of a proposed AI use case: labor savings, error reduction, payback period.
Open the toolScore a legacy system for replatform vs rearchitect vs replace. With phased rollout recommendations.
Open the toolTalk to us
Bring your scored report to a 30-minute call. Senior engineer plus department lead. No discovery gauntlet, no junior reps.