
Ops Capacity Toolkit

Decide now: Cut, sequence, or re-staff work to keep reliability guardrails intact.

When to use: Use this when team load is growing faster than the team's capacity to absorb it without putting reliability at risk.

Operating outcome: Capacity plan that protects reliability and prevents silent overload.

Typical runtime: 60 minutes monthly, plus a weekly stress-signal review.

Artifact you leave with: Capacity plan with guardrail checklist and escalation dashboard.

Bounded operating rules

Posture split: In Safety Mode, protect resilience capacity; in Risk-On, redeploy surplus into growth loops.

Who should run it: Engineering manager, support/on-call owner, product counterpart, and operations/finance partner.

Prep checklist

Run sequence

  1. Measure load

    Objective: Quantify demand on people/systems versus sustainable throughput.

    Prompts

    • Which teams are consistently above healthy on-call or delivery load?
    • Which commitments assume best-case capacity?

    Deliverable: Capacity stress test by team with red/yellow/green status.
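
    One way to produce the red/yellow/green status is to compare each team's demand with its sustainable throughput. The sketch below is illustrative only: the `TeamLoad` fields, the hour-based units, and the 0.85 and 1.00 thresholds are assumptions, not values defined by this toolkit, and should be replaced with whatever your teams agree on.

```python
from dataclasses import dataclass

# Assumed thresholds on the ratio demand / sustainable throughput.
YELLOW_AT = 0.85
RED_AT = 1.00


@dataclass
class TeamLoad:
    team: str
    demand_hours: float       # committed delivery work plus on-call load per cycle
    sustainable_hours: float  # throughput the team can hold without overload


def status(load: TeamLoad) -> str:
    """Return 'red', 'yellow', or 'green' from the demand-to-capacity ratio."""
    if load.sustainable_hours <= 0:
        # No sustainable capacity recorded: treat as overloaded.
        return "red"
    ratio = load.demand_hours / load.sustainable_hours
    if ratio >= RED_AT:
        return "red"
    if ratio >= YELLOW_AT:
        return "yellow"
    return "green"


def stress_test(loads):
    """Map each team name to its status."""
    return {load.team: status(load) for load in loads}
```

    Running `stress_test` on the monthly numbers gives a per-team table that can be pasted into the capacity plan. Commitments that assume best-case capacity show up as teams sitting at or above the red threshold.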

  2. Protect guardrails

    Objective: Lock service and reliability floors before adding new scope.

    Prompts

    • Which SLOs cannot be traded away this quarter?
    • What planned work should pause when guardrails are breached?

    Deliverable: Service-level guardrail checklist tied to roadmap rules.
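
    The link between guardrails and roadmap rules can be written down as a small check. This is a sketch under stated assumptions: the `Guardrail` and `RoadmapItem` shapes and the `pausable` flag are hypothetical, and it assumes every guardrail is a floor (higher is better), which will not hold for metrics such as latency.

```python
from dataclasses import dataclass


@dataclass
class Guardrail:
    name: str
    floor: float    # minimum acceptable value, e.g. 99.9 for availability in percent
    current: float  # latest measured value


@dataclass
class RoadmapItem:
    title: str
    pausable: bool  # hypothetical flag: work agreed to pause on a breach


def breached(guardrails):
    """Return the names of guardrails currently below their floor."""
    return [g.name for g in guardrails if g.current < g.floor]


def roadmap_decision(guardrails, items):
    """Return (breached guardrail names, titles of work to pause)."""
    broken = breached(guardrails)
    paused = [i.title for i in items if i.pausable] if broken else []
    return broken, paused
```

    Deciding in advance which items are pausable keeps the conversation about trade-offs out of the moment when a guardrail is already breached.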

  3. Escalate early

    Objective: Create visibility loops before capacity issues become incidents.

    Prompts

    • What weekly indicator predicts stress two sprints ahead?
    • Who receives escalation when overload persists for two cycles?

    Deliverable: Escalation dashboard used in leadership review.
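
    The "overload persists for two cycles" rule can be checked mechanically against the weekly status history. The sketch below assumes that overload means a "red" status and that each team's history is a list ordered oldest to newest; both are illustrative choices rather than definitions from this toolkit.

```python
def needs_escalation(history, cycles=2):
    """True when the most recent `cycles` statuses are all 'red'."""
    if len(history) < cycles:
        return False
    return all(s == "red" for s in history[-cycles:])


def escalations(status_by_team, cycles=2):
    """Return the sorted names of teams whose overload has persisted."""
    return sorted(
        team
        for team, history in status_by_team.items()
        if needs_escalation(history, cycles)
    )
```

    The resulting list is the natural input for the leadership review: a team appears only after sustained overload, so a single bad week does not trigger escalation.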

Success signals

Included instruments

Common mistakes to avoid

Canon references

Best starting decision paths