Ops Capacity Toolkit

Decide now: Cut, sequence, or re-staff work to keep reliability guardrails intact.

When to use: Team load is running ahead of the capacity reserved for resilience.

Operating outcome: Capacity plan that protects reliability and prevents silent overload.

Typical runtime: 60 minutes monthly plus weekly stress signal review.

Artifact you leave with: Capacity plan with guardrail checklist and escalation dashboard.

Bounded operating rules

• Do now in tighter conditions: In Safety Mode, hold net-new hiring unless the role unblocks committed delivery, and shift 10-20% of sprint capacity to reliability and churn defense.
• Do now in easier conditions: In Risk-On, loosen buffers only where stress indicators stay green for two cycles and reliability coverage remains protected.
• Proceed threshold: Proceed with added scope only when capacity stress stays within bands and delivery risk owners are named weekly.
• Pause if: Stop roadmap expansion when overload persists for two operating cycles or on-call/SLO guardrails degrade.
• Re-open when: Revert to protective mode when SLO or on-call load breaches guardrail limits; resume planned hiring after two improving weekly briefs.
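
The proceed and pause rules above can be expressed as a small check run at each weekly review. This is a minimal sketch, not a prescribed implementation: the field names, the `scope_decision` function, and the "hold" outcome (used when neither the proceed threshold nor a pause condition is met) are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class WeeklySignals:
    """One weekly stress-signal review. All field names are hypothetical."""
    stress_within_bands: bool  # capacity stress inside the agreed bands
    guardrails_ok: bool        # SLO and on-call load inside guardrail limits
    risk_owners_named: bool    # delivery risk owners named this week
    overloaded: bool           # team flagged as overloaded this cycle


def scope_decision(history: list[WeeklySignals]) -> str:
    """Return 'pause', 'proceed', or 'hold' for the most recent cycle.

    'hold' is an assumed default: no new scope, but no forced rollback.
    """
    if not history:
        return "hold"
    latest = history[-1]
    last_two = history[-2:]
    # Pause if: overload persists for two cycles or guardrails degrade.
    persistent_overload = len(last_two) == 2 and all(w.overloaded for w in last_two)
    if not latest.guardrails_ok or persistent_overload:
        return "pause"
    # Proceed threshold: stress within bands and risk owners named weekly.
    if latest.stress_within_bands and latest.risk_owners_named:
        return "proceed"
    return "hold"
```

Checking the pause conditions first means a guardrail breach always overrides a green stress reading, which matches the intent of keeping reliability guardrails intact.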

Posture split: In Safety Mode, protect resilience capacity; in Risk-On, redeploy surplus into growth loops.

Who should run it: Engineering manager, support/on-call owner, product counterpart, and operations/finance partner.

• Compile on-call load, incident trend, and delivery throughput for the last 4-6 weeks.
• List planned work that assumes additional capacity not yet staffed.
• Define non-negotiable reliability guardrails for the next planning window.
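
The first two prep steps above can be roughed out in a few lines before the session. This is a sketch under stated assumptions: the helper names `trend` and `unstaffed_gap`, the choice of comparing window halves, and the use of person-weeks as the unit are all illustrative, not part of the toolkit.

```python
from statistics import mean


def trend(weekly_values: list[float]) -> float:
    """Mean of the later half of the window minus mean of the earlier half.

    Positive means the signal (e.g. pages per week) is rising. With an odd
    number of weeks the middle week is ignored.
    """
    half = len(weekly_values) // 2
    if half == 0:
        return 0.0
    return mean(weekly_values[-half:]) - mean(weekly_values[:half])


def unstaffed_gap(planned_work: dict[str, float], staffed_capacity: float) -> float:
    """Planned effort that exceeds currently staffed capacity (same unit for both)."""
    return max(0.0, sum(planned_work.values()) - staffed_capacity)
```

For example, `trend` can be applied to the last 4-6 weeks of on-call pages, incidents, or throughput, and `unstaffed_gap` to the planned work items expressed in person-weeks; any positive gap is work that assumes capacity not yet staffed.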
