Ops Capacity Toolkit
Decide now: Cut, sequence, or re-staff work to keep reliability guardrails intact.
When to use: When team load is outpacing the capacity needed to keep systems reliable.
Operating outcome: Capacity plan that protects reliability and prevents silent overload.
Typical runtime: 60 minutes monthly, plus a weekly stress-signal review.
Artifact you leave with: Capacity plan with guardrail checklist and escalation dashboard.
Bounded operating rules
- Do now in tighter conditions: In Safety Mode, hold net-new hiring unless the role unblocks committed delivery, and shift 10-20% of sprint capacity to reliability and churn defense.
- Do now in easier conditions: In Risk-On, loosen buffers only where stress indicators stay green for two cycles and reliability coverage remains protected.
- Proceed threshold: Add scope only when capacity stress stays within agreed bands and delivery risk owners are named weekly.
- Pause if: Stop roadmap expansion when overload persists for two operating cycles or on-call/SLO guardrails degrade.
- Re-open when: Resume planned hiring after two improving weekly briefs; revert to protective mode whenever SLO or on-call load breaches guardrail limits.
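The operating rules above reduce to a few checks that can be run against each weekly brief. A minimal Python sketch, assuming one weekly brief equals one operating cycle and that each brief records three yes/no signals; `WeeklyBrief`, `Decision`, `decide`, and `reliability_allocation` are hypothetical names, and the role-level hiring exception is left to human judgment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WeeklyBrief:
    """One weekly review (hypothetical fields; one brief = one operating cycle)."""
    overloaded: bool          # capacity stress outside the agreed bands
    guardrail_breached: bool  # an SLO or on-call limit was breached
    improving: bool           # stress trend improved versus the prior brief


@dataclass
class Decision:
    posture: str              # "safety" or "risk-on"
    pause_expansion: bool     # stop adding roadmap scope
    resume_hiring: bool       # planned hiring may restart


def decide(briefs: List[WeeklyBrief]) -> Decision:
    """Apply the bounded operating rules to recent briefs, oldest first."""
    if not briefs:
        raise ValueError("need at least one weekly brief")
    latest = briefs[-1]
    last_two = briefs[-2:]
    two_cycles = len(last_two) == 2
    # Pause if: overload persists for two cycles, or a guardrail degrades.
    pause = (two_cycles and all(b.overloaded for b in last_two)) or latest.guardrail_breached
    # Re-open when: two improving briefs, with no breach in the latest one.
    resume = two_cycles and all(b.improving for b in last_two) and not latest.guardrail_breached
    # Risk-On only after two clean cycles; anything else stays in Safety Mode.
    clean = two_cycles and not any(b.overloaded or b.guardrail_breached for b in last_two)
    return Decision("risk-on" if clean else "safety", pause, resume)


def reliability_allocation(sprint_capacity: float, share: float = 0.15) -> float:
    """Safety Mode: capacity to shift to reliability, kept within the 10-20% band."""
    if not 0.10 <= share <= 0.20:
        raise ValueError("share must be between 0.10 and 0.20")
    return sprint_capacity * share
```

Keeping each rule as a plain boolean makes the decision easy to audit in the weekly brief; what counts as "overloaded" or "improving" stays a team decision.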
Posture split: In Safety Mode, protect resilience capacity; in Risk-On, redeploy surplus into growth loops.
Who should run it: Engineering manager, support/on-call owner, product counterpart, and operations/finance partner.
Prep checklist
- Compile on-call load, incident trend, and delivery throughput for the last 4-6 weeks.
- List planned work that assumes additional capacity not yet staffed.
- Define non-negotiable reliability guardrails for the next planning window.
Run sequence
Measure load
Objective: Quantify demand on people/systems versus sustainable throughput.
Prompts
- Which teams are consistently above healthy on-call or delivery load?
- Which commitments assume best-case capacity?
Deliverable: Capacity stress test by team with red/yellow/green status.
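One way to produce the red/yellow/green stress test is to compare average weekly load with what each team can sustain, for both delivery and on-call, and report the worse of the two. A sketch under stated assumptions: the 85% and 100% band edges, the dictionary field names, and the units (for example story points and on-call hours per week) are hypothetical placeholders for your own definitions.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical bands on load / sustainable capacity; tune to your own definition.
GREEN_MAX = 0.85   # at or below 85%: slack remains
YELLOW_MAX = 1.00  # up to 100%: no slack left; above this is red
ORDER = ["green", "yellow", "red"]


def rag(ratio: float) -> str:
    """Map a load ratio to red/yellow/green."""
    if ratio <= GREEN_MAX:
        return "green"
    if ratio <= YELLOW_MAX:
        return "yellow"
    return "red"


def load_ratio(weekly: List[float], sustainable: float) -> float:
    """Mean weekly load over the window divided by what the team can sustain."""
    if not weekly or sustainable <= 0:
        raise ValueError("need weekly data and a positive sustainable level")
    return mean(weekly) / sustainable


def stress_test(teams: Dict[str, dict]) -> Dict[str, str]:
    """Per team, the worse of the delivery status and the on-call status."""
    result = {}
    for team, d in teams.items():
        statuses = [
            rag(load_ratio(d["delivery_demand"], d["delivery_sustainable"])),
            rag(load_ratio(d["oncall_hours"], d["oncall_sustainable"])),
        ]
        result[team] = max(statuses, key=ORDER.index)
    return result
```

Taking the worse of the two dimensions keeps a team with healthy delivery but heavy on-call load from showing green.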
Protect guardrails
Objective: Lock service and reliability floors before adding new scope.
Prompts
- Which SLOs cannot be traded away this quarter?
- What planned work should pause when guardrails are breached?
Deliverable: Service-level guardrail checklist tied to roadmap rules.
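The guardrail checklist can be kept as data so that a breach maps directly to the roadmap items it pauses. A minimal sketch, assuming each guardrail is either a floor (such as an availability SLO) or a ceiling (such as on-call pages per week); the `Guardrail` and `triage` names, the example limits, and the rule that a missing reading counts as a breach are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Guardrail:
    name: str
    kind: str    # "floor" (e.g. availability SLO) or "ceiling" (e.g. pages/week)
    limit: float
    pauses: List[str] = field(default_factory=list)  # roadmap items frozen on breach

    def __post_init__(self) -> None:
        if self.kind not in ("floor", "ceiling"):
            raise ValueError("kind must be 'floor' or 'ceiling'")


def is_breached(g: Guardrail, observed: Dict[str, float]) -> bool:
    """A missing reading counts as a breach (assumption: no data, no new scope)."""
    if g.name not in observed:
        return True
    value = observed[g.name]
    return value < g.limit if g.kind == "floor" else value > g.limit


def triage(guardrails: List[Guardrail], observed: Dict[str, float]) -> Dict[str, List[str]]:
    """List breached guardrails and the de-duplicated roadmap items they pause."""
    breached = [g for g in guardrails if is_breached(g, observed)]
    paused = sorted({item for g in breached for item in g.pauses})
    return {"breached": [g.name for g in breached], "pause": paused}
```

Running `triage` on each week's readings gives the list of work to pause, which is what lets a breach trigger roadmap triage within the same week.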
Escalate early
Objective: Create visibility loops before capacity issues become incidents.
Prompts
- What weekly indicator predicts stress two sprints ahead?
- Who receives escalation when overload persists for two cycles?
Deliverable: Escalation dashboard used in leadership review.
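For the escalation dashboard, a simple linear trend on a weekly load ratio (demand divided by sustainable throughput) can flag stress before it arrives, while persistent overload goes straight to a named owner. A sketch, assuming one cycle is one weekly review, two-week sprints (so "two sprints ahead" means four weeks), and a ratio above 1.0 means overload; `projected` and `escalation` are hypothetical helpers, and a straight-line fit is only one possible leading indicator.

```python
from typing import List, Optional


def projected(values: List[float], weeks_ahead: int) -> float:
    """Least-squares linear trend of a weekly indicator, extended forward."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two weeks of data")
    xs = range(n)
    x_bar = (n - 1) / 2
    y_bar = sum(values) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    # Evaluate the fitted line at the last observed week plus weeks_ahead.
    return y_bar + slope * ((n - 1 + weeks_ahead) - x_bar)


def escalation(team: str, weekly_ratio: List[float], owner: str,
               limit: float = 1.0, sprint_weeks: int = 2) -> Optional[str]:
    """Escalate on two overloaded cycles; otherwise warn if the trend crosses the limit."""
    if len(weekly_ratio) >= 2 and all(r > limit for r in weekly_ratio[-2:]):
        return f"ESCALATE {team} to {owner}: overload for two cycles"
    if len(weekly_ratio) >= 2:
        forecast = projected(weekly_ratio, 2 * sprint_weeks)
        if forecast > limit:
            return f"WATCH {team}: {forecast:.2f} projected two sprints out"
    return None
```

Separating "ESCALATE" (overload already persistent) from "WATCH" (trend only) keeps leadership review focused on the teams that need a decision now, while still showing forward-looking risk.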
Success signals
- At least one overload source is removed or re-sequenced each cycle.
- Guardrail breaches trigger automatic roadmap triage within the same week.
- Leadership can see forward-looking capacity risk two sprints out.
Common mistakes to avoid
- Assuming normal capacity under stress