
Incident Response Runbooks: When Minutes Matter in Deliverability & Compliance
Incident Response Runbooks: When Minutes Matter in Deliverability & Compliance
Deliverability and compliance incidents have a half-life measured in minutes. A sudden spike in Gmail spam complaints, an A2P 10DLC block, or a data leak disclosure timer can turn a normal day into a reputational and revenue crisis. Incident response runbooks turn that panic into choreography: who does what, in what order, with what guardrails and SLAs. Use this field guide to script your response before you need it.
Legal note (US): This article is general guidance for marketing and growth teams. It is not legal advice. Consult counsel for interpretations of laws and carrier/regulator requirements.
Incident types (deliverability, data leak, policy violation)
Deliverability incidents. Typical triggers: Gmail temp-fails, sudden drops in inbox placement, blocks at key ISPs/ESPs, elevated spam complaints, domain or IP reputation degradation, or SMS/MMS carrier filtering. For email, Google’s 2024 sender rules require authentication, easy unsub, and keeping user-reported spam rates below thresholds—or risk rate limiting and blocks. (See Google Sender Guidelines, Google 2024.) Google Help+1
Data leak incidents. Examples: misconfigured form or webhook exposing PII, public URL with exportable contact data, lost device containing customer lists, or a compromised API key enabling unauthorized message sends. These incidents often carry notification timelines and evidence preservation requirements under contracts and regulations.
Policy violation incidents. Examples: unregistered or misconfigured A2P 10DLC campaigns causing carrier blocks and fees; missing opt-in proof for a source segment; missing unsubscribe in email/SMS; sending to suppressed or opted-out contacts; ignoring UTM/postback deduplication rules in partner channels. US SMS requires A2P 10DLC registration and campaign vetting for application traffic, with non-compliance leading to blocking and penalties. (Twilio A2P 10DLC overview.) Twilio
Severity model & triggers.
Sev-1 (Critical): Legal or contract clock running (data exposure or breach); widespread blocks across a top provider; carrier shutdown; public mentions.
Sev-2 (Major): Significant deliverability degradation (e.g., >20% drop in opens, >0.3% spam rate in Gmail Postmaster) or partial carrier filtering.
Sev-3 (Minor): Localized failure (one domain/campaign), automated controls holding, no legal timer.
Action: Add these three categories and severity triggers into your runbook index; link each to preapproved containment steps. Then Download the IR Templates to adapt.
Roles & escalation tree
RACI snapshot. A clear RACI (Responsible, Accountable, Consulted, Informed) prevents thrash:
FunctionRole in incidentRACIMarketing Ops (MOPS)Triage signals; pause automations & campaigns✓RevOpsOwn CRM truth data; enforce suppressions✓✓Security/ITForensics; access control; evidence handling✓✓Legal/PrivacyNotifications; regulator/customer language✓✓ESP/SMS AdminProvider tickets; DNS & sending config✓Comms/PRInternal & external statements✓✓Exec SponsorRisk decisions; tradeoffs✓✓
Escalation tree (timeboxed).
Triage lead (MOPS on-call) acknowledges Sev-1 within 15 minutes; declares incident; opens war-room channel and ticket.
Accountable owner (Security or RevOps, by incident type) joins within 15 minutes, approves containment moves that carry risk (e.g., DNS changes, mass pauses).
Legal/Privacy looped within 30 minutes for any data exposure or policy breach; starts timer evaluation.
Provider escalation (ESP/carrier) initiated within 30–60 minutes with evidence packet (headers, sample message, error codes, Postmaster screenshots, 10DLC campaign IDs).
Executive brief at T+60 minutes with status, impact band, ETA to next milestone.
On-call & redundancy. Maintain a weekly on-call rotation for Triage lead and ESP/SMS admin; store contact preferences and backups in the runbook.
Action: Publish your RACI and a single-page escalation map in your workspace, with phone/SMS fallbacks for after-hours. Then Download the IR Templates to fill your team names and SLAs.
Containment steps per incident
Deliverability containment (email).
Pause blast campaigns to domains showing temp-fails/blocks; keep only transactional messages that meet policy.
Check auth & DNS: verify SPF, DKIM, and DMARC alignment on active sending domains; pivot to a warm fallback subdomain only if alignment passes and the domain is pre-warmed.
Complaint & bounce triage: pull Gmail Postmaster and provider dashboards; suppress recent engagers with complaints or bounces; quarantine segments driving spikes; stop sending to new acquisitions lacking confirmed consent.
Provider ticket: submit concise evidence and remediation plan; include headers, error codes, and before/after changes. Google’s bulk-sender requirements (2024) spell out authentication, one-click unsubscribe, and spam-rate thresholds; cite these in your plan. Google Help
Deliverability containment (SMS/A2P).
Verify 10DLC status: brand, use case, and campaign vetting; confirm opt-in language and sample messages match registration.
Throttle or pause affected campaigns; isolate short code vs. long code traffic.
Content fix: remove SHAFT content, shorten URLs or use branded domains, ensure HELP/STOP flows.
Carrier/CPaaS ticket: include campaign ID, sample payloads, opt-in screenshots, and STOP/HELP logs. See A2P 10DLC compliance guidance for required artifacts. Twilio
Data leak containment.
Isolate and preserve: revoke keys/rotate secrets; restrict offending endpoints; snapshot logs; preserve evidence chain.
Access control: force resets, disable compromised accounts, revoke OAuth/SCIM tokens.
Minimize exposure: remove public links; invalidate exports; purge third-party caches if applicable.
Notification clock: consult Legal/Privacy on contractual and statutory timelines; draft holding statements.
Policy violation containment.
Stop affected sends from violating source/campaign.
Remediate consent gaps: backfill
consent_type, timestamp, source, proof_uri
; repair unsubscribe flows.Re-qualify data: run suppression merges and deduplication; re-permission where needed.
Provider alignment: if an ESP puts you under review, respond with a corrective action plan (segmentation, content, cadence, consent verification). Align with NIST’s lifecycle emphasis on containment, eradication, and recovery to structure steps. NIST Publications
Evidence package (all incidents). A standard zip with: timeline, decision log, screenshots (Postmaster/carrier errors), headers/message samples, DNS/auth records, consent proofs, API logs, and communication copies.
Action: Attach the containment checklists to each incident type in your runbook and pre-write your provider tickets. Then Download the IR Templates to customize.
Communication templates (internal, partners, customers)
Internal (exec + cross-functional).
Subject: Sev-1 Deliverability Incident — Gmail blocks on primary domain
Context: Detection time, impacted channels/domains, revenue-at-risk band.
What we’ve done: Paused campaigns X/Y, checked DKIM/DMARC, opened Google/ESP ticket, suppressed segment Z.
What’s next: Expected milestones (e.g., “Gmail mitigation review in 24–48h”), decision queue (e.g., warm fallback domain if blocks persist).
Asks: Approvals, staffing, legal review.
Partners (agencies, ESP/CPaaS).
Subject: Request for expedited review: A2P filtering on Campaign ID ####
Packet: Campaign ID, brand ID, sample messages, opt-in URL and screenshot, STOP/HELP logs, observed error codes, timestamps, remediation steps taken.
Tone: Factual and brief. Reference relevant policy points to show you’re aligned (e.g., 10DLC registration and content policy). Twilio
Customers (if needed).
Subject: Notice of service disruption affecting email/SMS reachability
Body: What happened (plain language), what data/sending was involved, what you’re doing now, what you need from them (usually nothing), and where updates will be posted. No blame, no speculation.
Regulator/contractual notices (if required). Prepare variants with Legal to satisfy timing and substance requirements; log exactly when they were sent and by whom.
Action: Store these templates in your comms library with owner + approval route fields filled. Then Download the IR Templates to paste ready-made variants into your tools.
Post-incident review & prevention
Structured post-mortem (within 5 business days).
Facts only: timeline, signals, decisions, comms, and infrastructure changes.
Five lenses: consent provenance, data hygiene, content patterns, sending infrastructure, monitoring & alerting.
Metrics: MTTD (detect), MTTC (contain), time under block, messages at risk, affected revenue, spam-rate trajectory, partner SLAs met/missed.
Controls added: e.g., preflight QA for opt-out links, campaign throttles, domain warm-up gates, 10DLC registration checks, automated suppression merges.
Prevention guardrails you can automate.
Consent governance: required fields (
consent_type, timestamp, source, proof_uri
) on ingestion; block sends missing proof.Deliverability SLOs: stop-send rules when spam rate ≥0.3% in Gmail Postmaster or when temp-fails exceed threshold; auto-open provider tickets with evidence. (Guided by Google’s bulk-sender program.) Google Help+1
A2P conformance gates: deployment checks for active brand/campaign IDs and compliant opt-in/STOP language before activation. (See carrier/CPaaS 10DLC requirements.) Twilio
Data hygiene & dedup: nightly suppression merges; partner postback dedup priority (own CRM truth first).
Versioned automation: dev→stage→prod with change logs and rollbacks for journeys, webhooks, and content.
Dashboards: surface error codes, complaint rates, and blocklists; tie to on-call alerts.
SLA examples for speed.
Sev-1 acknowledge: 15 minutes.
Containment decision: 60 minutes to first irreversible change; 4 hours to full containment target.
Provider escalation: within 30–60 minutes, daily follow-ups until resolved.
Internal update cadence: hourly until downgraded; daily summaries thereafter.
Customer status page: first note by T+4 hours if customer-visible.
Maturity roadmap (quarterly).
Quarter 1: establish runbooks, on-call, and dashboards; implement stop-send guardrails.
Quarter 2: backfill consent proofs; register/verify all A2P campaigns; segregate transactional domains and IPs.
Quarter 3: red-team incident drills; add automated evidence pack generation; pre-approve fallback domains and warming playbook.
Quarter 4: contract SLAs with agencies/partners; add tabletop exercises with Legal/PR.
Action: Schedule your first tabletop and assign owners for each prevention control. Then Download the IR Templates to score your current readiness and close gaps.
Why this works: You’re aligning to recognized incident lifecycle best practices—detect, analyze, contain, eradicate, recover, and learn—articulated by NIST’s 2025 update and federal playbooks. Treat your runbooks as products with owners, versions, and tests, and they’ll pay back the first time a red banner hits your inbox. NIST Publications+1
Sources used
NIST (2025); CISA (2024); Google (2024)