Cross-Platform Troubleshooting Runbook

Technical incident response patterns for self-hosted WordPress, Drupal, Odoo, Rails, and Next.js systems

Troubleshooting22 min readLast updated: June 2026

Jump by Symptom

Confirm blast radius: all routes, subset of routes, or single service dependency.
Check infrastructure first: CPU saturation, memory pressure, connection pool exhaustion, queue backlog.
Freeze new deploys and config mutations.
Evaluate rollback condition within 5 minutes based on user-impact KPI.
If rollback is cleaner than patch, roll back and continue diagnosis in staging.

Segment by error class: validation, runtime exception, dependency mismatch, migration drift.
Inspect release artifact diff: packages/modules/plugins/theme/custom code.
Validate data/schema compatibility for write paths.
Reproduce in staging with production-like configuration and seed data.
Choose fast rollback or hotfix path based on data-integrity risk.

Contain first: disable compromised accounts, rotate credentials, isolate suspicious nodes.
Preserve evidence: immutable copies of logs, process snapshots, and request traces.
Identify entry vector: vulnerable dependency, credential reuse, exposed endpoint, or misconfiguration.
Patch and validate in staging before production promotion.
Run post-incident hardening: WAF rules, least privilege, automated advisories, and recovery drills.

Classify impact: availability, data integrity, security, or performance.
Freeze risky changes: stop deployments and configuration churn.
Capture evidence: current logs, metrics snapshots, request traces, and failing URLs.
Define rollback threshold: if business KPI drops below threshold, roll back immediately.
Assign one commander and one comms owner to avoid coordination deadlocks.

Symptom A: Global timeout or 5xx spike. Start with infrastructure saturation (CPU, memory, DB connections, queue backlog).

Symptom B: Admin works, frontend fails. Check cache invalidation, route-level middleware, and CDN edge rules.

Symptom C: Some pages fail after update. Check plugin/module compatibility and template overrides.

Symptom D: Latency drift without errors. Investigate slow query percentiles, N+1 patterns, and lock contention.

Plugin Conflict Isolation: Disable in groups by domain (SEO, caching, security, forms), then binary-search to offending plugin.

Fatal Error Tracing: Enable debug logging and inspect stack traces for hooks executed before failure.

Cache Layer Split Testing: Toggle object cache, page cache, and CDN cache independently to find stale-layer defects.

Database Recovery: Run table checks, verify collation drift, and test point-in-time restoration in staging before production apply.

Module Dependency Breaks: Map dependency graph before uninstall/upgrade and validate config schema changes.

Container Rebuild Errors: Inspect service definitions and stale compiled container cache artifacts.

Config Sync Drift: Compare active config vs exported config and isolate environment-specific overrides.

Render Cache Poisoning: Validate cache contexts/tags/max-age to prevent cross-user cache leaks.

Slow Transactions: Profile ORM-heavy flows and identify computed fields with expensive dependencies.

Queue Backpressure: Segment long-running workers from latency-sensitive jobs and enforce queue priorities.

Customization Regression: Validate inherited views and custom modules against updated base models.

Reporting Delays: Move heavy reports to asynchronous jobs and pre-materialize frequent aggregates.

Rails: Use query logs plus application traces to isolate N+1 and callback amplification; add bulletproof database indexes before scaling workers.

Next.js: Separate server rendering errors from client hydration mismatches; inspect edge/runtime logs and stale tag revalidation behavior.

React: Profile render waterfalls and expensive effects; verify concurrency-safe state transitions in high-interaction pages.

If you are in a live production issue, we can help with triage, rollback, and root-cause correction.