🚨 Cloudflare Incident Report:
A Detailed Look at the November 18, 2025 Outage
Cloudflare experienced a significant network failure on November 18, 2025, beginning around 11:20 UTC, which prevented core network traffic delivery and displayed errors to users accessing Cloudflare-protected sites. The outage was not the result of a cyberattack but rather a complex, internal configuration error.
Affected Servers:
- IP TV SERVERS: Affected
- MAILING SBIR SERVER: Not Affected
- ERP API SERVER: Affected
Companies Affected:
- Gemini
- Perplexity
- Canva
- Dropbox
- X (Twitter)
- ChatGPT (OpenAI)
The Root Cause: A Database Permission Change
The incident was traced back to a change in permissions on one of Cloudflare’s ClickHouse database clusters at 11:05 UTC. This seemingly minor security improvement inadvertently altered the behavior of a critical query used by the Bot Management system.
- Unexpected Duplicates: The revised query, which no longer filtered for the ‘default’ database, began returning duplicate entries for columns, effectively including metadata from an underlying schema (r0).
- Feature File Bloat: This caused the Bot Management system’s essential “feature configuration file” to more than double in size.
- System Panic: This oversized file was propagated across the entire Cloudflare network. The core traffic routing software (our proxy, known as FL2) had a hard-coded size limit (200 features) for this file due to memory preallocation for performance. When the file exceeded this limit, the system hit an unhandled error, resulting in a system panic and the delivery of HTTP 5xx errors.
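The chain of failure described above can be sketched in miniature. This is an illustrative simulation only, not Cloudflare's actual code: the 200-feature limit and the missing database filter come from the incident description, while every function and variable name here is hypothetical.

```python
# Hypothetical sketch of the failure chain: a metadata query that loses its
# database filter returns each column twice (once per schema), doubling the
# feature file, which then trips a hard-coded size limit in the loader.

FEATURE_LIMIT = 200  # hard cap from memory preallocation in the proxy


def query_feature_columns(rows, filter_default_db=True):
    """Mimic the metadata query. Without the 'default' database filter,
    each column appears once per schema, producing duplicate entries."""
    if filter_default_db:
        rows = [r for r in rows if r["database"] == "default"]
    return [r["column"] for r in rows]


def load_feature_file(features):
    """Mimic the proxy's loader: space is preallocated for FEATURE_LIMIT
    entries, and overflow is treated as an unrecoverable error (a panic)."""
    if len(features) > FEATURE_LIMIT:
        raise RuntimeError(
            f"feature file too large: {len(features)} > {FEATURE_LIMIT}"
        )
    return features


# 150 bot-management features, visible in both the 'default' and the
# underlying 'r0' schema after the permission change.
rows = [
    {"database": db, "column": f"feat_{i}"}
    for i in range(150)
    for db in ("default", "r0")
]

ok = load_feature_file(query_feature_columns(rows))  # 150 features: loads fine
try:
    load_feature_file(query_feature_columns(rows, filter_default_db=False))
except RuntimeError as e:
    print("panic:", e)  # 300 features: exceeds the limit, proxy would panic
```

The key point the sketch captures is that the data itself never changed; only the query's visibility did, which is why the oversized file looked valid everywhere except at the hard-coded limit.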
The Unstable Behavior and Resolution
Initially, the failure was erratic, with the system failing and then recovering every five minutes. This was because the bad configuration file was being regenerated on a rotating basis, depending on which part of the gradually updated database cluster the query hit. This fluctuation led the incident response team to initially suspect a hyper-scale DDoS attack.
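The flapping behavior can be modeled in a few lines. This is a hedged toy model, assuming a four-node cluster where only some nodes carry the new permissions; node counts and the five-minute cycle are illustrative, not taken from Cloudflare's internals.

```python
# Toy model of the erratic phase: every regeneration cycle (five minutes in
# the real incident), the query lands on a random node of the partially
# upgraded cluster. Updated nodes return duplicate rows -> a bad file;
# not-yet-updated nodes -> a good file. Hence the system flapped.
import random

random.seed(0)

cluster = [0, 1, 2, 3]
updated_nodes = {0, 2}  # nodes already running the new permissions


def regenerate_feature_file():
    node = random.choice(cluster)
    return "bad" if node in updated_nodes else "good"


timeline = [regenerate_feature_file() for _ in range(6)]  # six 5-minute cycles
print(timeline)  # a mix of good and bad files, so impact comes and goes
```

This intermittent good/bad alternation is exactly the signature that made the outage initially resemble an attack rather than a configuration fault.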
Resolution Timeline Highlights:
- 11:20 UTC: Impact begins.
- 13:05 UTC: Mitigation attempts: Workers KV and Cloudflare Access were bypassed to fall back to an older proxy version, reducing impact on those services.
- 14:24 UTC: The team identified the bad configuration file as the source of the errors and successfully stopped its generation and propagation.
- 14:30 UTC: Main impact resolved. A known, good version of the configuration file was manually inserted and deployed globally.
- 17:06 UTC: All systems were fully restored, and 5xx error volumes returned to the normal baseline.
Affected Services
While the core CDN and security services experienced widespread HTTP 5xx errors, other services were also affected:
| Service / Product | Impact Summary |
| --- | --- |
| Core CDN & Security | Widespread HTTP 5xx status codes. |
| Workers KV | Elevated HTTP 5xx errors until bypass implemented. |
| Cloudflare Access | Widespread authentication failures until bypass implemented. |
| Dashboard/Turnstile | Inability for many users to log in due to Turnstile and Workers KV dependencies. |
| Email Security | Temporary reduction in spam-detection accuracy; no critical customer impact. |
Looking Ahead
Cloudflare acknowledges the severity of this outage—the worst since 2019 for core traffic—and deeply apologizes for the disruption caused to its customers and the wider Internet.
The immediate follow-up steps include:
- Hardening Ingestion: Treating internal configuration files with the same rigorous input validation used for user-generated input.
- Kill Switches: Implementing more global kill switches for features.
- Failure Review: Reviewing failure modes and error conditions across all core proxy modules to prevent unhandled panics.
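The first and third follow-up items can be sketched together: validate a generated configuration file as if it were untrusted user input, and degrade gracefully to the last known-good version instead of panicking. This is a minimal sketch under those stated goals; all function names and limits here are illustrative, not Cloudflare's implementation.

```python
# Hypothetical sketch of "hardened ingestion": reject an oversized or
# duplicated feature file at the boundary, and fall back to the last
# known-good configuration rather than crashing the proxy.

FEATURE_LIMIT = 200  # illustrative cap, mirroring the incident's limit


def validate(features):
    """Treat the generated file like untrusted input."""
    if len(features) > FEATURE_LIMIT:
        raise ValueError(f"too many features: {len(features)}")
    if len(set(features)) != len(features):
        raise ValueError("duplicate feature names")
    return features


def load_with_fallback(candidate, last_known_good):
    """Prefer the new file, but keep serving traffic on the old one if
    the new file fails validation -- no unhandled panic."""
    try:
        return validate(candidate), "candidate"
    except ValueError:
        return last_known_good, "fallback"


good = [f"feat_{i}" for i in range(150)]
bad = good * 2  # duplicates double the file, as in the incident

config, source = load_with_fallback(bad, good)
print(source)  # falls back: traffic keeps flowing on the old config
```

The design choice worth noting is the second return value: surfacing *which* file was loaded gives operators a signal to alert on, so a silent fallback does not mask a broken generation pipeline.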
Cloudflare is committed to building new, more resilient systems to ensure that an outage of this nature does not happen again.
