CDN problems have a special quality: they're different for different people. The user in Sydney sees a stale stylesheet, the user in Frankfurt sees the fixed one, and you, sitting in front of your own machine, see whatever your nearest PoP happens to have cached. "Works for me" is the default state of a CDN incident.
This guide gives you a workflow that cuts through that: read the cache headers first, separate origin problems from edge problems, then deal with the per-PoP and DNS layers that make CDN debugging genuinely hard.
How a CDN failure actually presents
Almost every CDN issue lands in one of four buckets:
- Stale content - the edge keeps serving an old version after you deployed.
- Cache misses - everything is technically working, but every request goes to origin and the site is slow.
- One bad PoP - a single edge location is erroring or serving garbage while the rest of the network is fine.
- DNS pointing at the wrong place - requests never reach the CDN at all, or reach the wrong distribution.
The diagnosis order below resolves all four. Don't skip ahead to purging the cache. A purge that "fixes" the problem without you understanding it guarantees a repeat.
Step 1. Read the cache headers
Open the HTTP Header Checker and request the affected URL. This is the single highest-signal step in CDN debugging, because every major CDN annotates its responses.
Look for:
- cache-status - the standardised header (RFC 9211). Each entry names the cache, followed by parameters such as hit or, when the request was forwarded, fwd=miss or fwd=stale, e.g. cache-status: ExampleCDN; hit.
- x-cache - the older de-facto header. CloudFront sends x-cache: Hit from cloudfront or Miss from cloudfront. Fastly and many Varnish-based CDNs send HIT / MISS, sometimes with multiple comma-separated values when a request traverses shield and edge tiers.
- cf-cache-status - Cloudflare's variant: HIT, MISS, EXPIRED, BYPASS, DYNAMIC. DYNAMIC means Cloudflare decided the content isn't cacheable by default - a common gotcha for HTML.
- age - seconds the object has sat in cache. An age of 86400 on an asset you deployed an hour ago is your stale-content smoking gun.
- cache-control - what the origin asked for. max-age=31536000 on a file you expected to update in place explains everything.
- via / x-served-by - which edge node handled the request. Fastly's x-served-by includes a PoP code (e.g. cache-lhr... for London); CloudFront exposes the PoP in x-amz-cf-pop.
Run the check two or three times. A healthy cached asset shows miss once and hit after; age climbs between requests. An asset that shows miss every time isn't being cached at all - go look at why (cookies, set-cookie on the response, cache-control: private or no-store, query-string variation).
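If you'd rather script this reading than eyeball it, here is a minimal Python sketch. The function name cache_verdict and the wording of its notes are made up for illustration, and the substring matching on the status header is deliberately crude: treat it as a starting point, not a parser for any particular CDN.

```python
def cache_verdict(headers):
    """Summarise what a response's headers say about caching (rough heuristic)."""
    h = {k.lower(): v for k, v in headers.items()}
    notes = []

    # Use whichever cache-annotation header the CDN sent.
    status = h.get("cf-cache-status") or h.get("x-cache") or h.get("cache-status") or ""
    s = status.lower()
    if "hit" in s:
        notes.append("served from edge cache")
    elif "miss" in s:
        notes.append("fetched from origin")
    elif s:
        notes.append(f"cache status: {status}")
    else:
        notes.append("no cache status header")

    # age: seconds the object has sat in cache.
    age = h.get("age")
    if age is not None and age.isdigit():
        notes.append(f"in cache for {int(age)}s")

    # cache-control directives that stop a shared cache from storing the response.
    cc = h.get("cache-control", "").lower()
    directives = {d.strip().split("=")[0] for d in cc.split(",") if d.strip()}
    if directives & {"private", "no-store"}:
        notes.append("cache-control forbids shared caching")

    if "set-cookie" in h:
        notes.append("set-cookie on the response may prevent caching")
    return notes


print(cache_verdict({"X-Cache": "Hit from cloudfront", "Age": "86400",
                     "Cache-Control": "max-age=31536000"}))
```

Feed it the headers from two or three consecutive requests and compare the notes; a run that never reports a hit is the "not being cached at all" case.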
Step 2. Compare edge versus origin
If content is stale or wrong, find out whether the origin itself is wrong before blaming the cache.
Request the same path directly from the origin - via the origin hostname (origin.example.com), the origin's IP with a Host header, or a staging hostname that bypasses the CDN. Compare against the CDN response:
- Origin fresh, edge stale → cache problem. Check cache-control and age, then purge (Step 5).
- Origin also stale → the CDN is faithfully caching a broken deploy. Fix the deploy; no amount of purging helps.
- Origin errors, edge serves content → the CDN is masking an origin outage with stale-if-error behaviour. The 5xx will surface the moment those objects expire. Treat it as an origin incident.
Not sure your request actually bypassed the CDN? The response headers tell you: a different server header, no x-cache, and no via all indicate you reached the origin rather than the edge.
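The edge-versus-origin comparison is really a small decision table, sketched below in Python. It assumes you have already fetched both responses and know what the freshly deployed content should be; the function name diagnose and its expected_body parameter are hypothetical.

```python
def diagnose(edge_status, edge_body, origin_status, origin_body, expected_body):
    """Classify an incident from one edge response and one origin response."""
    # Origin failing while the edge still answers: the CDN is serving stale copies.
    if origin_status >= 500 and edge_status < 500:
        return "origin incident: edge is masking an origin outage"

    origin_fresh = origin_body == expected_body
    edge_fresh = edge_body == expected_body

    if origin_fresh and not edge_fresh:
        return "cache problem: check cache-control and age, then purge"
    if not origin_fresh:
        return "deploy problem: origin itself is stale, purging will not help"
    return "edge and origin both match the expected content"


# Edge still has the old stylesheet, origin has the new one.
print(diagnose(200, b"old css", 200, b"new css", b"new css"))
```

Comparing whole bodies is the simplest possible check; comparing an ETag or a content hash would serve the same purpose on large assets.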
Step 3. Verify the DNS path
The request has to reach the CDN before any of the above matters. Run the hostname through the DNS Lookup tool and check:
- The hostname CNAMEs to the CDN's distribution name (d1234.cloudfront.net, example.map.fastly.net, etc.), or resolves to A/AAAA records the CDN actually owns. Drop a returned IP into IP Geolocation - the ASN/organisation field will say Cloudflare, Fastly, Amazon, Akamai, or, worryingly, your old hosting provider.
- Stale DNS is a classic CDN failure mode: you migrated to a new CDN or new distribution, but a long-TTL record (or a forgotten record at a secondary DNS provider) still points at the old target. Different resolvers age out the old answer at different times, which produces exactly the "broken for some users, fine for others" symptom. Query more than one resolver and compare.
- Check the TTL. If you're mid-migration with a 24-hour TTL, some users will hit the old endpoint for up to 24 hours after the change - there's no purge for other people's resolver caches.
If the records are right today, the question becomes whether they stay right - a DNS record drift monitor alerts you when the CNAME quietly changes, which is how more than one "CDN outage" has turned out to be a DNS edit nobody owned up to. (New to DNS queries? Start with how to perform a DNS lookup.)
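Once you have answers from several resolvers, comparing them is easy to automate. The Python sketch below assumes you have collected the CNAME targets yourself (from the DNS Lookup tool or any other client); the resolver labels, the legacy hostname, and the function names are hypothetical.

```python
def normalise(targets):
    """Lower-case, strip trailing dots, and sort so answers compare cleanly."""
    return tuple(sorted(t.rstrip(".").lower() for t in targets))


def compare_resolvers(answers, expected_suffixes):
    """answers maps a resolver label to the list of CNAME targets it returned."""
    per_resolver = {}
    for resolver, targets in answers.items():
        names = normalise(targets)
        # Every returned target must sit under one of the CDN's known suffixes.
        on_cdn = bool(names) and all(
            name.endswith(tuple(expected_suffixes)) for name in names
        )
        per_resolver[resolver] = "ok" if on_cdn else "unexpected target"
    # More than one distinct answer set means resolvers disagree.
    consistent = len({normalise(t) for t in answers.values()}) <= 1
    return per_resolver, consistent


answers = {
    "resolver-a": ["d1234.cloudfront.net."],
    "resolver-b": ["legacy-host.example.net"],  # stale record still cached here
}
print(compare_resolvers(answers, [".cloudfront.net"]))
```

A disagreement between resolvers is not automatically an error (DNS-based routing can legitimately return different answers), but a target outside the CDN's namespace deserves a look.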
Step 4. Isolate a bad PoP
CDNs run on anycast or DNS-based routing: the same hostname lands users on different physical edges depending on where they are. So when one region reports errors, you need to test from that region, or at least identify which edge they're hitting.
- Get an affected user to send you their response headers (browser dev tools → Network → the failing request). The PoP identifier (x-served-by, x-amz-cf-pop, Cloudflare's cf-ray suffix) tells you which edge served them.
- Compare with your own headers for the same URL. Same PoP, different result → likely an inconsistently-purged cache. Different PoP → you may simply never reproduce it locally; trust their headers.
- Run a ping test against the CDN hostname. With anycast, you're measuring your nearest edge, not theirs - useful as a sanity check that the edge near you is healthy and reachable, not as proof the whole network is fine.
- Most CDNs cache per-PoP (sometimes per-server within a PoP). A purge that propagated to 95% of edges and silently failed on one is a real failure mode; re-issue the purge if one PoP keeps serving the old object.
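Pulling the PoP identifier out of a pasted set of headers can be scripted. The following Python sketch covers the three headers named above; the sample values are invented, and taking the last x-served-by entry as the client-facing edge is an assumption you should check against your CDN's documentation.

```python
def pop_identifier(headers):
    """Return (source, pop) for the edge that served a response, or (None, None)."""
    h = {k.lower(): v for k, v in headers.items()}

    if "x-amz-cf-pop" in h:
        return ("x-amz-cf-pop", h["x-amz-cf-pop"])

    # cf-ray looks like "<id>-<location code>"; the suffix identifies the edge.
    if "cf-ray" in h and "-" in h["cf-ray"]:
        return ("cf-ray", h["cf-ray"].rsplit("-", 1)[1])

    if "x-served-by" in h:
        # Assumption: with several comma-separated nodes, the last one faced the client.
        return ("x-served-by", h["x-served-by"].split(",")[-1].strip())

    return (None, None)


def same_pop(user_headers, my_headers):
    """True only when both responses name the same, known PoP."""
    user, mine = pop_identifier(user_headers), pop_identifier(my_headers)
    return user[1] is not None and user == mine


print(pop_identifier({"CF-RAY": "0123456789abcdef-SYD"}))
```

If same_pop returns False, you are in the "trust their headers" case: your own requests are landing on a different edge.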
Step 5. Purge - precisely
Once you know it's genuinely a stale edge cache:
- Purge the specific URLs, not everything. A full purge on a busy site stampedes your origin with misses.
- Prefer surrogate keys / cache tags if your CDN supports them - purge "everything tagged product-123" instead of guessing URL variants.
- After purging, re-run the header check. You want one miss (edge refilled from origin) followed by hits with a small age.
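The post-purge check can be written down as a tiny predicate. This Python sketch assumes you have recorded each follow-up request as a (state, age) pair; the function name purge_verified and the 60-second threshold are arbitrary choices for illustration.

```python
def purge_verified(observations, max_age=60):
    """observations: list of (cache_state, age_seconds) from requests made after a purge."""
    # The first request should refill the edge from origin.
    if not observations or observations[0][0] != "miss":
        return False
    rest = observations[1:]
    # Everything after that should be a hit on a young object.
    return bool(rest) and all(
        state == "hit" and age <= max_age for state, age in rest
    )


print(purge_verified([("miss", 0), ("hit", 2), ("hit", 5)]))
```

On a busy URL another visitor may trigger the refill before you do, so a first hit with a very small age can also be fine; this sketch is stricter than it needs to be in that case.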
Then fix the root cause so you stop purging by hand: fingerprinted asset filenames (app.4f3a2c.js) with long max-age, short max-age plus stale-while-revalidate for HTML, and purge-on-deploy in your CI pipeline.
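As a rough illustration of that policy, the Python sketch below picks a cache-control value from the request path. The fingerprint pattern, the list of extensions, and every lifetime other than the one-year max-age are assumptions; tune them to your own build output and traffic.

```python
import re

# Matches names like app.4f3a2c.js: a hex digest segment before the extension.
FINGERPRINTED = re.compile(r"\.[0-9a-f]{6,}\.(js|css|png|jpg|svg|woff2)$")


def cache_control_for(path):
    """Choose a cache-control header value for a response path."""
    if FINGERPRINTED.search(path):
        # Content never changes under this name, so cache it for a year.
        return "public, max-age=31536000"
    if path.endswith(".html") or path.endswith("/"):
        # HTML changes in place: keep it short and let the edge revalidate in the background.
        return "public, max-age=60, stale-while-revalidate=300"
    # Fallback for everything else (an arbitrary middle ground).
    return "public, max-age=3600"


print(cache_control_for("/static/app.4f3a2c.js"))
print(cache_control_for("/index.html"))
```

With fingerprinted names doing the invalidation for assets, the only things left to purge on deploy are the HTML documents that reference them.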
Common CDN debugging mistakes
Trusting your own browser. Your browser cache sits in front of the CDN cache. Test with the header checker or curl, not by hitting refresh.
Forgetting Vary. A vary: accept-encoding, cookie header means the CDN stores multiple variants per URL. You can be looking at a fresh variant while users on a different accept-encoding get a stale one.
Purging before reading headers. If you don't capture the bad response's age, cache-control, and PoP first, you've destroyed the evidence and learned nothing.
Blaming the CDN for origin bugs. Step 2 exists for a reason. Compare against origin before opening a support ticket.
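The Vary mistake is easier to see once you picture how a cache might build its key. The Python sketch below is a simplified model, not any CDN's actual key algorithm; variant_key is a hypothetical name.

```python
def variant_key(url, vary, request_headers):
    """Model a cache key: the URL plus the request's value for each header named in Vary."""
    req = {k.lower(): v for k, v in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return (url,) + tuple((name, req.get(name, "")) for name in names)


vary = "accept-encoding, cookie"
mine = variant_key("/app.css", vary, {"Accept-Encoding": "br"})
theirs = variant_key("/app.css", vary, {"Accept-Encoding": "gzip"})

# Different keys: the two requests read from different cached variants.
print(mine != theirs)
```

Because the two requests map to different keys, one variant can be fresh while the other is stale, which is why testing with the affected user's request headers matters.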
TL;DR
- Check the headers: cache-status / x-cache, age, cache-control, and the PoP identifier.
- Fetch the same path from origin. Stale at origin = deploy problem, not cache problem.
- Verify DNS points at the CDN; watch for stale records mid-migration, and confirm ownership of returned IPs with IP Geolocation.
- For regional reports, get the affected user's headers and identify their PoP - anycast means you can't reproduce their path from your desk.
- Purge specific URLs or cache tags, verify with a fresh header check, then fix cache-control so it doesn't recur.
