last verified · 2026-06-10

How to diagnose website downtime

A systematic DNS → TCP → TLS → HTTP method for finding out why a website is down, with the exact dig and curl commands for each layer.

troubleshooting · http · dns · monitoring

Trace Warrior Team

6 min read

"The site is down" is not a diagnosis. It's a symptom that could mean a DNS misconfiguration, a dead server, an expired certificate, a crashed application, or a problem that exists only on the reporter's network. The fastest way through is to test each layer in the order a browser uses them: DNS, then TCP, then TLS, then HTTP. The first layer that fails is your root cause, and everything below it is irrelevant.

This guide walks the stack bottom-up with the exact commands at each step.

Step 1. Confirm it's actually down

Before debugging anything, rule out "down for just me." Browser caches, corporate proxies, VPNs, and local DNS caches all produce false outages.

Test from a second network (phone on mobile data is the classic move).
Run a check from outside your network with the Ping Test tool, which probes from our infrastructure rather than your machine.
Try the site in a private/incognito window to bypass cached redirects and stale service workers.

If the site loads from elsewhere, the problem is between the reporter and the site (local DNS, proxy, firewall), not the site itself. Different investigation entirely.

Step 2. Check DNS resolution

If the hostname doesn't resolve, nothing downstream matters.

dig +short example.com A

Expected: one or more IP addresses. Possible failures:

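Empty output: +short prints only answer records, so an empty result can mean NXDOMAIN (the name doesn't exist), NOERROR with no A record, or a resolver failure. Rerun without +short (dig example.com A) and read the status: field in the header. For NXDOMAIN, check whether the zone was changed recently, or whether the domain expired (run whois example.com and look at the expiry date; expired domains are a surprisingly common cause of "sudden" total outages).
SERVFAIL: the resolver could not get a usable answer, usually because the authoritative nameservers are unreachable, the zone is broken, or DNSSEC validation failed. Query the authoritative server directly to confirm: dig @ns1.yourprovider.com example.com A.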
Wrong IP: the record resolves, but to an address you don't recognise. Either someone changed it deliberately, or you have a hijack/registrar problem on your hands.
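The failure cases above can be told apart by the status field in the header of dig's full output. A minimal sketch, assuming dig's usual header line format; the function names dns_status and explain_dns_status are illustrative and not part of any tool:

```shell
#!/usr/bin/env bash
# Extract the response status (NOERROR, NXDOMAIN, SERVFAIL, ...) from
# full dig output read on stdin.
dns_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Map a status to the next diagnostic step (illustrative wording).
explain_dns_status() {
  case "$1" in
    NOERROR)  echo "name exists: check the answer section for the expected IP" ;;
    NXDOMAIN) echo "name does not exist: check the zone and the domain expiry" ;;
    SERVFAIL) echo "resolver got no usable answer: query the authoritative server" ;;
    *)        echo "unexpected status: read the full dig output" ;;
  esac
}

# Usage (performs a real DNS query):
#   explain_dns_status "$(dig example.com A | dns_status)"
```

The parsing is kept separate from the query so it can be pointed at saved dig output from a colleague's machine as well as at a live lookup.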

Cross-check against a second resolver to rule out a single resolver serving stale data:

dig +short example.com A @1.1.1.1
dig +short example.com A @8.8.8.8

The DNS Lookup tool does the same checks if you don't have dig handy. If DNS resolves correctly, move up a layer.
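The resolver cross-check can be scripted so the comparison isn't done by eye. This is a sketch under assumptions: resolve_via and compare_answers are made-up helper names, and the answers are sorted because resolvers may return the same records in a different order.

```shell
#!/usr/bin/env bash
# Print the sorted answer a given resolver returns for a name.
# dig +short may also print CNAME targets; they are kept so that
# differences in the chain show up too.
resolve_via() {
  local name="$1" resolver="$2"
  dig +short "$name" A "@$resolver" | sort
}

# Compare two answer sets (newline-separated strings).
compare_answers() {
  if [ "$1" = "$2" ]; then
    echo "match"
  else
    echo "mismatch"
  fi
}

# Usage (performs real DNS queries):
#   compare_answers "$(resolve_via example.com 1.1.1.1)" \
#                   "$(resolve_via example.com 8.8.8.8)"
```

A mismatch is a prompt to look closer, not proof of a fault: geo-routed or load-balanced names can legitimately return different addresses to different resolvers.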

Step 3. Check TCP connectivity

DNS gave you an IP. Can you actually open a connection to it on the right port?

nc -vz example.com 443

Or with curl, which can report the connect time separately through its -w output:

curl -sv --connect-timeout 5 -o /dev/null -w 'connect: %{time_connect}s\n' https://example.com

Possible failures:

Connection refused: the host is up but nothing is listening on that port. The web server process crashed or was stopped, or it's bound to the wrong interface. Verify with the Port Checker from outside your network.
Connection timed out: packets are vanishing. Either the host is down, or a firewall is silently dropping traffic. A timeout from everywhere usually means the server or its network is gone; a timeout from one location only suggests a firewall or routing problem on that path.
Connects fine: TCP is healthy. Move up.

Note that ICMP ping is a weaker signal than a TCP connect: many hosts block ICMP while serving traffic normally, and a host can answer ping while its web server is dead. Test the actual port.
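If nc isn't installed, the refused-versus-timeout distinction can be approximated with bash alone. A sketch, assuming a bash build with /dev/tcp support and the coreutils timeout command; tcp_probe and classify_tcp_exit are hypothetical names:

```shell
#!/usr/bin/env bash
# Try to open a TCP connection using bash's /dev/tcp pseudo-device,
# bounded by coreutils `timeout`. Returns the raw exit status.
tcp_probe() {
  local host="$1" port="$2" secs="${3:-5}"
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Interpret the exit status: 0 = connected, 124 = `timeout` gave up
# waiting, anything else = the attempt failed quickly.
classify_tcp_exit() {
  case "$1" in
    0)   echo "open" ;;
    124) echo "timed out" ;;
    *)   echo "refused or unreachable" ;;
  esac
}

# Usage (opens a real connection):
#   tcp_probe example.com 443; classify_tcp_exit $?
```

A fast non-zero exit usually means the connection was refused, but a failed name lookup or an unreachable network lands in the same bucket, which is one more reason to clear DNS first.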

Step 4. Check TLS

The connection opens, but does the TLS handshake complete?

echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null | openssl x509 -noout -dates -subject

Look for:

notAfter in the past: expired certificate. Browsers hard-fail on this, so to users it looks identical to a full outage. Renew, redeploy, done.
Subject/SAN mismatch: the certificate doesn't cover the hostname being requested. Common after moving a site to a new load balancer or CDN that serves a default cert.
Handshake failure: protocol or cipher mismatch, or a broken certificate chain (missing intermediate). The SSL Certificate Checker validates the full chain and flags missing intermediates, which openssl s_client makes you spot manually.

Expired certs are one of the most preventable outage causes in existence: they fail on a known date. An SSL certificate expiry monitor that alerts 30, 14, and 7 days out removes the entire category.
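The 30/14/7-day thresholds are simple to check by hand between monitoring runs. A minimal sketch: expiry_alert and cert_expiry_epoch are illustrative names, and the date parsing assumes GNU date (BSD and macOS date take different flags).

```shell
#!/usr/bin/env bash
# Given "now" and the certificate expiry as epoch seconds, print which
# alert threshold has been crossed.
expiry_alert() {
  local now="$1" expiry="$2"
  if [ "$expiry" -le "$now" ]; then
    echo "expired"
    return
  fi
  local days=$(( (expiry - now) / 86400 ))   # whole days remaining
  if   [ "$days" -le 7 ];  then echo "7-day alert"
  elif [ "$days" -le 14 ]; then echo "14-day alert"
  elif [ "$days" -le 30 ]; then echo "30-day alert"
  else echo "ok"
  fi
}

# Fetch the notAfter date of the served certificate as epoch seconds.
cert_expiry_epoch() {
  local host="$1" end
  end=$(echo | openssl s_client -connect "$host:443" -servername "$host" 2>/dev/null \
        | openssl x509 -noout -enddate | cut -d= -f2)
  date -u -d "$end" +%s   # assumes GNU date
}

# Usage (performs a real TLS handshake):
#   expiry_alert "$(date +%s)" "$(cert_expiry_epoch example.com)"
```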

Step 5. Check HTTP

DNS, TCP, and TLS all pass, so the failure is at the HTTP layer: the application itself, or a proxy in front of it. Get the status code and headers without downloading the body:

curl -sI https://example.com

Interpret the status code:

200: the server thinks it's fine. If users still report failures, suspect a CDN or load balancer serving a healthy response from one node while others fail, or a client-side error (check the browser console).
301/302 loop: curl -sIL follows redirects; if it bails after 50 hops, you have a redirect loop, usually a misconfigured HTTP→HTTPS rule or a CDN/origin disagreement about the canonical scheme.
403: access blocked. WAF rule, IP block, or hotlink protection misfiring.
500: application crash. Go read the application logs; the network is not your problem.
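502/503/504: the proxy/load balancer is up but the origin behind it isn't responding properly. 502 means the origin sent an invalid response or refused the connection; 503 means the service is temporarily unavailable (overloaded, in maintenance, or no healthy backends); 504 means the origin timed out. In every case, the investigation moves to the origin servers.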
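The status-code triage above fits in a small lookup. A sketch only: http_status and classify_http_status are hypothetical helper names, and the labels paraphrase this guide's own categories.

```shell
#!/usr/bin/env bash
# Print only the status code of a HEAD request.
http_status() {
  curl -s -o /dev/null -I -w '%{http_code}' --max-time 10 "$1"
}

# Map a status code to where the investigation goes next.
classify_http_status() {
  case "$1" in
    2??)         echo "ok: suspect one bad node or a client-side error" ;;
    3??)         echo "redirect: follow the chain and check for a loop" ;;
    403)         echo "blocked: check WAF rules, IP blocks, hotlink protection" ;;
    500)         echo "application error: read the application logs" ;;
    502|503|504) echo "origin problem: investigate the servers behind the proxy" ;;
    *)           echo "unclassified: read the full response" ;;
  esac
}

# Usage (performs a real HTTP request):
#   classify_http_status "$(http_status https://example.com)"
```

If curl can't connect at all, %{http_code} comes back as 000 and falls through to the last branch, which points back to the earlier layers.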

The HTTP Header Checker shows the same status line and headers plus the redirect chain from an external vantage point, which is useful when your own requests are being routed differently (e.g., split-horizon DNS or geo-routing).

Step 6. Rule out the things you don't control

If every layer checks out from some locations but not others:

Hosting provider status page. Check it before you spend an hour debugging an outage that's already on their dashboard.
CDN status. If you're behind Cloudflare, Fastly, or CloudFront, a CDN incident produces exactly this pattern: regional, intermittent, nothing wrong with your origin.
DDoS symptoms. Sudden traffic spikes, connection timeouts under load, and 503s from your own rate limiting. Your provider's traffic graphs will show it.

The order matters

Resist the urge to start at HTTP because that's where the error message appeared. A 502 investigation that doesn't first confirm DNS and TLS can send you log-diving on origin servers when the actual problem is an expired certificate between the CDN and origin. Bottom-up is slightly more typing and dramatically less guessing.
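One way to enforce the order is to script it so the run stops at the first failing layer. A sketch under assumptions: first_failing_layer and the check_* functions are made-up names built from the commands in this guide, and nc -z -w behaves slightly differently across netcat variants.

```shell
#!/usr/bin/env bash
# Run the named check functions in order. Print the first one that
# fails and stop, or report that every layer passed.
first_failing_layer() {
  local check
  for check in "$@"; do
    if ! "$check" >/dev/null 2>&1; then
      echo "$check"
      return 1
    fi
  done
  echo "all layers pass"
}

# Hypothetical per-layer checks; HOST can be overridden from the environment.
HOST="${HOST:-example.com}"
check_dns()  { [ -n "$(dig +short "$HOST" A)" ]; }
check_tcp()  { nc -z -w 5 "$HOST" 443; }
check_tls()  { echo | openssl s_client -connect "$HOST:443" -servername "$HOST" -verify_return_error; }
check_http() { curl -sf -o /dev/null --max-time 10 "https://$HOST"; }

# Usage (performs real network requests):
#   first_failing_layer check_dns check_tcp check_tls check_http
```

Because the loop returns on the first failure, a broken DNS record never gets misreported as a TLS or HTTP problem.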

Don't wait for users to tell you

Everything above is reactive: you're diagnosing after someone noticed. A website uptime monitor runs this same layered check on a schedule and alerts you with the failing layer already identified: DNS failure, connect timeout, TLS error, or bad status code. You start at the step for the failing layer instead of step 1, and you start before the first user complaint instead of after.
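A rough idea of how a scheduled probe can name the failing layer: curl's exit status already separates most of these cases. This is a sketch, not how any particular monitoring product works; layer_from_curl_exit and probe are illustrative names, and only exit codes documented in the curl manual are mapped.

```shell
#!/usr/bin/env bash
# Map a curl exit status to the layer that most likely failed.
layer_from_curl_exit() {
  case "$1" in
    0)     echo "none" ;;
    6)     echo "dns" ;;    # could not resolve host
    7|28)  echo "tcp" ;;    # failed to connect / timed out
    35|60) echo "tls" ;;    # handshake error / certificate not trusted
    22|47) echo "http" ;;   # -f saw status >= 400 / too many redirects
    *)     echo "unknown" ;;
  esac
}

# Probe a URL and print the failing layer.
probe() {
  curl -sfL -o /dev/null --connect-timeout 5 "$1"
  layer_from_curl_exit $?
}

# Usage (performs a real HTTP request), e.g. from a cron job:
#   probe https://example.com
```

Exit code 28 is mapped to TCP as the most common case, but it can also fire when name resolution or the TLS handshake stalls, so treat the label as a starting point.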

TL;DR

Confirm it's down from more than one network (Ping Test).
DNS: dig +short example.com A. Empty, SERVFAIL, or wrong IP means stop here.
TCP: nc -vz example.com 443. Refused means dead process, timeout means firewall or dead host.
TLS: openssl s_client. Check expiry dates and the chain (SSL Certificate Checker).
HTTP: curl -sI. Read the status code; 5xx sends you to application or origin logs.
Check provider and CDN status pages before deep-diving.
Set up uptime monitoring so the next outage finds you first.

Related

HTTP Header Checker - status codes and redirect chains from an external vantage point
How to check open ports - the TCP layer in more depth
DNS Lookup tool - the DNS layer without leaving the browser
