Every network is documented. The only question is where: in a maintained set of diagrams and records, or in the head of the one engineer who set it up. The second option works fine until 2am on the day that engineer is on holiday and the core switch dies.
The failure mode of network documentation isn't absence - it's decay. Most teams have a beautiful Visio diagram from 2021 that's now actively misleading. So this guide is half what to document and half how to keep it true, because stale documentation is worse than none: people trust it, act on it, and break things.
Decide what gets documented (and what doesn't)
Document what someone would need at 2am with no access to your memory:
- IP addressing - every subnet, its purpose, gateway, VLAN, and DHCP ranges (Step 1).
- Device inventory - every router, switch, firewall, AP, and appliance: hostname, management IP, model, location, who supports it.
- Topology - how it all connects, physically and logically (Step 2).
- VLANs - ID, name, subnet, where it's trunked.
- Firewall rules and NAT - not a config dump but the intent behind each rule: "443 from internet → reverse proxy in DMZ, ticket #1234".
- External dependencies - ISP circuits and account numbers, DNS provider, registrar, public IP allocations, support contacts.
- Credentials - in a proper password manager/vault with access control and audit, never in the wiki. The documentation says where credentials live, not what they are.
Explicitly skip anything that's better answered live. Current DHCP leases, interface counters, ARP tables - that's monitoring's job, and copying it into a document just creates instant staleness.
Step 1. Start with IP address management
IPAM is the foundation - nearly every other document references it, and an addressing conflict is one of the most confusing failures to debug without it.
Record, per subnet:
- Network address and prefix (10.20.30.0/24)
- Purpose ("Building A user VLAN", "DMZ", "management")
- VLAN ID, gateway address
- DHCP range vs static range, and who owns static assignments
- Significant static hosts within it
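The per-subnet record above can be sketched as a small data structure that checks its own consistency. This is a minimal illustration, not a prescribed schema: the class name, field names, VLAN number, host name, and DHCP range are all assumptions made for the example, and only the standard-library ipaddress module is used.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv4Network


@dataclass
class SubnetRecord:
    """One row of the addressing plan (field names are illustrative)."""
    network: IPv4Network
    purpose: str
    vlan_id: int
    gateway: IPv4Address
    dhcp_start: IPv4Address
    dhcp_end: IPv4Address
    static_owner: str
    static_hosts: dict[str, IPv4Address] = field(default_factory=dict)

    def problems(self) -> list[str]:
        """Return inconsistencies; an empty list means the record agrees with itself."""
        found = []
        if not 1 <= self.vlan_id <= 4094:
            found.append(f"VLAN ID {self.vlan_id} is outside 1-4094")
        if self.gateway not in self.network:
            found.append(f"gateway {self.gateway} is not in {self.network}")
        for label, addr in (("DHCP start", self.dhcp_start), ("DHCP end", self.dhcp_end)):
            if addr not in self.network:
                found.append(f"{label} {addr} is not in {self.network}")
        if self.dhcp_start > self.dhcp_end:
            found.append("DHCP range is reversed")
        for name, addr in self.static_hosts.items():
            if addr not in self.network:
                found.append(f"static host {name} ({addr}) is not in {self.network}")
            elif self.dhcp_start <= addr <= self.dhcp_end:
                found.append(f"static host {name} ({addr}) sits inside the DHCP range")
        return found


# Hypothetical record: the static printer has been placed inside the DHCP pool.
record = SubnetRecord(
    network=IPv4Network("10.20.30.0/24"),
    purpose="Building A user VLAN",
    vlan_id=30,
    gateway=IPv4Address("10.20.30.1"),
    dhcp_start=IPv4Address("10.20.30.100"),
    dhcp_end=IPv4Address("10.20.30.199"),
    static_owner="network team",
    static_hosts={"printer-a1": IPv4Address("10.20.30.150")},
)
print(record.problems())
```

Even in a spreadsheet, the same checks (gateway inside the subnet, static hosts outside the DHCP range) are worth running whenever a row changes.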
Conventions that pay off later:
- Carve a hierarchy, don't allocate ad hoc. Give each site or function a supernet (10.10.0.0/16 = HQ, 10.20.0.0/16 = warehouse) and subdivide consistently. The Subnet Calculator does the splitting arithmetic - work out the plan before you assign anything, because renumbering later is miserable.
- Make addresses meaningful. If .1 is always the gateway, .2-.9 are infrastructure, and .250+ is static servers in every subnet, half your documentation is encoded in the scheme itself.
- Leave growth room. A /24 that's 90% allocated on day one is a renumbering project on day 400.
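As a rough sketch of the splitting arithmetic, Python's standard ipaddress module can carve a supernet into equal blocks and apply the example host scheme to each one. The function names are hypothetical, and cutting the HQ /16 into /24s is an assumption for illustration; pick the block size your own plan needs.

```python
from ipaddress import IPv4Network
from itertools import islice


def carve(supernet: str, new_prefix: int, count: int) -> list[IPv4Network]:
    """Return the first `count` blocks of length `new_prefix` inside `supernet`."""
    blocks = IPv4Network(supernet).subnets(new_prefix=new_prefix)
    return list(islice(blocks, count))


def conventional_roles(subnet: IPv4Network) -> dict[str, str]:
    """Apply the example scheme to a /24: .1 gateway, .2-.9 infrastructure, .250+ static servers."""
    base = subnet.network_address
    return {
        "gateway": str(base + 1),
        "infrastructure": f"{base + 2} - {base + 9}",
        "static_servers": f"{base + 250} - {subnet.broadcast_address - 1}",
    }


hq_blocks = carve("10.10.0.0/16", new_prefix=24, count=3)
print([str(block) for block in hq_blocks])
# ['10.10.0.0/24', '10.10.1.0/24', '10.10.2.0/24']

print(conventional_roles(hq_blocks[1]))
# {'gateway': '10.10.1.1', 'infrastructure': '10.10.1.2 - 10.10.1.9',
#  'static_servers': '10.10.1.250 - 10.10.1.254'}
```

Because the roles are derived from the subnet rather than typed by hand, the scheme stays identical in every block.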
Tooling: a disciplined spreadsheet works for small networks. Past a few dozen subnets, use real IPAM software (NetBox is the common open-source choice) - it adds validation, history, and an API, and it can serve as your source of truth for automation. For public-facing addresses, periodically sanity-check what the world sees: IP Geolocation on your egress IPs confirms the ASN and ownership records match what you've documented.
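If you stay with a spreadsheet, the validation that IPAM software provides has to come from somewhere else. One possible sketch, assuming the sheet is exported as CSV with hypothetical columns named network and purpose, flags subnets that overlap:

```python
import csv
import io
from ipaddress import ip_network
from itertools import combinations


def find_overlaps(csv_text: str) -> list[tuple[str, str]]:
    """Return pairs of rows whose networks overlap.

    ip_network() raises ValueError for malformed entries or entries with
    host bits set, so a typo in the sheet fails loudly instead of passing.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    nets = [(row["purpose"], ip_network(row["network"])) for row in rows]
    return [
        (f"{purpose_a} {net_a}", f"{purpose_b} {net_b}")
        for (purpose_a, net_a), (purpose_b, net_b) in combinations(nets, 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical export: the printers /25 was carved out of the servers /24.
sheet = """network,purpose
10.10.0.0/24,HQ users
10.10.1.0/24,HQ servers
10.10.1.128/25,HQ printers
"""
print(find_overlaps(sheet))
# [('HQ servers 10.10.1.0/24', 'HQ printers 10.10.1.128/25')]
```

The pairwise comparison is quadratic in the number of rows, which is fine at spreadsheet scale; by the time it is slow, the network has outgrown the spreadsheet anyway.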
Step 2. Diagram the topology - with conventions
You need two diagrams, and they should stay separate:
- Physical - devices, ports, cables, patch panels, rack positions. Answers "which cable do I trace?"
- Logical - subnets, VLANs, routing, firewall zones. Answers "how does traffic get from A to B?"
One diagram trying to be both ends up failing at both.
Conventions matter more than the drawing tool. Pick these once and apply them everywhere:
- Consistent shapes/icons per device class (router, L3 switch, L2 switch, firewall, server). Stick to one icon set.
- Label every link with the interface on each end and, on the logical diagram, the subnet or VLAN it carries (Gi0/1 - Gi1/0/24, 10.20.30.0/24).
- Flow direction: untrusted at the top or left, trusted at the bottom or right - readers should be able to assume it.
- A legend on every page. Diagrams get printed, screenshotted, and pasted into incident channels without context.
- One question per diagram. An overview page, then a page per site/zone/function. The everything-on-one-page mega-diagram is unreadable at exactly the moment someone needs it.
Tool-wise, prefer formats that diff: text-based tooling (Mermaid, Graphviz, draw.io files committed to git) lets topology changes show up in version control like code changes. That becomes important in Step 4.
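To illustrate why text formats diff well, here is a minimal sketch that renders a list of links as a Mermaid flowchart. The Link type and the device names fw1, core1, and access1 are hypothetical, and the sketch assumes device names are already valid Mermaid node IDs. Links are sorted so that adding one link produces a one-line diff.

```python
from typing import NamedTuple


class Link(NamedTuple):
    a_device: str
    a_port: str
    b_device: str
    b_port: str
    carries: str  # subnet or VLAN shown on the logical diagram


def to_mermaid(links: list[Link]) -> str:
    """Render links as a top-down Mermaid flowchart with labelled edges."""
    lines = ["graph TD"]
    for link in sorted(links):  # stable order keeps version-control diffs small
        label = f"{link.a_port} - {link.b_port}, {link.carries}"
        lines.append(f'    {link.a_device} ---|"{label}"| {link.b_device}')
    return "\n".join(lines)


links = [
    Link("fw1", "Gi0/1", "core1", "Gi1/0/24", "10.20.30.0/24"),
    Link("core1", "Gi1/0/1", "access1", "Gi0/48", "VLAN 30"),
]
print(to_mermaid(links))
# graph TD
#     core1 ---|"Gi1/0/1 - Gi0/48, VLAN 30"| access1
#     fw1 ---|"Gi0/1 - Gi1/0/24, 10.20.30.0/24"| core1
```

Generating the diagram from a link list also means the labelling convention is enforced by the code rather than by whoever last edited the drawing.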
While diagramming, verify rather than transcribe from memory: confirm with a DNS lookup that each documented hostname resolves to the address you expect, and ping management IPs as you record them. Documentation written from memory inherits memory's bugs - and it's surprising how often this step alone finds a forgotten device or a wrong record.
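The DNS half of that check is easy to script. The sketch below uses only the standard library and compares documented addresses with what the local resolver returns; the hostnames and addresses are made up for the example, the resolver is passed in as a parameter so the comparison can be exercised without touching the network, and the ping step is left out.

```python
import socket
from typing import Callable


def resolve(hostname: str) -> set[str]:
    """Return the IPv4 addresses the local resolver gives (empty set if none)."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


def check_records(
    documented: dict[str, str],
    resolver: Callable[[str], set[str]] = resolve,
) -> dict[str, str]:
    """Map each hostname that disagrees with the docs to what was found."""
    mismatches = {}
    for hostname, expected in sorted(documented.items()):
        actual = resolver(hostname)
        if not actual:
            mismatches[hostname] = "does not resolve"
        elif expected not in actual:
            found = ", ".join(sorted(actual))
            mismatches[hostname] = f"resolves to {found}, documented as {expected}"
    return mismatches


# Hypothetical documentation and a canned resolver standing in for real DNS.
documented = {
    "core-sw1.example.net": "10.20.0.2",
    "fw1.example.net": "10.20.0.3",
    "old-ap.example.net": "10.20.0.40",
}
canned = {"core-sw1.example.net": {"10.20.0.2"}, "fw1.example.net": {"10.20.0.9"}}
print(check_records(documented, resolver=lambda name: canned.get(name, set())))
```

Calling check_records(documented) without the resolver argument runs the same comparison against live DNS.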
Step 3. Document intent, not just state
Configuration can be re-read from the device; why it's configured that way can't. For firewall rules, routing decisions, and anything non-obvious, record:
- What the rule/route/setting does
- Why it exists (ticket number, project, requirement)
- Who asked for it and when
- When it can be removed, if temporary
"Temporary" rules without an expiry note become permanent. Five years later nobody dares delete them, and the firewall accretes risk. A one-line intent note is the difference between a confident cleanup and a rule that outlives three engineers.
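One way to make the expiry note actionable is to store intent as structured data and list anything past its removal date. This is a sketch under assumptions: the IntentNote type, the requester names, the dates, and the second rule are invented for illustration, and only the first rule's wording and ticket number come from the example earlier in this guide.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class IntentNote:
    what: str
    why: str
    requested_by: str
    requested_on: date
    remove_after: Optional[date] = None  # None means the rule is permanent


def overdue(notes: list[IntentNote], today: date) -> list[IntentNote]:
    """Return temporary rules whose removal date has already passed."""
    return [n for n in notes if n.remove_after is not None and n.remove_after < today]


notes = [
    IntentNote(
        what="443 from internet -> reverse proxy in DMZ",
        why="ticket #1234",
        requested_by="web team",  # hypothetical requester
        requested_on=date(2024, 3, 1),
    ),
    IntentNote(
        what="22 from vendor range -> staging host",  # hypothetical temporary rule
        why="vendor migration support",
        requested_by="platform team",
        requested_on=date(2024, 5, 1),
        remove_after=date(2024, 6, 30),
    ),
]
for note in overdue(notes, today=date(2024, 9, 1)):
    print(f"Remove: {note.what} (requested by {note.requested_by}, {note.why})")
```

Run on a schedule, a list like this turns "nobody dares delete it" into a short, attributed cleanup queue.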
Step 4. Make it a living document
This is where most documentation efforts die, so make it structural rather than aspirational:
- One source of truth. One wiki space, one repo, one IPAM. Copies diverge; link, don't duplicate.
- Change process includes documentation. A network change isn't complete until the docs are updated - put it in the change checklist or PR template, not in people's good intentions.
- Date everything. Every page carries "last verified" and an owner. A reader must be able to judge staleness at a glance.
- Review on a schedule. Quarterly, walk the docs against reality: do the diagrams match? Do documented records still resolve correctly? Are inventory entries still alive? An hour a quarter beats a heroic rewrite every three years.
- Automate verification where you can. Anything externally visible can be watched by machines: a DNS record drift monitor on your key hostnames alerts you when reality diverges from what's documented, which both catches unauthorised changes and tells you a doc update is due. The same philosophy applies internally - config backups (Oxidized, RANCID) diffed nightly turn "someone changed something" into a readable changelog.
- Lower the friction. If updating the docs takes 30 seconds, it happens; if it means finding the one laptop with the diagram tool licence, it doesn't. This is the strongest argument for plain-text formats in git.
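The "date everything" and "review on a schedule" rules can be checked mechanically if each page's owner and last-verified date are available as data. The sketch below assumes that metadata has already been extracted into Page tuples; the page titles, owners, and the 90-day threshold standing in for "quarterly" are all assumptions.

```python
from datetime import date
from typing import NamedTuple, Optional


class Page(NamedTuple):
    title: str
    owner: Optional[str]
    last_verified: date


def needs_attention(pages: list[Page], today: date, max_age_days: int = 90) -> list[str]:
    """List pages that lack an owner or were last verified too long ago."""
    findings = []
    for page in pages:
        age = (today - page.last_verified).days
        if not page.owner:
            findings.append(f"{page.title}: no owner")
        if age > max_age_days:
            findings.append(f"{page.title}: last verified {age} days ago")
    return findings


pages = [
    Page("HQ logical diagram", "alice", date(2025, 5, 1)),
    Page("Firewall intent", None, date(2025, 4, 1)),
    Page("Warehouse addressing", "bob", date(2025, 1, 1)),
]
print(needs_attention(pages, today=date(2025, 6, 1)))
# ['Firewall intent: no owner', 'Warehouse addressing: last verified 151 days ago']
```

A report like this gives the quarterly review a starting list instead of a blank page.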
Common documentation mistakes
Documenting everything to the same depth. Exhaustive detail everywhere guarantees decay everywhere. Go deep on what's stable and load-bearing (addressing plan, topology, firewall intent), stay shallow on what churns.
Screenshots as documentation. Screenshots of configs can't be searched, diffed, or partially updated. Use text.
Credentials in the wiki. Even "just temporarily". Vault them, and have the wiki point at the vault.
The diagram that's never wrong because it's never specific. Boxes labelled "network" connected to "cloud" tell the 2am responder nothing. Interface labels and subnet annotations are the whole value.
No owner. Documentation that's everyone's job is no one's. Assign pages to people.
TL;DR
- Document for the 2am responder: IPAM, device inventory, topology, VLANs, firewall intent, external dependencies; vault the credentials.
- Build the addressing plan first - use the Subnet Calculator to carve a hierarchy with room to grow.
- Keep physical and logical diagrams separate; enforce icon, labelling, and flow conventions with a legend on every page.
- Record why rules exist, not just what they are.
- Verify while writing - DNS lookups and pings against what you document.
- Keep it alive: single source of truth, docs in the change checklist, dated pages with owners, quarterly review, and drift monitoring for the externally visible parts.
