Networking Fundamentals for System Design
Before you can design distributed systems, you need to understand how machines talk to each other. IP addresses identify devices, the OSI model describes how data moves through a network, TCP and UDP define how reliably it's delivered, and DNS translates human-readable names into the IP addresses computers actually use. These four concepts underpin every system design decision that follows.
IP Addresses
An IP (Internet Protocol) address is a unique identifier assigned to every device on a network. It serves two purposes: identifying the host and providing its location for routing.
IPv4 vs IPv6
IPv4 uses a 32-bit address written in dotted-decimal notation, giving roughly 4.3 billion unique addresses. Example: 102.22.192.181. The central pool of unallocated IPv4 blocks, managed by IANA, was exhausted in February 2011.
IPv6 was introduced to solve IPv4 exhaustion. It uses a 128-bit hexadecimal address, providing approximately 3.4 × 10^38 unique addresses, enough for every grain of sand on Earth to be assigned roughly 45 quintillion (4.5 × 10^19) addresses each. Example: 2001:0db8:85a3:0000:0000:8a2e:0370:7334.
| Property | IPv4 | IPv6 |
|---|---|---|
| Address length | 32-bit | 128-bit |
| Notation | Dotted decimal (4 octets) | Hexadecimal (8 groups) |
| Address space | ~4.3 billion | ~3.4 × 10^38 |
| Header size | 20–60 bytes | Fixed 40 bytes |
| NAT required | Yes (address exhaustion) | No |
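As a rough illustration of the table above, Python's standard `ipaddress` module can parse both example addresses and derive the address-space sizes from their bit lengths. The variable names are chosen for this sketch only.

```python
import ipaddress

v4 = ipaddress.ip_address("102.22.192.181")
v6 = ipaddress.ip_address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")

# An IPv4 address is a 32-bit integer; dotted decimal is only a display format.
v4_as_int = int(v4)

# The size of each address space follows directly from the bit length.
ipv4_space = 2 ** v4.max_prefixlen  # 2**32
ipv6_space = 2 ** v6.max_prefixlen  # 2**128

print(v4.version, v4.max_prefixlen, ipv4_space)
print(v6.version, v6.max_prefixlen, f"{ipv6_space:.1e}")
print(v6.compressed)  # the same IPv6 address with leading zeros and zero runs shortened
```

The compressed form is worth recognising because most tools print IPv6 addresses that way rather than in the fully expanded form shown earlier.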
Types of IP Addresses
Public IP — assigned to your network by your ISP. All devices behind a home router share one public IP. This is the address the wider internet sees.
Private IP — assigned within a local network (home or office). Ranges reserved for private use: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16. Not routable on the public internet.
Static IP — manually configured; does not change. Used for servers, load balancers, and anything that must be consistently reachable at a known address.
Dynamic IP — assigned by a DHCP server; changes over time. Standard for consumer devices. Cheaper to manage at scale because addresses are recycled when devices leave the network.
When designing services, use private IPs for inter-service communication and only expose public IPs at load balancers or API gateways. This reduces your attack surface and avoids unnecessary egress costs.
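A minimal sketch of how a service might check whether a peer address falls inside the three private ranges listed above (defined in RFC 1918). The helper name `is_rfc1918` is an assumption made for this example.

```python
import ipaddress

# The three private IPv4 ranges listed above.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(address: str) -> bool:
    """Return True if the address lies in one of the private IPv4 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)

print(is_rfc1918("192.168.1.10"))    # inside 192.168.0.0/16
print(is_rfc1918("102.22.192.181"))  # a public address
```

A check like this could back a guard that rejects internal-only API calls arriving from a non-private source, though in practice such rules usually live in security groups or firewall configuration.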
OSI Model
The Open Systems Interconnection (OSI) model is a conceptual framework that breaks network communication into seven layers. Each layer has a specific responsibility and communicates with the layers directly above and below it.
Understanding OSI helps you reason about where a problem or feature lives: a firewall operates at layers 3–4, an API gateway at layer 7, and TLS at layers 5–6. When someone says “L4 load balancer” or “L7 routing,” they’re referring to OSI layers.
| Layer | Name | Responsibility | Examples |
|---|---|---|---|
| 7 | Application | User-facing protocols and services | HTTP, SMTP, DNS, FTP |
| 6 | Presentation | Data encoding, encryption, compression | TLS/SSL, JPEG, UTF-8 |
| 5 | Session | Managing connections between applications | TLS handshake, RPC sessions |
| 4 | Transport | End-to-end delivery, segmentation, flow control | TCP, UDP |
| 3 | Network | Logical addressing and routing between networks | IP, ICMP, routers |
| 2 | Data Link | Node-to-node framing, MAC addressing, error detection | Ethernet, Wi-Fi, switches |
| 1 | Physical | Transmission of raw bits over a medium | Cables, radio, fibre optics |
How Data Flows Through the Layers
When your browser makes an HTTP request, data travels down the stack on the sender side and up on the receiver side. Each layer wraps the data in its own header (encapsulation), and each layer on the receiving end strips its header (decapsulation).
For example, an HTTP request starts at layer 7, gets encrypted by TLS at layer 6, is associated with a session at layer 5, split into TCP segments at layer 4, wrapped in IP packets at layer 3, framed as Ethernet frames at layer 2, and finally transmitted as electrical or optical signals at layer 1.
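Encapsulation and decapsulation can be pictured with a toy model. The bracketed tags below are placeholders invented for this sketch, not real header formats, and only layers 4 to 2 are modelled.

```python
# Placeholder headers, outermost last when sending. Not real wire formats.
TCP_HEADER = b"[TCP]"
IP_HEADER = b"[IP]"
ETH_HEADER = b"[ETH]"

def encapsulate(payload: bytes) -> bytes:
    """Sender side: each layer prepends its own header on the way down."""
    segment = TCP_HEADER + payload  # layer 4
    packet = IP_HEADER + segment    # layer 3
    frame = ETH_HEADER + packet     # layer 2
    return frame

def decapsulate(frame: bytes) -> bytes:
    """Receiver side: each layer strips its header on the way up."""
    for header in (ETH_HEADER, IP_HEADER, TCP_HEADER):
        if not frame.startswith(header):
            raise ValueError(f"expected header {header!r}")
        frame = frame[len(header):]
    return frame

frame = encapsulate(b"GET / HTTP/1.1")
print(frame)
print(decapsulate(frame))
```

The point of the sketch is the ordering: the last header added by the sender is the first one removed by the receiver.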
In system design conversations, layers 3, 4, and 7 come up most often. L3 = IP routing (firewalls, VPNs). L4 = TCP/UDP (NLBs, stateful firewalls). L7 = HTTP/application (API gateways, WAFs, ALBs). Layers 1–2 are typically managed by your cloud provider.
TCP vs UDP
Both TCP and UDP are Layer 4 transport protocols that carry application data across IP networks. They represent a fundamental trade-off: reliability vs speed.
TCP — Transmission Control Protocol
TCP is connection-oriented. Before any data is exchanged, both sides perform a three-way handshake: the client sends SYN, the server responds with SYN-ACK, and the client completes with ACK. Only then does data flow.
TCP guarantees: delivery (lost packets are retransmitted), ordering (segments are reassembled in sequence), and error-checking (checksums on every segment). These guarantees come with overhead — more round trips, more bookkeeping per connection.
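The connection-oriented behaviour can be seen with a small loopback echo, sketched below with Python's `socket` module. The function names are made up for this example; the operating system performs the SYN, SYN-ACK, ACK exchange inside `create_connection` and `accept`, so the handshake itself never appears in application code.

```python
import socket
import threading

def run_echo_server(server: socket.socket) -> None:
    # accept() returns only after a handshake with a client has completed.
    conn, _addr = server.accept()
    with conn:
        data = conn.recv(1024)  # enough for the short messages in this sketch
        conn.sendall(data)

def tcp_round_trip(message: bytes) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=run_echo_server, args=(server,))
        worker.start()
        # create_connection() blocks until the three-way handshake succeeds.
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
        return reply

print(tcp_round_trip(b"hello"))
```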
UDP — User Datagram Protocol
UDP is connectionless. There is no handshake — the sender just fires packets at the destination. There is no delivery guarantee, no ordering, and no retransmission. What it lacks in reliability, it makes up for in speed and simplicity.
UDP is preferred when low latency matters more than perfect delivery: video streaming (a late frame is worse than a dropped frame), DNS lookups (a small query that fits in one packet), VoIP, and online games.
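For contrast, a minimal UDP sketch over loopback: there is no `listen`, `accept`, or `connect` step, and the sender simply addresses a datagram to the receiver. The function name is an assumption for this example, and the timeout is there because nothing in UDP itself would report a lost datagram.

```python
import socket

def udp_send_receive(message: bytes) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        receiver.settimeout(5)           # give up rather than wait forever
        # No handshake: the datagram is sent straight to the receiver's address.
        sender.sendto(message, receiver.getsockname())
        data, _addr = receiver.recvfrom(1024)
        return data

print(udp_send_receive(b"ping"))
```

On a real network the `recvfrom` call could time out, and handling that case (retry, skip, or interpolate) is the application's job rather than the protocol's.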
| Feature | TCP | UDP |
|---|---|---|
| Connection | Connection-oriented (3-way handshake) | Connectionless |
| Delivery guarantee | Yes — retransmits lost packets | No |
| Ordering | In-order delivery | No ordering |
| Error checking | Checksum (corrupt packets discarded + retransmitted) | Checksum only (corrupt packets silently dropped) |
| Speed | Slower (overhead per packet) | Faster (minimal overhead) |
| Broadcasting | No | Yes |
| Use cases | HTTP/S, SMTP, SSH, FTP | DNS, video streaming, VoIP, gaming |
Default to TCP for anything requiring correctness: APIs, databases, file transfers, authentication. Switch to UDP when you control both endpoints, can tolerate loss, and need the lowest possible latency — real-time media, sensor telemetry, or custom game networking protocols.
DNS
DNS is the internet’s phonebook. Humans remember google.com; computers need 142.250.80.46. DNS translates between the two via a distributed, hierarchical system of servers. It’s also a critical part of system design — DNS-level load balancing, health checks, and failover are common patterns.
How DNS Resolution Works
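When you type example.com into a browser, a chain of lookups fires before a single byte of the website is fetched: the resolver checks its cache, then asks a root server, a TLD server, and finally the domain’s authoritative server.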
The Four DNS Server Types
DNS Resolver (recursive resolver) — the first stop for your query. Usually run by your ISP or a public resolver like Google (8.8.8.8) or Cloudflare (1.1.1.1). It does the work of asking other servers on your behalf and caches results.
Root Name Server — knows the address of every TLD server. There are 13 root server names (a.root-servers.net through m.root-servers.net), each served by many physical machines worldwide via anycast. They don’t know IPs for individual domains; they refer queries to the right TLD server.
TLD Name Server — manages a top-level domain zone (.com, .org, .uk, etc.). It knows which authoritative name server is responsible for each registered domain in its zone.
Authoritative Name Server — the final authority for a specific domain. It holds the actual DNS records (A, CNAME, MX, etc.) and returns the definitive answer. If the queried name does not exist, it returns NXDOMAIN; if the name exists but has no record of the requested type, it returns an empty answer.
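The hand-off between the server types can be sketched with in-memory dictionaries standing in for each tier. The server labels and the tiny data set are hypothetical, and real resolution involves caching, referrals with NS records, and network calls that this sketch leaves out.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-ins for each tier of the hierarchy, not real DNS data.
ROOT = {"com": "com-tld-server"}
TLD = {"com-tld-server": {"example.com": "example-authoritative"}}
AUTHORITATIVE = {"example-authoritative": {"example.com": "93.184.216.34"}}

def resolve_iteratively(name: str) -> Tuple[Optional[str], List[str]]:
    """Walk root -> TLD -> authoritative, as a recursive resolver would."""
    hops = ["root"]
    # The root only knows which server handles the top-level domain.
    tld_server = ROOT.get(name.rsplit(".", 1)[-1])
    if tld_server is None:
        return None, hops
    hops.append(tld_server)
    # The TLD server only knows which authoritative server owns the domain.
    auth_server = TLD[tld_server].get(name)
    if auth_server is None:
        return None, hops
    hops.append(auth_server)
    # Only the authoritative server holds the actual record.
    return AUTHORITATIVE[auth_server].get(name), hops

print(resolve_iteratively("example.com"))
print(resolve_iteratively("missing.com"))
```

The second call stops at the TLD tier, which mirrors how a lookup for an unregistered domain never reaches an authoritative server.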
Query Types
Recursive query — the client asks the resolver to do all the work and return a final answer (or an error). Most browser queries are recursive.
Iterative query — the resolver asks each server in turn; each either answers or redirects to another server. The resolver does the legwork.
Non-recursive query — the resolver already has the answer cached and returns it immediately without hitting any upstream servers.
Key DNS Record Types
| Record | Purpose | Example |
|---|---|---|
| A | Maps a domain to an IPv4 address | example.com → 93.184.216.34 |
| AAAA | Maps a domain to an IPv6 address | example.com → 2606:2800::1 |
| CNAME | Alias from one name to another | www.example.com → example.com |
| MX | Mail server for a domain | example.com → mail.example.com |
| NS | Authoritative name servers for a domain | example.com → ns1.registrar.com |
| TXT | Arbitrary text; used for SPF, DKIM, verification | "v=spf1 include:..." |
| PTR | Reverse lookup — IP to domain name | 34.216.184.93.in-addr.arpa → example.com |
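The PTR row uses a reversed form of the IP address. A short sketch of how that name is built, first by hand and then with the standard library's `reverse_pointer` attribute; the function names are assumptions for this example.

```python
import ipaddress

def ptr_name_manual(ipv4: str) -> str:
    """Reverse the octets and append the in-addr.arpa suffix."""
    octets = ipv4.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"

def ptr_name(address: str) -> str:
    """Same result via the standard library (also handles IPv6)."""
    return ipaddress.ip_address(address).reverse_pointer

print(ptr_name_manual("93.184.216.34"))  # 34.216.184.93.in-addr.arpa
print(ptr_name("93.184.216.34"))         # 34.216.184.93.in-addr.arpa
```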
DNS Caching and TTL
Every DNS record has a TTL (Time to Live) in seconds. Resolvers cache records for the TTL duration. A record with TTL=300 is cached for 5 minutes; after that, the resolver must re-query. Cached results return instantly (non-recursive query).
Low TTL = faster propagation of changes but more DNS queries and higher latency on cold lookups. High TTL = fewer queries but slower rollout of IP changes. For deployments where you need rapid failover (e.g. DNS-based health checks), use a TTL of 60–120 seconds.
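TTL-based caching can be sketched as a dictionary that stores an expiry time next to each answer. The `TtlCache` class is a hypothetical illustration, not how any particular resolver is implemented; the clock is injectable so that expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple

class TtlCache:
    """Minimal name -> address cache that honours a per-record TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl_seconds: float) -> None:
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None  # never cached: the resolver must query upstream
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # TTL elapsed: force a fresh lookup
            return None
        return address

# Demonstration with a fake clock so time can be advanced by hand.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("example.com", "93.184.216.34", ttl_seconds=300)
print(cache.get("example.com"))  # still cached at t=0
now[0] = 300.0
print(cache.get("example.com"))  # expired after 5 minutes
```

The trade-off described above shows up directly: a smaller `ttl_seconds` means `get` returns `None` more often, which translates into more upstream queries.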
DNS is often the first layer of load balancing in large systems. Services like Route 53 and Cloudflare DNS support health checks, weighted routing, geo-routing, and latency-based routing. When you need global load distribution or multi-region failover, DNS is the first tool to reach for — before application-layer load balancers.
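Weighted routing with health checks can be approximated in a few lines. This is a sketch of the general idea only, not the algorithm any managed DNS provider documents; the function name, the weights, and the addresses (taken from the ranges reserved for documentation) are all assumptions.

```python
import random
from typing import Dict, Optional, Set

def pick_endpoint(
    weights: Dict[str, int],
    healthy: Set[str],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick one healthy endpoint, with probability proportional to its weight."""
    rng = rng or random.Random()
    # Unhealthy or zero-weight endpoints are never returned.
    candidates = {ip: w for ip, w in weights.items() if ip in healthy and w > 0}
    if not candidates:
        return None
    addresses = list(candidates)
    return rng.choices(addresses, weights=[candidates[a] for a in addresses], k=1)[0]

# Hypothetical 90/10 split between two regions.
weights = {"192.0.2.10": 9, "198.51.100.20": 1}
print(pick_endpoint(weights, healthy={"192.0.2.10", "198.51.100.20"}))
print(pick_endpoint(weights, healthy={"198.51.100.20"}))  # failover case
```

Because resolvers cache answers for the TTL, the split observed by real clients is only approximately the configured ratio.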
Managed DNS Services
- Amazon Route 53 — AWS’s DNS service with health checks, routing policies (weighted, latency, geolocation, failover), and integration with AWS services.
- Cloudflare DNS — authoritative DNS with DDoS protection and proxying built in; Cloudflare also runs the public resolver 1.1.1.1.
- Google Cloud DNS — globally distributed, anycast DNS with 100% uptime SLA.
- Azure DNS — hosts DNS zones in Azure with integration into Azure networking and RBAC.