Life of a Network Packet

Introduction

You type “translate lorem ipsum” into Google’s search bar and press Enter. Roughly 200 milliseconds later, ten blue links and a translation widget appear on your screen. In that fifth of a second, your keystrokes triggered a cascade of protocols, each handing off to the next like relay runners passing a baton across the planet. A name was resolved, a connection negotiated, encryption established, a request dispatched, routed through a dozen machines, answered, and rendered — all before you could blink twice.

This article traces that journey step by step. We follow one packet through the four layers of the TCP/IP stack — application, transport, network, and link — from your browser to Google’s servers and back again. Along the way we will meet DNS, TCP, TLS, HTTP, IP, and Ethernet, each doing its part to make the web feel instantaneous.

Background Reading

For a broader and more rigorous treatment of the ideas in this article, see Mung Chiang’s Networked Life: 20 Questions and Answers (Cambridge University Press). Chiang’s book frames networking around real-world questions — “How does Google rank pages?”, “Why doesn’t the Internet collapse?” — and is an excellent companion to the layer-by-layer view presented here.

The URL and the Search Query

When you press Enter, the browser’s first job is to construct a URL. Your query “translate lorem ipsum” becomes:

https://www.google.com/search?q=translate+lorem+ipsum

Spaces are encoded as + in query strings (or as %20 in path segments). Any character outside the unreserved set — letters, digits, -, ., _, ~ — must be percent-encoded: a % followed by two hexadecimal digits representing the character’s ASCII byte value.

Definition (URL Encoding)

Percent-encoding replaces a byte value \(v\) with the three-character string %HH, where \(HH\) is the uppercase hexadecimal representation of \(v\). For example, a space (ASCII 0x20) becomes %20. In the application/x-www-form-urlencoded content type used by HTML forms, spaces are instead represented as +.

Before making any network request, the browser consults its HSTS preload list — a hardcoded set of domains (including google.com) that must only be contacted over HTTPS. This prevents downgrade attacks: the browser never even attempts an insecure HTTP connection.

DNS Resolution

The browser now needs to turn www.google.com into an IP address. This is the job of the Domain Name System (DNS).

The lookup follows a cache hierarchy. First, the browser checks its own DNS cache. If that misses, it queries the operating system’s resolver cache (systemd-resolved on Linux, mDNSResponder on macOS). If both miss, the OS forwards the query to a recursive resolver — typically operated by your ISP or a public service like Google Public DNS (8.8.8.8) or Cloudflare (1.1.1.1).

The recursive resolver performs iterative queries up the DNS hierarchy. It starts at one of the 13 logical root name servers (operated by organisations including ICANN, Verisign, and the US Army Research Lab). The root server doesn’t know Google’s IP, but it knows who manages .com: it returns a referral to Verisign’s TLD name servers. The .com TLD server, in turn, refers the resolver to Google’s authoritative name servers (e.g., ns1.google.com). Finally, Google’s authoritative server replies with the actual IP address — an A record (IPv4) or AAAA record (IPv6) — along with a TTL (Time To Live) that tells caches how long to remember the answer.

DNS queries use UDP on port 53 by default. A typical query-and-response fits in a single UDP datagram, avoiding the overhead of a TCP handshake. TCP is used as a fallback when a response exceeds 512 bytes (the original limit, now effectively 4096 bytes with EDNS0) or when zone transfers occur between servers.

Why does DNS use UDP instead of TCP?

DNS queries are small (typically under 512 bytes) and follow a simple request-response pattern. UDP avoids the three-packet TCP handshake, saving a full round-trip of latency on every lookup. Since DNS resolvers handle millions of queries per second, this overhead saving is substantial. TCP is reserved for the minority of cases where responses are large or reliability is critical (zone transfers).

Remark (ICANN and the Root Zone)

The 13 root name servers are logical servers — they are actually hundreds of physical machines distributed globally via anycast routing. ICANN (the Internet Corporation for Assigned Names and Numbers) coordinates the root zone file, but the root servers themselves are operated by 12 independent organisations. This decentralisation is deliberate: no single entity controls the foundation of DNS.

The Three-Way Handshake

With the IP address in hand, the browser opens a TCP connection to Google’s server on port 443 (HTTPS). TCP provides reliable, ordered, bidirectional byte streams — but before any data flows, the two endpoints must synchronise.

The connection begins with the three-way handshake:

  1. SYN: The client sends a segment with the SYN flag set and a random initial sequence number \(x\).
  2. SYN-ACK: The server responds with both SYN and ACK flags set, its own random sequence number \(y\), and an acknowledgment number of \(x + 1\).
  3. ACK: The client sends an ACK with acknowledgment number \(y + 1\). The connection is now established.

This three-step dance serves two purposes. First, it establishes bidirectional sequence numbers so each side can track which bytes have been sent and received. Second, it prevents stale connections — if a delayed SYN from a previous session arrives, the handshake will fail because the sequence numbers won’t match.

Definition (TCP Segment)

A TCP segment header contains: source port (16 bits), destination port (16 bits), sequence number (32 bits), acknowledgment number (32 bits), data offset and flags (SYN, ACK, FIN, RST, etc.), window size (16 bits for flow control), and a checksum (16 bits). The minimum header size is 20 bytes.

Why not just use UDP for everything?

UDP is faster (no handshake, no connection state) but provides no guarantees: packets can arrive out of order, be duplicated, or vanish entirely. For a web page with hundreds of resources that must arrive correctly and in order, TCP’s reliability is essential. The choice between TCP and UDP is a trade-off between latency and correctness — real-time applications like video calls or games often prefer UDP, while data transfer and web browsing rely on TCP.

The handshake costs one full round-trip time (RTT). We can estimate RTT from the speed of light in fibre optic cable (roughly \(2/3\) of \(c\)):

\[\text{RTT} \approx 2 \times \frac{d}{(2/3)c} + t_{\text{proc}}\]

For a packet travelling from Sydney to a Google data centre on the US west coast (\(d \approx 12{,}000\) km):

\[\text{RTT} \approx 2 \times \frac{12{,}000 \text{ km}}{200{,}000 \text{ km/s}} + t_{\text{proc}} = 120 \text{ ms} + t_{\text{proc}}\]

This physical lower bound explains why latency to distant servers can never drop below ~120ms, no matter how fast the hardware gets.

The TLS Handshake

With the TCP connection open, the browser and server negotiate encryption via TLS (Transport Layer Security). Since both sides support TLS 1.3, the handshake completes in a single round trip.

TLS 1.3 in Brief

TLS 1.3 simplified the handshake from two round trips (TLS 1.2) to one:

  1. ClientHello: The client sends supported cipher suites and key shares (Diffie-Hellman public values) in a single message.
  2. ServerHello: The server selects a cipher suite, sends its key share, its certificate, and a finished message — all in one flight.
  3. The client verifies the server’s certificate against trusted root CAs, completes the key exchange, and sends its own finished message.

From this point, all traffic is encrypted with a symmetric cipher (typically AES-256-GCM) using keys derived from the shared Diffie-Hellman secret. The asymmetric cryptography (ECDHE) was only used to establish these symmetric keys — it would be far too slow for bulk data transfer.

The server’s certificate chain links Google’s certificate back to a trusted root CA (Certificate Authority) pre-installed in your browser or OS. This chain of trust lets the browser verify that it is genuinely talking to Google, not an impostor performing a man-in-the-middle attack.

Why does HTTPS matter for a search query?

Even for a “harmless” search, HTTPS protects against three threats:

  1. Eavesdropping — an attacker on your Wi-Fi network cannot see what you searched for.
  2. Tampering — no one can inject malicious content into Google’s response.
  3. Impersonation — the certificate proves you are talking to Google, not a phishing site. Search queries often reveal sensitive information (medical conditions, financial questions, personal interests), making encryption essential even when the content itself is public.

The HTTP Request

With encryption established, the browser sends an HTTP/2 GET request through the TLS tunnel:

Example (A Typical HTTP Request)
GET /search?q=translate+lorem+ipsum HTTP/2
Host: www.google.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ...
Accept: text/html,application/xhtml+xml
Accept-Language: en-AU,en;q=0.9
Accept-Encoding: gzip, br
Cookie: NID=...; SID=...

The Host header identifies the virtual host (Google runs many services on the same IP). Accept-Encoding tells the server the client supports gzip and Brotli compression. Cookies carry session state and preferences from previous visits.

HTTP/2 uses HPACK header compression, which maintains a shared table of previously sent headers on both sides. On a warm connection, these headers compress from ~500 bytes down to ~50 bytes. The protocol also multiplexes multiple requests over a single TCP connection, eliminating the head-of-line blocking that plagued HTTP/1.1.

Packet Encapsulation

Before our HTTP request can travel across the network, it must be wrapped in successive layers of protocol headers — a process called encapsulation. Each layer adds its own header (and sometimes trailer) to the payload it receives from the layer above.

The HTTP message (our search query) becomes the payload of a TCP segment (adding a 20-byte TCP header). The TCP segment becomes the payload of an IP packet (adding a 20-byte IP header). The IP packet becomes the payload of an Ethernet frame (adding a 14-byte header and a 4-byte Frame Check Sequence trailer).

Definition (Maximum Transmission Unit)

The Maximum Transmission Unit (MTU) is the largest IP packet that a link can carry without fragmentation. For Ethernet, the standard MTU is 1500 bytes. Subtracting the IP header (20 bytes) and TCP header (20 bytes), the maximum TCP payload in a single segment is 1460 bytes. If the HTTP request exceeds this, TCP splits it across multiple segments.

The total frame size is:

\[\text{Frame} = \underbrace{14}_{\text{Eth}} + \underbrace{20}_{\text{IP}} + \underbrace{20}_{\text{TCP}} + \underbrace{n}_{\text{payload}} + \underbrace{4}_{\text{FCS}}\]

For our small search request (\(n \approx 50\) bytes after HPACK compression on a warm connection)1, the total Ethernet frame is only ~108 bytes — well within the MTU.

Remark (IP Header Checksum)

The IP header includes a 16-bit checksum computed as the one’s complement of the one’s complement sum of all 16-bit words in the header. Each router along the path must recalculate this checksum because it modifies the TTL field at every hop. This per-hop computation is one reason IPv6 dropped the header checksum entirely, delegating error detection to the link layer and transport layer.

Routing

The encapsulated frame now leaves your machine and begins its journey across the Internet.

Your computer’s network interface card (NIC) transmits the frame to your home switch or access point over Ethernet or Wi-Fi. The switch forwards it to your home router, which performs Network Address Translation (NAT): it rewrites the packet’s source IP from your private address (e.g., 192.168.1.42) to your public IP assigned by your ISP.

The router forwards the packet to your ISP’s network, which routes it through a series of autonomous systems (ASes) — independently operated networks that exchange routing information via the Border Gateway Protocol (BGP). The packet may traverse an Internet Exchange Point (IX) where multiple ISPs peer, then travel undersea fibre cables to reach Google’s nearest edge point of presence.

At each hop, the router: (1) decrements the IP header’s TTL (Time To Live) by 1, (2) recalculates the IP header checksum, and (3) performs a longest prefix match against its routing table to determine the next hop. If the TTL reaches zero, the router drops the packet and sends an ICMP Time Exceeded message back to the sender — this is exactly how traceroute works, sending packets with progressively increasing TTLs to map each hop.

The Postal Analogy

Routing works like the postal system. Your letter (packet) has a destination address (IP). The local post office (your router) doesn’t know the exact house — it just knows to send it to the regional sorting centre (ISP). Each sorting centre reads the address, decides the best next hop, and passes it along. No single post office knows the entire route; each one just knows the next step. BGP is the system by which post offices share information about which regions they can reach.

The fundamental limit on how much data a channel can carry is given by Shannon’s theorem:

Theorem (Shannon’s Channel Capacity)

The maximum rate at which information can be reliably transmitted over a communication channel with bandwidth \(B\) Hz, signal power \(S\), and noise power \(N\) is:

\[C = B \log_2\!\left(1 + \frac{S}{N}\right)\]

This theoretical upper bound, measured in bits per second, cannot be exceeded by any coding scheme — it is a law of physics, not an engineering limitation.

The Response

Google’s server receives our request, executes the search, and sends back an HTTP 200 response containing HTML, CSS, JavaScript, and JSON data for the translation widget. The response is far larger than the request — typically 50–200 KB compressed — and must be split across many TCP segments.

TCP uses a sliding window mechanism for flow control. The receiver advertises a window size telling the sender how many bytes it can buffer. The sender transmits segments up to this window and waits for ACKs before sending more. This prevents a fast sender from overwhelming a slow receiver.

Remark (TCP Windowing)

TCP’s congestion control begins with slow start: the sender starts with a small congestion window (typically 10 segments) and doubles it with each round trip until it detects loss. After a loss event, the algorithm switches to congestion avoidance, growing the window linearly. Modern variants like CUBIC (Linux default) and BBR (used by Google) use more sophisticated models of network capacity to maximise throughput without causing excessive queuing.

Each TCP segment the server sends is individually ACKed by the client (or ACKed cumulatively). If a segment is lost, TCP detects this via duplicate ACKs or a retransmission timeout and resends it. This reliability layer is invisible to HTTP — the application simply sees a clean byte stream.

HTTP/3 and QUIC

The latest evolution of the web transport stack is HTTP/3, which replaces TCP with QUIC — a protocol built on UDP that integrates transport and encryption into a single handshake (0-RTT or 1-RTT). QUIC eliminates TCP’s head-of-line blocking at the transport layer and reduces connection setup time. As of 2026, HTTP/3 handles roughly 30% of web traffic. Our packet’s journey would look quite different under QUIC — no separate TCP or TLS handshake, and loss in one stream doesn’t block others.

The Return Journey

Google’s response traverses the same path in reverse. The packets travel from Google’s edge through the IX, across backbone links, through your ISP, and arrive at your home router. NAT translates the destination address back from your public IP to your private LAN address, and the packets reach your machine.

Your browser’s TLS layer decrypts each segment. TCP reassembles the segments in order (buffering any that arrive out of sequence). The HTTP/2 layer decompresses the headers and demultiplexes the streams. Finally, the rendering engine parses the HTML, fetches subresources (CSS, JS, images — each triggering its own version of this journey), builds the DOM, applies styles, executes JavaScript, and paints the result to your screen.

From Enter to rendered results: 100–300 milliseconds. Billions of transistors, thousands of kilometres of fibre, dozens of protocols, and a century of accumulated engineering — all so you can translate lorem ipsum.

Mung Chiang

The Internet works not because a central authority planned it, but because thousands of independent networks agreed on a few simple rules and let the packets find their way.


  1. Note that \(n\) is not simply the character count of the query string (“translate lorem ipsum” = 21 bytes); it includes all HTTP/2 pseudo-headers (:method, :scheme, :authority, :path), plus headers like Accept, Cookie, and User-Agent, all HPACK-compressed. ↩︎