Max Rohowsky, Ph.D.

How Does the Internet Work?

Ever wondered what happens ‘under the hood’ when you open a website in your browser? Although we open countless websites every day, few of us understand the technical details of what happens.

The description below explains what happens, starting from the press of the enter key on the keyboard. I’m writing this to structure my understanding about the web, and hope it helps you do the same.

This post builds on the posts of Alex Gaynor, Ron Mattino, and many others (credited below) who answer this question with rigor and depth.

The Start: A Keyboard Press

Key Down Event

Your finger hits the enter key, pushes it down, and it bottoms out. What’s next? A typical Universal Serial Bus (USB) keyboard is powered by the computer’s USB host controller. Usually, it has contacts arranged in layers with an insulating layer in between (see image below). The key press makes the layers connect, completing an electrical circuit. And this signals to the keyboard controller that the enter key has been pressed.

Keyboard keypress animation
Credits: explainthatstuff

Keyboard Circuitry

Most keyboards have a circuit layout designed in the shape of a grid. That’s smart because each key needs to be uniquely identifiable which works well with rows and columns (similar to a chessboard). When the enter key bottoms out, the circuit between a specific row and column is completed. Here’s a simplified illustration of this:

Key matrix animation
Credits: pcbheaven
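
The row-and-column idea can be sketched in a few lines of Python. This is a toy simulation rather than real keyboard firmware: the 4x4 grid, the position of the enter key, and the function name `scan_matrix` are all assumptions made for illustration.

```python
# Toy simulation of a keyboard matrix scan (not real firmware).
# The grid size and the position of the enter key are made up.
ROWS, COLS = 4, 4

def scan_matrix(pressed):
    """Return the (row, col) pairs whose circuit is currently closed.

    `pressed` is a set of (row, col) pairs standing in for the
    physical switches that are bottomed out.
    """
    closed = []
    for row in range(ROWS):        # look at one row at a time
        for col in range(COLS):    # check every column on that row
            if (row, col) in pressed:
                closed.append((row, col))
    return closed

ENTER_POSITION = (2, 3)            # hypothetical location of the enter key
print(scan_matrix({ENTER_POSITION}))
```

Because every key sits at a unique row and column crossing, a single closed crossing is enough to tell the controller which key went down.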

Keycode Conversion

The controller on the keyboard detects the row and column information that corresponds to the enter key and converts it to a keycode integer. This code is stored on the keyboard, ready to be encoded for the USB cable transfer once the endpoint is polled by the host USB controller.

From Keyboard to Computer

USB Cable

The USB cable connects the keyboard to the computer and facilitates the ‘serial’ (i.e. one bit at a time in sequence) data transfer. The diagram below shows the structure of a USB cable.

Dissected view of a USB cable
Credits: Ron Mattino

USB Cable Types

The most common types of USB cable (USB 1.x and 2.0) have four wires: two for power (VBUS and GND) and two for data transfer (D+ and D-). To increase the data transfer speed, USB 3.x cables include additional wires.

Differential Signaling

Inside of the USB cable, data is transferred using differential signaling over the D+ and D- wires. This signaling technique uses two complementary signals to transmit information. One wire carries the signal and the other carries the inverted signal as illustrated below.

USB differential signal
Credits: All about Circuits
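
To see why sending a signal together with its inverse is useful, here is a small Python sketch. The voltage levels and noise values are illustrative assumptions, not values from the USB specification, and the sketch leaves out the bit encoding that real USB applies before data reaches the wire.

```python
# Toy model of differential signaling; voltages are illustrative only.
def transmit(bits, high=1.0, low=0.0):
    """Return (d_plus, d_minus) voltage lists; D- carries the inverse of D+."""
    d_plus = [high if b else low for b in bits]
    d_minus = [low if b else high for b in bits]
    return d_plus, d_minus

def add_noise(wire, noise):
    """Add the same interference to a wire, sample by sample."""
    return [v + n for v, n in zip(wire, noise)]

def receive(d_plus, d_minus):
    """Recover bits from the difference between the two wires."""
    return [1 if (p - m) > 0 else 0 for p, m in zip(d_plus, d_minus)]

bits = [1, 0, 1, 1, 0]
noise = [0.4, -0.3, 0.8, -0.6, 0.5]   # identical interference on both wires
d_plus, d_minus = transmit(bits)
recovered = receive(add_noise(d_plus, noise), add_noise(d_minus, noise))
print(recovered == bits)
```

Since the receiver only looks at the difference between the two wires, interference that hits both wires equally cancels out.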

Packet Transfer

The bits of data that sequentially make their way through the USB cable are often referred to as ‘packets’ that follow the low level USB protocol. When the signal reaches the host USB controller it is decoded and interpreted by the keyboard device driver.
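
As a rough illustration of what the host ends up decoding, the sketch below unpacks an 8-byte keyboard input report. It assumes the keyboard uses the simple HID ‘boot protocol’ layout (one modifier byte, one reserved byte, up to six key usage IDs); the function name and the tiny lookup table are hypothetical.

```python
# Sketch: decode an 8-byte keyboard report in the HID boot-protocol layout.
# Byte 0: modifier bits, byte 1: reserved, bytes 2-7: up to six key usage IDs.
USAGE_NAMES = {0x04: "A", 0x28: "Enter", 0x2C: "Space"}  # small excerpt only

def decode_report(report: bytes):
    if len(report) != 8:
        raise ValueError("boot keyboard reports are 8 bytes long")
    modifiers = report[0]
    # A usage ID of 0 means "no key in this slot".
    keys = [USAGE_NAMES.get(code, hex(code)) for code in report[2:] if code != 0]
    return {"left_shift": bool(modifiers & 0x02), "keys": keys}

enter_report = bytes([0x00, 0x00, 0x28, 0x00, 0x00, 0x00, 0x00, 0x00])
print(decode_report(enter_report))
```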

Key-Down Message Travels to the Application

Key Press Event

The key press is passed to Microsoft’s Keyboard Human Interface Device KBDHID.sys kernel-mode driver for USB devices, which converts the Human Interface Device (HID) usage (i.e., key event) into a scan code.

Drivers

In our ‘google.com’ example, the press of the enter key originates from a keyboard; therefore, the KBDHID.sys driver interfaces with keyboard class driver KBDCLASS.sys. This class driver, in turn, communicates with the Windows 32-bit Kernel Win32K.sys driver.

Keyboard Human Interface Driver
Image from my Device Manager

Active Window

To identify the currently active window, Win32K.sys uses the GetForegroundWindow() API. This ensures that the key press event is addressed to the desired target which is the browser window.

Message Queue

Next, the message dispatcher responsible for managing the message flow between the operating system and open applications calls SendMessage(hWnd, uMsg, wParam, lParam). This adds a message to the queue for a specific window which is processed by the message processing function WindowProc.

Parameter | Name | Description
hWnd | Window handle | Unique identifier assigned to each window
uMsg | Message | The message to be sent
wParam | Word Parameter | Additional message-specific information
lParam | Long Parameter | Info about the key press, including repeat count, scan code, extended key flag, and key context code
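
The lParam fields listed above are packed into a single 32-bit integer. The following sketch unpacks them in Python; the function name is hypothetical, and the example value assumes the enter key on the main block reports scan code 0x1C.

```python
# Sketch: unpack the lParam bit fields of a key-down message.
def decode_lparam(lparam: int):
    return {
        "repeat_count": lparam & 0xFFFF,            # bits 0-15
        "scan_code": (lparam >> 16) & 0xFF,         # bits 16-23
        "extended_key": bool((lparam >> 24) & 1),   # bit 24
        "context_code": (lparam >> 29) & 1,         # bit 29
    }

# Assumed example: scan code 0x1C, pressed once, not an extended key.
example = (0x1C << 16) | 1
print(decode_lparam(example))
```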

Browser’s URL Bar is Parsed and Request is Sent

Parsing the URL Bar

The input in the browser’s URL bar is parsed to check whether it’s a URL or a search query. In our ‘google.com’ example, the input contains the domain ‘google’ and the top-level domain (TLD) ‘.com’. The protocol (‘http’ or ‘https’), the prefix ‘www’, and the resource ‘/’ (i.e., the index) are added automatically.

To see this, write ‘google.com’ in the URL bar, press enter, and once Google has loaded copy what’s in the URL bar. You’ll see that ‘google.com’ has turned to ‘https://www.google.com/’.
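
A very rough version of the ‘URL or search query?’ decision can be sketched as follows. The heuristic (a space or a missing dot means search) and the search URL are simplifying assumptions; real browsers use far more elaborate rules. Adding the ‘www’ prefix is left out here.

```python
from urllib.parse import quote_plus, urlsplit, urlunsplit

def normalize_input(text: str,
                    search_url: str = "https://www.google.com/search?q=") -> str:
    """Turn URL-bar input into a full URL (hypothetical helper)."""
    text = text.strip()
    # Crude heuristic: spaces or no dot at all means "treat as a search".
    if " " in text or "." not in text:
        return search_url + quote_plus(text)
    if "://" not in text:
        text = "https://" + text          # add a default protocol
    parts = urlsplit(text)
    path = parts.path or "/"              # add the index resource if missing
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

print(normalize_input("google.com"))
print(normalize_input("how does dns work"))
```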

GET Request

To retrieve the Google website from Google’s servers, a GET request is sent. This is one of the methods used in the HTTP protocol for retrieving data. The next step is to choose between the regular HTTP and encrypted HTTPS protocol.
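
On the wire, a GET request is just a few lines of text. The sketch below builds a minimal HTTP/1.1 request; the helper name and the choice of headers are assumptions, and a real browser sends many more headers.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, resource, version
        f"Host: {host}",          # required in HTTP/1.1
        "Connection: close",
        "",                       # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

print(build_get_request("www.google.com").decode("ascii"))
```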

Choosing the Request Protocol: HTTP or HTTPS?

Checking the HSTS List

How exactly does the browser decide between HTTP and HTTPS? First, the browser looks at its built-in HSTS (HTTP Strict Transport Security) list. If a site is on this list, it receives an HTTPS request. If it isn’t, the browser uses HTTP for the initial request. If a website is not on the HSTS list but requires HTTPS, it tells the browser to switch to HTTPS on future visits.

Chrome HSTS

Chrome maintains an “HSTS Preload List” (and other browsers maintain lists based on the Chrome list). The listed domains are preconfigured with HSTS when Chrome is first installed. In our ‘google.com’ example, the first request will use the HTTPS protocol because (needless to say) Google is on the list.
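
The decision described above boils down to a list lookup. In this sketch, the one-entry `PRELOADED` set is a hypothetical stand-in for the real preload list, and the check assumes that an entry also covers its subdomains.

```python
# Hypothetical stand-in for the browser's built-in preload list.
PRELOADED = {"google.com"}

def choose_scheme(host: str, preloaded=PRELOADED) -> str:
    """Pick the protocol for the first request to `host`."""
    labels = host.lower().split(".")
    # Check the host itself and each parent domain (assumes that a
    # preloaded entry also applies to its subdomains).
    for i in range(len(labels) - 1):
        if ".".join(labels[i:]) in preloaded:
            return "https"
    return "http"

print(choose_scheme("google.com"))
print(choose_scheme("example.org"))
```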

DNS Lookup: Domain Name to IP Address

Cache Check

Using the domain name, the Domain Name System (DNS) lookup finds the matching Internet Protocol (IP) address. First, the browser checks the local DNS cache. If the address isn’t found there, a request goes to the Internet Service Provider's (ISP) DNS server, which also has a cache. If none of these caches has the information, the recursive DNS server steps in.

Recursive DNS Server

The recursive DNS Server receives the request and follows a chain of referrals (hence the name ‘recursive’) until it reaches an authoritative DNS server that can provide the necessary information (i.e., the IP that belongs to the domain).

Top Level Domains (TLD)

Root name servers hold information about top-level domains (e.g., .com, .de, .uk). As ‘google.com’ ends with .com, the TLD name server that handles .com domains is queried. This server, in turn, knows the authoritative name server that stores the DNS record for the ‘google.com’ domain, and the query is sent on to it.

DNS Lookup
Adapted version of Xiaoli Shen‘s DNS lookup Journey

IP Address Identified

The authoritative name server identifies the IP address for the domain and sends a response containing the IP back. And, along the way, the caches are updated for future requests.
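
The chain of lookups described above can be imitated with a few dictionaries. Everything in this sketch is made up for illustration: the server names are placeholders, and the IP address comes from a range reserved for documentation, so it is not Google’s real address.

```python
# Toy model of a DNS lookup; all records are invented for illustration.
ROOT = {"com": "tld-com"}                                     # TLD -> TLD server
TLD_SERVERS = {"tld-com": {"google.com": "ns-google"}}        # domain -> authoritative server
AUTHORITATIVE = {"ns-google": {"google.com": "192.0.2.10"}}   # documentation-only address

cache = {}

def resolve(domain: str):
    """Return (ip, source), where source says who answered."""
    if domain in cache:                        # 1. cache check
        return cache[domain], "cache"
    tld = domain.rsplit(".", 1)[-1]
    tld_server = ROOT[tld]                     # 2. root points to the TLD server
    auth_server = TLD_SERVERS[tld_server][domain]   # 3. TLD server points to the authority
    ip = AUTHORITATIVE[auth_server][domain]    # 4. authority returns the record
    cache[domain] = ip                         # 5. cache is updated for next time
    return ip, "authoritative"

print(resolve("google.com"))
print(resolve("google.com"))
```

The second call is answered from the cache, which is why repeat visits skip the full chain.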

Open a Socket

Sockets for Data Transfer

Using the IP address and the port number (the default is 80 for HTTP and 443 for HTTPS), the browser makes a call to the socket() function to create a ‘socket’ (i.e., a communication endpoint). There are different types of sockets; however, for the transmission of data over the internet, the SOCK_STREAM type is used. This socket type uses the Transmission Control Protocol (TCP) and has the following characteristics:

  • Sequenced: Recipient receives data in the order sent
  • Reliable: There are methods in place to handle packet loss
  • Bidirectional: Both endpoints can send and receive data
  • Connection Mode: Before data is exchanged, a connection is established (in contrast to connectionless mode)
  • Byte Stream: Data is transmitted as stream of bytes
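
The characteristics above can be tried out locally with Python’s `socket` module. This sketch connects two SOCK_STREAM sockets over the loopback interface instead of the internet; the helper name is hypothetical, and it is only meant for small payloads because sender and receiver share one thread.

```python
import socket

def loopback_roundtrip(payload: bytes) -> bytes:
    """Send `payload` over a local TCP connection and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
        server.listen(1)
        address = server.getsockname()
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(address)            # connection mode: connect first
            conn, _ = server.accept()
            with conn:
                client.sendall(payload)        # byte stream: just bytes, no messages
                client.shutdown(socket.SHUT_WR)   # tell the peer we are done sending
                chunks = []
                while True:
                    chunk = conn.recv(4096)    # bytes arrive in the order sent
                    if not chunk:
                        break
                    chunks.append(chunk)
                return b"".join(chunks)

print(loopback_roundtrip(b"GET / HTTP/1.1\r\n\r\n"))
```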

OSI Model

The HTTP GET request sent by the browser is wrapped (encapsulated) with protocol-specific information while passing through the layers of the Open Systems Interconnection (OSI) model. This model describes the different layers of abstraction that computers use to communicate over a network (see image below).

High Level to Low Level

The HTTP request originates from the highest OSI model layer, the Application Layer (e.g., the browser). As the request traverses down the OSI layers, it undergoes further processing and encapsulation until it reaches the lowest layer, the Physical Layer, where it is transmitted as binary over the physical network medium.

OSI Model
Credits: Layer X
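
Encapsulation can be pictured as each layer putting its own header in front of whatever it received from the layer above. The sketch below uses three made-up text labels instead of real protocol headers, so it only shows the nesting, not actual TCP, IP, or Ethernet formats.

```python
# Simplified layer list; real stacks add binary headers, not text labels.
LAYERS = ["transport", "network", "data-link"]

def encapsulate(payload: bytes) -> bytes:
    """Wrap the payload on its way down: each layer prepends a header."""
    for layer in LAYERS:
        payload = f"[{layer}]".encode("ascii") + payload
    return payload

def decapsulate(frame: bytes) -> bytes:
    """Unwrap on the receiving side: headers come off in reverse order."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode("ascii")
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame

frame = encapsulate(b"GET / HTTP/1.1")
print(frame)
print(decapsulate(frame))
```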

Server Handles Request

Receiving the Request

The server listens for incoming requests on a specific port, typically port 80 for HTTP and 443 for HTTPS requests. Once a request arrives, it is parsed to extract information, e.g., requested URL, HTTP method, and query parameters.
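
The parsing step might look roughly like this. It is a simplified sketch with a hypothetical function name; production servers handle many edge cases (malformed lines, repeated headers, request bodies) that are ignored here.

```python
from urllib.parse import parse_qs, urlsplit

def parse_request(raw: bytes):
    """Extract method, path, query parameters, and headers from a raw request."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    url = urlsplit(target)
    return {
        "method": method,
        "path": url.path,
        "query": parse_qs(url.query),   # e.g. "q=dns" -> {"q": ["dns"]}
        "version": version,
        "headers": headers,
    }

raw = b"GET /search?q=dns HTTP/1.1\r\nHost: www.google.com\r\n\r\n"
print(parse_request(raw))
```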

Handling the GET Request

Servers often run on Apache or nginx for Linux operating systems, and IIS for Windows. There are different types of requests that a server can handle, but by far the most common are:

  • GET: Used to retrieve data from a server.
  • POST: Used to submit data to be processed by a server.

As mentioned further above, the browser sends a GET request to the server to open ‘google.com’. The server, in turn, returns an HTTP response containing the relevant HTML, JavaScript, CSS, images, etc.
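
To make the response side concrete, here is a minimal sketch of a handler that turns a method and path into an HTTP response. The function name, the single known path, and the HTML bodies are assumptions; it does not reflect how Apache, nginx, or IIS are implemented.

```python
def handle(method: str, path: str) -> bytes:
    """Return a complete HTTP/1.1 response for a request (toy handler)."""
    if method != "GET":
        status, body = "405 Method Not Allowed", b"<p>only GET is supported here</p>"
    elif path == "/":
        status, body = "200 OK", b"<html><body>Hello</body></html>"
    else:
        status, body = "404 Not Found", b"<p>not found</p>"
    head = (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"                      # blank line separates headers from body
    )
    return head.encode("ascii") + body

print(handle("GET", "/").decode("ascii"))
```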

The Browser

Browser Architecture

To understand how the browser processes the HTTP response from the server, it’s worth taking a look at its components:

  • UI: Graphical interface through which users interact with the browser
  • Browser Engine: Middleman between UI and rendering engine
  • Rendering Engine: Interprets HTML, CSS, and other resources to render webpages visually
  • Networking: Handles HTTP requests and responses
  • JavaScript Interpreter: Executes JS code
  • Data Persistence: Handles storage for cookies, cache, etc.
  • UI Backend: Used for drawing basic windows, boxes, buttons in the UI
Components of a Browser
Credits: Browser Stack

Document Object Model

The browser begins by parsing the HTML document. It turns the HTML markup into a hierarchical structure known as the Document Object Model (DOM). This parsing process involves identifying HTML tags, attributes, and text content to construct the DOM tree.
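
A heavily simplified version of this parsing step can be written with Python’s built-in `html.parser`. The nested-dictionary tree and the class name are assumptions for illustration; real browsers follow the much more forgiving HTML parsing rules, and this sketch does not treat void elements such as `<img>` specially.

```python
from html.parser import HTMLParser

class DomBuilder(HTMLParser):
    """Build a nested-dict tree from well-formed HTML (toy DOM)."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "attrs": {}, "children": []}
        self.stack = [self.root]           # path from the root to the open element

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1]["tag"] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():                   # skip whitespace-only text
            self.stack[-1]["children"].append({"text": data.strip()})

def build_dom(html: str):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

print(build_dom("<html><body><h1>Hi</h1><p>Text</p></body></html>"))
```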

Fetch External Resources

While parsing HTML, the browser encounters references to external resources such as CSS stylesheets, JavaScript files, images, fonts, and other files. It initiates separate network requests to fetch these resources from the server.
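
Finding those references can be sketched with the same built-in parser. The class name and the set of tags it looks at are assumptions; a browser also discovers resources through CSS, scripts, and many other tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ResourceCollector(HTMLParser):
    """Collect absolute URLs of stylesheets, scripts, and images."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(urljoin(self.base_url, attrs["href"]))
        elif tag in ("script", "img") and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))

def find_resources(html: str, base_url: str):
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.resources

page = '<link rel="stylesheet" href="/style.css"><script src="app.js"></script>'
print(find_resources(page, "https://www.google.com/"))
```

Each collected URL would then trigger its own request, following the same DNS, socket, and HTTP steps described earlier.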

Cascading Style Sheet Object Model

The browser proceeds to parse CSS stylesheets to create a CSS Object Model (CSSOM). The CSSOM represents the hierarchy of CSS rules and their associated properties, which should be applied to the corresponding HTML elements in the DOM tree.

DOM and CSSOM
Credits: web.dev

Render Tree

After the DOM tree and CSSOM are fully constructed, the browser combines them to create a unified representation known as the render tree. This tree represents the final layout of the webpage, including the visual formatting of elements, their positions, and styles. The render tree is then used to paint the content onto the screen.
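
How the two trees are combined can be hinted at with a small recursive function. This sketch makes strong simplifying assumptions: styles are looked up by tag name only (no selectors or cascade), and the only rule applied is that elements with `display: none` are left out of the render tree.

```python
def build_render_tree(node, styles):
    """Combine a toy DOM node with toy styles into a render-tree node.

    node:   {"tag": str, "children": [...]}
    styles: maps a tag name to a dict of CSS properties
    """
    style = styles.get(node["tag"], {})
    if style.get("display") == "none":      # invisible nodes are not rendered
        return None
    children = [build_render_tree(child, styles) for child in node.get("children", [])]
    return {
        "tag": node["tag"],
        "style": style,
        "children": [child for child in children if child is not None],
    }

dom = {"tag": "body", "children": [
    {"tag": "h1", "children": []},
    {"tag": "script", "children": []},
]}
styles = {"h1": {"color": "red"}, "script": {"display": "none"}}
print(build_render_tree(dom, styles))
```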

Rendering to UI

Finally, the website is rendered to the display using the Graphics Processing Unit (GPU) or the Central Processing Unit (CPU). google.com appears in your browser window, and the curtains close.