HTTP Protocol
Overview
HTTP (Hypertext Transfer Protocol) is a protocol for fetching resources such as HTML documents. It is the foundation of any data exchange on the Web and operates as a client-server protocol: requests are initiated by the recipient of the resource, usually a web browser.
Developed in the early 1990s, HTTP is a flexible protocol that has continuously evolved. Operating at the application layer (Layer 7 of the OSI model), it typically runs over TCP connections, which may be secured with TLS encryption. Its versatility extends beyond retrieving hypertext documents — HTTP handles images and videos, processes form submissions to servers, and can fetch specific document segments for dynamic webpage updates.
How It Works
System Architecture
HTTP functions as a client-server protocol where requests originate from a user-agent (typically a web browser) and are sent to servers that process these requests and return responses. Between these endpoints exist various intermediaries called proxies.
The Client Side: The user-agent acts on the user's behalf, with web browsers being the most common example. Browsers initiate all requests in the communication flow. When loading a webpage, a browser first requests the HTML document, then makes subsequent requests for additional resources like scripts, CSS files, images, and videos. These components are assembled to render the complete webpage.
The Server Side: The server responds to client requests by providing the requested documents. While appearing as a single entity, a server may actually be a cluster of machines sharing workload, a collection of software components (caches, databases, e-commerce systems), or one of several server instances sharing a single physical machine and distinguished by the Host header.
Intermediary Proxies: Between browsers and servers, numerous computers relay HTTP messages. Those functioning at the application layer are called proxies, which can be transparent (forwarding without modifications) or non-transparent (altering requests). Proxies serve purposes including caching, content filtering, load balancing, authentication, and request logging.
HTTP Flow
When a client wants to communicate with a server, it performs these steps:
- Open a TCP connection: Used to send one or more requests and receive answers. The client may open a new connection, reuse an existing one, or open several connections.
- Send an HTTP message:

      GET / HTTP/1.1
      Host: developer.mozilla.org
      Accept-Language: fr

- Read the response sent by the server:

      HTTP/1.1 200 OK
      Date: Sat, 09 Oct 2010 14:28:02 GMT
      Server: Apache
      Content-Type: text/html

      <!DOCTYPE html... (the requested web page)

- Close or reuse the connection for further requests.
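The four steps can be sketched with a raw TCP socket. This is a minimal illustration, not a production client: to stay self-contained it talks to a throwaway local server, and `DemoHandler` and `fetch_root` are hypothetical names introduced only for this example. Python's built-in handler answers with HTTP/1.0 by default, so the version in the printed status line may differ from the one in the request.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<!DOCTYPE html><title>demo</title>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_root(host: str, port: int) -> bytes:
    """Perform one request/response exchange over a fresh TCP connection."""
    request = (
        "GET / HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Accept-Language: fr\r\n"
        "Connection: close\r\n"  # ask the server to close after responding
        "\r\n"
    ).encode("ascii")

    # 1. Open a TCP connection.
    with socket.create_connection((host, port), timeout=5) as conn:
        # 2. Send the HTTP message.
        conn.sendall(request)
        # 3. Read the response until the server closes the connection.
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    # 4. Leaving the `with` block closes the connection.
    return b"".join(chunks)


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
response = fetch_root("127.0.0.1", server.server_port)
server.shutdown()
server.server_close()

status_line = response.split(b"\r\n", 1)[0].decode("ascii")
print(status_line)
```

Reusing the connection instead of closing it would mean omitting `Connection: close` and using `Content-Length` to find where each response ends, which this sketch deliberately avoids.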
HTTP Messages
There are two types of HTTP messages: requests and responses.
Requests consist of:
- An HTTP method (a verb like `GET` or `POST`, or a noun like `OPTIONS` or `HEAD`) defining the operation
- The path of the resource to fetch
- The HTTP protocol version
- Optional headers conveying additional information
- A body for methods like `POST`
Responses consist of:
- The HTTP protocol version
- A status code indicating success or failure
- A status message (short description of the status code)
- HTTP headers
- Optionally, a body containing the fetched resource
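To make the message anatomy concrete, the following sketch splits raw HTTP/1.1 messages into the components listed above. It is a simplified illustration: the `Request` and `Response` classes and the parsing helpers are hypothetical names for this example, and the parser assumes well-formed messages with no repeated headers and no chunked bodies.

```python
from dataclasses import dataclass


@dataclass
class Request:
    method: str
    path: str
    version: str
    headers: dict
    body: bytes


@dataclass
class Response:
    version: str
    status_code: int
    status_message: str
    headers: dict
    body: bytes


def _split_message(raw: bytes):
    """Split a raw message into start line, header dict, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return lines[0], headers, body


def parse_request(raw: bytes) -> Request:
    start_line, headers, body = _split_message(raw)
    method, path, version = start_line.split(" ", 2)
    return Request(method, path, version, headers, body)


def parse_response(raw: bytes) -> Response:
    start_line, headers, body = _split_message(raw)
    version, code, message = start_line.split(" ", 2)
    return Response(version, int(code), message, headers, body)


req = parse_request(
    b"GET / HTTP/1.1\r\nHost: developer.mozilla.org\r\nAccept-Language: fr\r\n\r\n"
)
res = parse_response(
    b"HTTP/1.1 200 OK\r\nServer: Apache\r\nContent-Type: text/html\r\n\r\n<!DOCTYPE html>"
)
print(req.method, req.path, req.version)
print(res.status_code, res.status_message, res.headers["content-type"])
```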
Status Codes
HTTP is stateless, so the client learns the outcome of each request from the status code in the response. Common codes:
- `200`: OK
- `301`: Moved Permanently (redirect)
- `401`: Unauthorized (client must authenticate)
- `403`: Forbidden (the server understood the request but refuses to authorize it)
- `404`: Not Found
- `405`: Method Not Allowed
- `500`: Internal Server Error
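As a quick way to look these codes up, Python's standard `http.HTTPStatus` enum maps each registered code to its reason phrase. The sketch below adds the class implied by the first digit (1xx to 5xx); `describe` is a hypothetical helper name used only here.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return 'code phrase (class)' for a registered status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return f"{status.value} {status.phrase} ({classes[code // 100]})"


for code in (200, 301, 401, 403, 404, 405, 500):
    print(describe(code))
```

When debugging, the first digit is often enough to decide where to look: 4xx points at the request, 5xx at the server.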
HTTP Headers
HTTP headers let the client and server pass additional information with a request or response. A header consists of a case-insensitive name followed by a colon, then its value.
Headers can be grouped by context:
- Request headers: Information about the resource to be fetched or about the requesting client
- Response headers: Additional information about the response or the server
- Representation headers: Information about the body of the resource (MIME type, content encoding such as compression)
- Payload headers: Representation-independent information about payload data (content length, transfer encoding)
Important headers include:
- `Authorization: Basic <credentials>` sends basic auth credentials (base64-encoded `username:password`)
- `Authorization: Bearer <token>` sends a bearer token for token-based authentication
- `Accept: <MIME_type>/<MIME_subtype>` tells the server which data types the client accepts
- `Content-Type: text/html; charset=UTF-8` gives the media type of the message body
- `Set-Cookie: name=value` sends a cookie from the server to the client for persistent sessions
- `Host: example.com` specifies which virtual host to serve (critical for name-based virtual hosting)
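The sketch below shows how a Basic `Authorization` value is built and how case-insensitive header names behave. It uses only the standard library; `basic_auth_value` is a hypothetical helper, and `user` / `pass` are placeholder credentials. Base64 is an encoding, not encryption, so this header should only travel over TLS.

```python
import base64
from email.message import Message


def basic_auth_value(username: str, password: str) -> str:
    """Build the value of an `Authorization: Basic ...` header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# email.message.Message looks header names up case-insensitively,
# which mirrors how HTTP header names behave.
headers = Message()
headers["Host"] = "example.com"
headers["Accept"] = "text/html"
headers["Authorization"] = basic_auth_value("user", "pass")  # placeholder credentials

print(headers["authorization"])
print(headers["HOST"])
```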
Key Terminology
- Stateless: Each HTTP request is independent; the server does not retain information between requests. Session state is managed through cookies, tokens, or other mechanisms.
- Idempotent: An HTTP method is idempotent if making the same request multiple times has the same effect as making it once. `GET`, `PUT`, and `DELETE` are idempotent; `POST` is not.
- User-Agent: The client software making the request, most commonly a web browser.
- MIME Type: Media type identifier (e.g., `text/html`, `application/json`, `image/png`) used in `Content-Type` and `Accept` headers.
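Idempotency is easiest to see by looking at server state after repeated calls. The following sketch uses a hypothetical in-memory store (`InMemoryResources`) in place of a real server; its methods only imitate the usual semantics of `PUT`, `DELETE`, and `POST`.

```python
class InMemoryResources:
    """Hypothetical in-memory store standing in for a server's resources."""

    def __init__(self):
        self.items = {}
        self._next_id = 1

    def put(self, key: str, value: str) -> None:
        # PUT replaces the resource at a known location.
        self.items[key] = value

    def delete(self, key: str) -> None:
        # DELETE leaves the resource absent, whether or not it existed.
        self.items.pop(key, None)

    def post(self, value: str) -> str:
        # POST creates a new resource on every call.
        key = f"item-{self._next_id}"
        self._next_id += 1
        self.items[key] = value
        return key


store = InMemoryResources()
store.put("profile", "v1")
store.put("profile", "v1")  # repeated PUT: state unchanged
after_put = dict(store.items)

store.post("order")
store.post("order")  # repeated POST: a second resource appears
after_post = dict(store.items)

store.delete("profile")
store.delete("profile")  # repeated DELETE: still just absent
after_delete = dict(store.items)

print(after_put)
print(after_post)
print(after_delete)
```

This is why a client or proxy can safely retry a timed-out `PUT` or `DELETE`, while retrying a `POST` risks creating duplicates.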
Common Ports and Protocols
| Port | Protocol | Purpose |
|---|---|---|
| 80 | TCP | HTTP — unencrypted web traffic |
| 443 | TCP | HTTPS — TLS-encrypted web traffic |
| 8080 | TCP | Common alternative HTTP port |
| 8443 | TCP | Common alternative HTTPS port |
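Clients pick the port from the URL: an explicit port wins, otherwise the scheme's default from the table applies. A minimal sketch of that rule, using the standard `urllib.parse.urlsplit` (the `effective_port` helper and `DEFAULT_PORTS` table are hypothetical names for this example):

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the explicit port in the URL, or the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]  # KeyError for non-HTTP schemes


for url in ("http://example.com/", "https://example.com/", "https://example.com:8443/app"):
    print(url, effective_port(url))
```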
Why It Matters
As a system administrator, you will:
- Configure web servers to handle HTTP requests and serve content
- Set up virtual hosts to serve multiple websites from one server
- Configure reverse proxies to route traffic to backend applications
- Debug connectivity issues by reading HTTP headers and status codes
- Secure HTTP traffic with TLS (HTTPS) on port 443
- Analyze access and error logs that record HTTP transactions
Common Pitfalls
- Forgetting that HTTP is stateless: sessions require explicit mechanisms (cookies, tokens) to persist state between requests.
- Not checking status codes: a 200 response doesn't mean the content is correct, and a 404 might indicate a misconfigured `DocumentRoot` rather than a file that is truly missing.
- Ignoring the `Host` header: name-based virtual hosting depends entirely on this header. Without it, the server cannot determine which site to serve.
- Caching surprises: browsers and proxies cache aggressively. Use `Ctrl+F5` to bypass the browser cache when testing changes.
- Mixed content: serving HTTP resources on an HTTPS page triggers browser security warnings, and browsers may block the insecure resources.
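The `Host` header pitfall can be illustrated with a small sketch of name-based site selection. Everything here is an assumption made for illustration: the host names, the document root paths, and the choice to fall back to a default site when the header is missing or unknown. Real web servers implement their own matching rules, and this sketch does not handle IPv6 literals.

```python
from typing import Optional

# Hypothetical mapping of host names to document roots.
VIRTUAL_HOSTS = {
    "example.com": "/var/www/example",
    "blog.example.com": "/var/www/blog",
}
DEFAULT_ROOT = "/var/www/default"  # assumed fallback site


def select_document_root(host_header: Optional[str]) -> str:
    """Pick a document root from the Host header value."""
    if not host_header:
        return DEFAULT_ROOT  # no Host header: only the default site can be served
    hostname = host_header.split(":", 1)[0].strip().lower()  # drop optional port
    return VIRTUAL_HOSTS.get(hostname, DEFAULT_ROOT)


print(select_document_root("blog.example.com"))
print(select_document_root("EXAMPLE.com:8080"))
print(select_document_root(None))
```

When a request unexpectedly lands on the wrong site, comparing the `Host` value the client actually sent with the configured names is usually the first check.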
Further Reading
Related Documentation
- Technologies: Apache HTTPD
- SOPs: Web Server Management
- Concepts: Virtual Hosting, Reverse Proxy, TLS and Certificates