Let me tell you a secret. I have been interacting with the web for a very long time and I am still fascinated by it today. I used to be just a consumer of the web but now I build things for it too and it's even more fascinating.
How does it work? How can a browser visit a URL and then display a web page? Since computers understand only ones and zeros, how is it possible to upload my pictures and graphics to Instagram?
I surely don't need to know the answers to these questions to build web applications but I believe I will better understand what goes on in my applications and be able to debug some issues better if I understand the answers to those questions.
The Web as An Information System
The web is an information system, and an information system needs certain features to be useful.
One feature is that resources in the system must be identifiable; the web does this with Uniform Resource Locators (URLs). Another is that resources must be transferable; on the web this is done via HTTP, the Hypertext Transfer Protocol.
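To make "identifiable" a little more concrete, here is a minimal sketch that splits a URL into its parts using Python's standard `urllib.parse` module. The URL and the helper name `describe_url` are made up for illustration.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the parts that together identify a resource."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,   # which protocol to speak, e.g. https
        "host": parts.hostname,   # which computer hosts the resource
        "port": parts.port,       # which port on that computer (None if omitted)
        "path": parts.path,       # which resource on that host
        "query": parts.query,     # extra parameters for the resource
    }


# Hypothetical URL, used only as an example.
info = describe_url("https://example.com:8080/photos/cat.png?size=large")
print(info)
```

The host and port say where to connect, while the path and query say what to ask for once connected.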
In this post, we will briefly go over the history of HTTP and what its evolution so far has made possible on the web.
How Would You Build The Web?
The web has been described as a "subset of the internet". To understand what that means, let's first define some terms.
A network is a collection of computers (called nodes) connected together for the purpose of communicating data. If different networks in different areas around the world are connected so that they can exchange data with each other, you have an "interconnection of networks", a.k.a. the Internet.
Since on the web we access resources hosted on different computers around the world through URLs, the web sits on top of the internet. It is not the only way to use the internet, which is why the web is referred to as a "subset of the internet".
We need rules
To enable this communication between networks and nodes within a network you need "rules of engagement".
Think of it as hosting a party where you specify a dress code and attendance is strictly by invitation. These rules for attending the party, usually referred to as protocols, are put in place to control who can enter the party.
On the web, we have a protocol too and it is - you guessed it right - the Hypertext Transfer Protocol. It is the foundation of data communication on the World Wide Web where we have hypertext documents that include hyperlinks to other resources that can be accessed easily (by clicking on the link).
HTTP is the protocol for exchanging resources on the web. HTTP describes what the data being exchanged contains and how to read it. It still requires another protocol to handle the transmission, which is TCP in versions HTTP/0.9 through HTTP/2.
So, HTTP is built on top of a transport protocol. Whereas HTTP describes what the data in the stream contains, TCP manages the data stream itself.
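The split between the two layers can be sketched in a few lines of Python. This is only an illustration under some assumptions: it starts a throwaway local server (the `HelloHandler` class and `raw_http_get` function are names invented here), opens a plain TCP socket to it, and writes the HTTP request by hand as text. TCP only carries the bytes; it is the HTTP text inside them that gives them meaning.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny local server so the example does not need the real internet."""

    def do_GET(self):
        body = b"<h1>Hello</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


def raw_http_get(path="/"):
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # TCP layer: open a connection that carries a stream of bytes.
        with socket.create_connection(("127.0.0.1", server.server_port)) as tcp:
            # HTTP layer: the bytes we send follow HTTP's rules.
            request = f"GET {path} HTTP/1.0\r\nHost: localhost\r\n\r\n"
            tcp.sendall(request.encode("ascii"))
            chunks = []
            while True:
                chunk = tcp.recv(4096)
                if not chunk:  # the server closed the connection
                    break
                chunks.append(chunk)
    finally:
        server.shutdown()
        server.server_close()
    return b"".join(chunks)


print(raw_http_get("/").decode("iso-8859-1"))
```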
Now let's quickly go over the versions of HTTP that we have had so far.
HTTP/0.9 and below
Initial versions of HTTP were extremely simple: a request for a resource consisted of a single line, and the only possible method was GET, followed by the path of the resource.
There was no need to include the full URL of the resource or the server port once the client was connected to the server. This version of the protocol was later dubbed HTTP/0.9 to differentiate it from later versions.
In this version of the protocol, there were no HTTP headers and only HTML files could be transmitted. There were also no status or error codes, so if an error occurred, a special HTML file describing the problem was returned.
A request looks like this:
The response consisted of the file itself, like this:
<HTML> <H1>Welcome to my home page </H1> </HTML>
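As a rough sketch of how little there was to HTTP/0.9, the two helpers below build the single-line request and read the response. The function names and the `/index.html` path are assumptions made for this example; they are not part of any library.

```python
def build_http09_request(path):
    """Build the one-line request: the method GET, a space, and the path."""
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    return f"GET {path}\r\n".encode("ascii")


def read_http09_response(raw):
    """There is no status line and there are no headers: the bytes are the HTML file."""
    return raw.decode("ascii")


# Hypothetical path, used only as an example.
print(build_http09_request("/index.html"))
print(read_http09_response(b"<HTML> <H1>Welcome to my home page </H1> </HTML>"))
```

Notice that the client has no way to tell a real page from an error page other than reading the HTML itself.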
HTTP/1.0 - Extensibility
HTTP/1.0 addressed some limitations of HTTP/0.9, including:
- The HTTP version was included in each request
- A status line with a status code was added at the beginning of each response, which allowed browsers to recognize the success or failure of a request themselves and act accordingly
- HTTP headers were introduced for both requests and responses. Additional data, known as metadata, could now be transmitted, which made the protocol more flexible and extensible
- Documents other than HTML files could be transmitted by setting the Content-Type header
However, HTTP/1.0 was still a strict request-and-response protocol: when the browser asks for a resource, it waits for the response, and during this waiting period it cannot do anything else with the connection it has with the server.
Each HTTP request over the TCP connection must be responded to before the next request can be made.
A request and response look like this in HTTP/1.0:
// Request
GET /index.html HTTP/1.0
User-Agent: EDGE/2.0 (Windows 8.1)

// Response
HTTP/1.0 200 OK
Date: Mon, 07 Feb 2022 20:00:31 GMT
Server: NINT/3.0 libwww/3.17
Content-Type: text/html

<HTML>
<H2>Welcome to my portfolio page</H2>
<IMG SRC="/background.gif">
</HTML>
Since the response is an HTML page that contains an image, another request will be made for that image file:
// Request
GET /background.gif HTTP/1.0
User-Agent: EDGE/2.0 (Windows 8.1)

// Response
HTTP/1.0 200 OK
Date: Mon, 07 Feb 2022 20:00:31 GMT
Server: NINT/3.0 libwww/3.17
Content-Type: image/gif

(image content)
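To show what the status line and headers give a client, here is a minimal sketch of a response parser. `parse_response` is a name made up for this post, and the parser is deliberately naive (real ones handle many more edge cases). It assumes the status line starts with the protocol version, as in `HTTP/1.0 200 OK`.

```python
def parse_response(raw):
    """Split a raw HTTP/1.0 response into status line, headers, and body."""
    # A blank line separates the head (status line + headers) from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: protocol version, status code, human-readable reason.
    version, status_code, reason = lines[0].split(" ", 2)

    # Each header is a "Name: value" pair; names are case-insensitive.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return version, int(status_code), reason, headers, body


raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Server: NINT/3.0 libwww/3.17\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<HTML><H2>Welcome to my portfolio page</H2></HTML>"
)
version, status, reason, headers, body = parse_response(raw)
print(status, headers["content-type"])
```

With the status code and the Content-Type header in hand, the browser can decide whether the request succeeded and how to interpret the body without inspecting the body first.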
HTTP/1.1 included improvements over 1.0, one of the most significant being the ability to reuse a connection. In the previous example, a separate connection was opened to retrieve the image referenced in the first response. Opening multiple connections costs both time and resources.
Each HTTP request over the TCP connection may be made immediately without waiting for the previous request's response to return. The responses will come back in the same order.
HTTP/1.1 also included some other features:
- Pipelining: This made it possible to send another request while the response of one request hasn't been fully transmitted. This lowered the wait time during communication (compared to HTTP/1.0)
- Responses can be delivered in chunks
- More cache control mechanisms were introduced
- A client and a server can negotiate the type of content to exchange (Content Negotiation)
There are more features introduced in HTTP/1.1 that you can read about on the internet.
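Connection reuse can be observed with Python's standard library. The sketch below is an illustration, not a benchmark: it runs a small local server that speaks HTTP/1.1 and sends two requests over one `http.client.HTTPConnection`. The `KeepAliveHandler` class and the `fetch_twice_on_one_connection` function are names invented here, and the server records the client's port so we can check that both requests arrived over the same TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep the connection open
    seen_ports = []                # client port of every request received

    def do_GET(self):
        KeepAliveHandler.seen_ports.append(self.client_address[1])
        body = b"hello from " + self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Content-Length tells the client where this response ends,
        # so the next response can follow on the same connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


def fetch_twice_on_one_connection():
    KeepAliveHandler.seen_ports.clear()
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    statuses = []
    for path in ("/index.html", "/background.gif"):
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # finish reading before sending the next request
        statuses.append(response.status)
    conn.close()
    server.shutdown()
    server.server_close()
    return statuses, list(KeepAliveHandler.seen_ports)


statuses, ports = fetch_twice_on_one_connection()
print(statuses, "same connection:", len(set(ports)) == 1)
```

Note that this example waits for each response before sending the next request, so it shows connection reuse only, not pipelining.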
HTTP/2 included a lot of improvements over HTTP/1.1 which explains why the main version number was increased from 1 to 2. Some of these improvements include:
- It is a binary protocol rather than a text protocol. HTTP/1.x protocols are all text protocols as they represent the messages in text, but HTTP/2 encodes the messages in binary.
- It introduced multiplexing. Each HTTP request over the TCP connection may be made immediately without waiting for the previous response to come back. The responses may come back in any order.
- It compresses headers
- It includes server push technology, which allows servers to push data to clients.
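The binary framing and multiplexing points can be sketched together. In HTTP/2, every message is cut into frames, and each frame starts with a 9-byte header: a 24-bit payload length, an 8-bit type, 8 bits of flags, and a 31-bit stream identifier (preceded by one reserved bit). The code below is a simplified illustration of that layout only; the helper names are made up, it uses only DATA frames, and it skips the rest of the protocol (HEADERS frames with compressed headers, settings, flow control).

```python
import struct

FRAME_DATA = 0x0       # frame type for body data
FLAG_END_STREAM = 0x1  # marks the last frame of a stream


def encode_frame(frame_type, flags, stream_id, payload):
    """Prefix the payload with the 9-byte HTTP/2 frame header."""
    length = struct.pack(">I", len(payload))[1:]  # keep the low 3 bytes
    rest = struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return length + rest + payload


def decode_frames(data):
    """Walk a byte string and return (type, flags, stream_id, payload) tuples."""
    frames = []
    offset = 0
    while offset < len(data):
        length = int.from_bytes(data[offset:offset + 3], "big")
        frame_type, flags, stream_id = struct.unpack(
            ">BBI", data[offset + 3:offset + 9]
        )
        stream_id &= 0x7FFFFFFF  # drop the reserved bit
        payload = data[offset + 9:offset + 9 + length]
        frames.append((frame_type, flags, stream_id, payload))
        offset += 9 + length
    return frames


def interleaved_example():
    """Two responses (streams 1 and 3) sharing one connection, frame by frame."""
    return (
        encode_frame(FRAME_DATA, 0, 1, b"<html>")
        + encode_frame(FRAME_DATA, FLAG_END_STREAM, 3, b"GIF89a")
        + encode_frame(FRAME_DATA, FLAG_END_STREAM, 1, b"</html>")
    )


for frame in decode_frames(interleaved_example()):
    print(frame)
```

Because every frame carries its stream identifier, the receiver can put the pieces of each response back together even though they arrive mixed, which is what allows responses to come back in any order.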
HTTP/3 - Ditching TCP for QUIC
Remember I mentioned that HTTP requires a transport protocol, and so far it has been TCP (from HTTP/0.9 through HTTP/2).
HTTP/3 is the proposed successor to HTTP/2. It uses a different transport protocol named QUIC, which was originally developed at Google. The major advantage of HTTP/3 comes from this new transport protocol.
The switch to QUIC aims to fix a major problem of HTTP/2 called "head-of-line blocking".
You can read more about this problem here.
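For a rough feel of head-of-line blocking, here is a toy model, not real TCP or QUIC. It assumes packets are tagged with the stream they belong to and that one packet is lost and has not been retransmitted yet. The function name and the packet format are invented for this sketch. With a single ordered byte stream (as with TCP), nothing behind the lost packet can be delivered. With ordering kept per stream (the idea behind QUIC), only the stream that lost a packet has to wait.

```python
def delivered_before_retransmit(packets, lost_index, per_stream_ordering):
    """Return the payloads the receiver can hand over while one packet is missing.

    packets: list of (stream_id, payload) in the order they were sent.
    lost_index: position of the packet that was lost in transit.
    per_stream_ordering: False models one ordered stream, True models
        independent ordering for each stream.
    """
    delivered = []
    blocked_streams = set()
    for index, (stream_id, payload) in enumerate(packets):
        if index == lost_index:
            if not per_stream_ordering:
                break  # everything behind the gap has to wait
            blocked_streams.add(stream_id)
            continue
        if stream_id in blocked_streams:
            continue  # this stream is waiting for its missing packet
        delivered.append(payload)
    return delivered


packets = [("A", "a1"), ("B", "b1"), ("A", "a2"), ("B", "b2")]
print(delivered_before_retransmit(packets, 0, per_stream_ordering=False))
print(delivered_before_retransmit(packets, 0, per_stream_ordering=True))
```

In the first call nothing is delivered, even though both of stream B's packets arrived safely. In the second call stream B is delivered in full while stream A waits.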
In this post, I walked through how the web works and the protocol that drives it. I also explained how the web is a subset of the internet and covered the different versions of HTTP.
I have simplified some concepts here to keep this article from getting any longer. I would like to write about transport protocols and how they relate to HTTP in the near future.
Until then, happy coding.