The internet was not supposed to connect billions of people around the world.
It all started back in 1969 with ARPANET, a network linking four university research centers in California and Utah. The network was funded by the US government's Defense Advanced Research Projects Agency (DARPA) at a time when the US military was heavily focused on the Cold War with the Soviet Union. The original goal of the research project was to build network communication systems that could continue to operate during a nuclear war.
While it may have started with a military focus, it didn’t take long for the academics who were using ARPANET to start making changes to the system that would better suit the wider scientific and research communities. More computers were added to expand ARPANET during the 1970s, and in 1972 the first email program was created, quickly becoming the most popular application on the network.
In the late 1980s the first commercial ISPs (Internet Service Providers) were able to connect to the networks that had evolved from the ARPANET project. But the experience was still extremely basic, with users limited to tools like email, FTP (File Transfer Protocol, used to send and receive files) and USENET (an early group discussion system) – tools that were functional but difficult for non-experts to set up and use.
Everything changed in 1990 when Tim Berners-Lee, an English scientist working at CERN, created the World Wide Web (WWW). The world wide web is a hypertext document system, a technical way of describing a collection of documents that are connected together using clickable hyperlinks (the links you click on a website). You can think of the web as sitting on top of the internet, using the network as “plumbing pipes” to send data for the documents, or web pages, across the world.
Internet history: Web Browsers
When Tim Berners-Lee invented the web he was expecting it to be used by fellow researchers and academics, so while it was a huge step up from earlier internet tools it was still not designed for use by the broader public. The first web browsers (the tool we use to navigate websites) were extremely basic by today's standards – as an example, it wasn't unusual to view the text of a webpage in a web browser and then open a separate application to view any associated images.
In 1993 Marc Andreessen was a part-time student programmer at the National Center for Supercomputing Applications (NCSA) at the University of Illinois, and an early user of the web. He recognized its huge potential but also knew that mainstream growth would require better and simpler-to-use web browsers than those available at the time. He set out to work with some fellow students and created the Mosaic browser. While the first browsers were created for researchers and academics, Mosaic was easy for non-experts to use and offered built-in support for graphics and images. It proved to be an instant hit and quickly became the most popular browser on the early web.
As Mosaic grew in popularity Marc started to come into conflict with members of the university faculty over the future direction of the project. He ultimately decided to leave Mosaic and make the move to Silicon Valley, where he joined up with Jim Clark to found Netscape. Netscape would go on to launch a browser called Netscape Navigator that built upon the early success and appeal of Mosaic, quickly overtaking Mosaic to become the market leading browser. Only 16 months after it was founded, Netscape had the first successful IPO of the “dot-com boom” era, making the founders and early employees instant millionaires.
The huge and sudden growth of the internet caught the attention of Bill Gates and the team at Microsoft, and they decided to enter the browser wars with their own browser, Internet Explorer. They were late to the race, starting after Netscape, but they had a massive (and arguably unfair) advantage: Microsoft could offer Internet Explorer for free as part of their Windows operating system, making it the default browser choice for the millions of Windows users who were starting to use the web for the first time. In 1996 Netscape had 86% of the browser market, but by 1999 the balance had swung to Microsoft, with Internet Explorer reaching over 75% market share.
Netscape eventually admitted defeat and decided to open source their browser code, creating the Mozilla project. Mozilla went on to release the Firefox series of browsers that were popular in the early 2000s and are still being released today.
Apple entered the browser market in 2003 with Safari, and while it is extremely popular with users on Apple’s Mac OS and iOS operating systems it has never been able to grab significant share beyond those markets.
In 2008 Google released the Chrome browser, which was built on open source components from Firefox and Safari. Chrome proved to be extremely popular, with many users choosing to leave Internet Explorer and Firefox. Microsoft tried to counter the loss of market share by killing Internet Explorer and replacing it with a new browser called Edge, but has had limited success. Chrome has gone on to become the dominant browser globally, leading market share across PCs, smartphones and tablets.
Internet history: Web Search Engines
It's hard to imagine now, but in the very early days of the web it was technically possible to view every website in existence. But as the web exploded in popularity and thousands, and then millions, of new websites were created, people quickly realized that they needed a way to search for information. With demand growing it was only natural that solutions would soon follow, and they arrived in the form of the first search engines.
The first “all text” based search engines were WebCrawler and Lycos, both released in 1994. These search engines “crawled” the web, taking snapshots of all of the text on each website to build an index that could be used for search queries.
At around the same time, the founders of Yahoo were building a directory that contained websites that they liked, split out into individual categories. This “hand picked” selection proved to be very popular with early web users, and when Yahoo added search functionality to the directory they quickly became one of the most highly-trafficked sites on the entire web.
But these early search engines had problems. The "all text" search engines often provided poor results. They returned results based on the text contained on a web page, so it was quite easy for people to "keyword stuff" pages, adding irrelevant words to try and trick the search engine and gain more traffic. And while a curated search directory like Yahoo may have worked in the earlier days of the web, it simply couldn't scale up with the growing demands of users as the web expanded massively in size and scope.
This is where Google entered the picture. Founded in late 1998 by Sergey Brin and Larry Page, Google's search engine used a new algorithm called PageRank to provide users with significantly better search results. While most search engines were simply returning results based on the text on a web page, Google also took into account how many pages were linking to each web page, giving each page a "rank" based on the perceived quality of those links. A page with a few links from highly ranked pages would be considered higher quality than a site with thousands of links from low quality sources, so it would display higher up on the search results page. The improved algorithm removed many of the spam problems and gave users much more accurate results, and users flocked to Google. Google quickly came to dominate the search engine market, a position they continue to hold today.
Web 1.0 and Web 2.0
The early days of internet growth are now commonly referred to as the Web 1.0 period. Web 1.0 websites were generally static or read-only, a computing term that means you can view information but you can't create or edit it. A popular category of Web 1.0 websites were the many "web portals". These sites were "one-stop shops" that contained information on a wide range of topics, including news and entertainment. While they had plenty of content for users to consume, that's all users could do – consume.
The early 2000s brought the next revolution in the web with the start of the Web 2.0 period. With Web 2.0 we moved from read-only to read-write, as new technologies allowed websites to become interactive and users to start creating their own content. This led to a huge explosion of growth in the social web and the user-generated content (UGC) products that have come to dominate Web 2.0, starting with apps like MySpace and moving on to the likes of Facebook, YouTube, Instagram and TikTok.
Web 3.0: Blockchains, Crypto
Web3 is the name given to the “next generation” or iteration of the internet. It’s commonly associated with blockchain technology, decentralization and token economics.
A blockchain is a type of distributed, immutable ledger. We can think of it as a way of storing records of data that prevents a record from being changed after it has been created (immutable in computing terms). Copies of the records are then decentralized, or shared across a wide number of computer systems, to help prove their accuracy – if someone managed to change a record on one system, the other distributed copies would still show the details of the original record. Blockchains are most commonly used in cryptocurrencies like Bitcoin or Ethereum, decentralized finance (DeFi), which aims to remove intermediaries and brokers from the financial markets, and Non-Fungible Tokens (NFTs), non-interchangeable units of data that are stored on a blockchain and can be traded, like the Bored Ape Yacht Club images, where people have paid $330k for a cartoon image of an ape.
Are blockchains, decentralization and tokens really the future of the internet? I don’t know. But many smart people certainly seem to think so, and the leading venture capital firms have bet on it by raising multi-billion dollar investment funds to invest in web3 businesses.
How the Internet works
Now that we know what the internet is, we can take a look under the hood to see how it works.
At its most basic, the internet is an interconnected network of networks. In other words, it is millions of computing devices speaking to each other and sharing information. Now, a virtual Linux server running in an Amazon Web Services (AWS) data center is clearly quite different to the Ring video doorbell camera that you might have installed at your front door, but they may need to speak to each other. How would they do this?
Protocols are standardized sets of rules for data that allow devices to communicate with each other. TCP/IP (Transmission Control Protocol / Internet Protocol) is the name of the suite of protocols that outline how devices should communicate on the internet – in the case of our example, that would be the server and the door camera. Another very important protocol is HTTP, or HyperText Transfer Protocol, which outlines the rules for the web, defining how to load web pages containing hypertext links.
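To make this a little more concrete, here is a sketch (nothing is actually sent) of what an HTTP request looks like "on the wire" – plain lines of text that follow the rules of the protocol, while TCP/IP handles the separate job of delivering those bytes to the right machine:

```javascript
// A minimal HTTP request message, built by hand to show that the protocol
// is just agreed-upon rules for plain text. TCP/IP carries these bytes;
// HTTP defines what they mean.
const request =
  "GET /index.html HTTP/1.1\r\n" + // the method, path and protocol version
  "Host: example.com\r\n" +        // which site we want on this server
  "\r\n";                          // a blank line marks the end of the headers

console.log(request);
```

Any device that follows these rules, whether a data-center server or a video doorbell, can understand a message like this – that shared agreement is what a protocol provides.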
We have millions of devices connected together forming networks of networks, and they are speaking to each other using protocols like TCP/IP. But how can we find an individual device within this virtual haystack? With an Internet Protocol Address, or IP address. Every device on the internet has a unique IP address that looks something like 18.104.22.168.
Web addresses: URL, DNS, Domain Names
When we open up our web browser and type in google.com we are asking a server located somewhere in the world to send us back information. We know that each device on the public-facing internet (including servers) has a unique IP address, but it would be impossible to remember the IP addresses of all of the different servers we want to access as we browse the web. Thankfully we have the Domain Name System, or DNS, which maps human-friendly names (domain names) to IP addresses. Rather than typing 22.214.171.124 into our browser we can simply type example.com and the DNS server will automatically direct us to the correct IP address.
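The core idea can be sketched as a simple lookup table. Real DNS is a worldwide, distributed system of name servers rather than a single table, and the addresses below are just illustrative, but the job being done is the same:

```javascript
// A toy "DNS" table mapping human-friendly domain names to IP addresses.
// The addresses are made up for this sketch.
const dnsTable = {
  "example.com": "93.184.216.34",
  "example.org": "93.184.216.34",
};

// Look up a name; return null if we don't know it.
function resolve(domainName) {
  return dnsTable[domainName] || null;
}

console.log(resolve("example.com")); // "93.184.216.34"
console.log(resolve("unknown.test")); // null
```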
This brings us to the Uniform Resource Locator (URL), also known as a web address. A URL is simply the address for a resource on the web, like https://www.nytimes.com/ or https://twitter.com/_Metamorphous.
Let’s break down the elements of a URL using an example from the Metamorphous blog.
- https – The “Scheme” tells the browser which protocol it should use to request the resource, in this case the secure version of HTTP as we are requesting a web page
- metamorphous.co – The "Authority", which is usually the domain name for the web server but could also be an IP address. It contains a top-level domain (TLD) like .com or .org and a second-level domain (SLD), usually the business or brand name (in this case metamorphous)
- /post/the-top-non-technical-jobs-in-tech – the “Path”, or address for the specific file we are requesting on the web server
- ?utm_source=twitter&utm_medium=tweet – “Parameters” are optional pieces of additional information we can send to the web server. In this case we are telling the server where the link originated from (it was a tweet shared on Twitter) to help with marketing attribution (tracking where users are coming from when they visit a site)
If our URL only has a domain name, like google.com, then we are basically asking to be taken to the “entrance” or homepage of the server that is located at that IP address. If our URL also contains a path, like metamorphous.co/blog, then we are asking the web server for a specific resource, in this case the web page that lists blog posts. Many websites will also use subdomains (eg: https://subdomain.example.com) to help separate and organize the site.
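We can pull these pieces apart programmatically too – Node.js (like browsers) has a built-in URL class that splits a web address into the parts described above. The URL here is this article's own example:

```javascript
// Parse the example URL into its component parts.
const url = new URL(
  "https://metamorphous.co/post/the-top-non-technical-jobs-in-tech?utm_source=twitter&utm_medium=tweet"
);

console.log(url.protocol); // "https:" – the scheme
console.log(url.hostname); // "metamorphous.co" – the authority / domain name
console.log(url.pathname); // "/post/the-top-non-technical-jobs-in-tech" – the path
console.log(url.searchParams.get("utm_source")); // "twitter" – a parameter
```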
What else is happening when we open up our browser and enter google.com, or click a link?
Our browser requests the correct IP address for the web server from the DNS server. It then sends a request to the web server. If we are browsing the web then it’s very likely that we will be using the HTTP protocol to make an HTTP request. An HTTP request can do a few things, including:
- Ask the server to send us a webpage, known as a GET request.
- Send some data to the server to process, for example the content of a Tweet we want to post. This is known as a POST request.
- Update some existing data on the server, known as a PUT request.
- Delete some data from the server, known as a DELETE request.
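As a rough sketch of what those four request types are asking a server to do, here is a toy in-memory "server" where each verb is just a function call against a simple data store. A real server receives these verbs over the network as HTTP requests, but the intent of each one is the same:

```javascript
// A simple data store, mapping ids to pieces of content.
const store = { 1: "Hello web!" };

// Handle a request: the method says what we want done with the resource.
function handle(method, id, body) {
  switch (method) {
    case "GET":    return store[id];            // read a resource
    case "POST":   store[id] = body; return id; // create a new resource
    case "PUT":    store[id] = body; return id; // update an existing resource
    case "DELETE": delete store[id]; return id; // remove a resource
  }
}

handle("POST", 2, "A new tweet");     // create
console.log(handle("GET", 2));        // "A new tweet" – read it back
handle("DELETE", 2);                  // remove it again
console.log(handle("GET", 2));        // undefined – it's gone
```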
Back to our example. After the web server has processed our request it will send a response. The response will include a response code that tells our browser whether the request has been successful – for example, a response of 200 means the request has been successful while a response of 404 means that the server could not find the requested resource.
A successful server response will also have a message body containing the resource that was requested. This will usually be an HTML (Hyper Text Markup Language) file. The web browser will read the HTML file and then render it as a web page for us to view.
HTML, CSS, JS & Cookies
Hyper Text Markup Language (HTML)
HTML is the code that provides details of the structure and content of a web page. Raw HTML pages were the first pages on the web and were extremely basic, usually just text with some minor formatting options to make the text appear in formats like bold or italic.
An HTML file is made up of elements that wrap around pieces of content, allowing us to set how the content should look or act. Each element is wrapped in opening <> and closing </> tags that outline what the element should do, for example display an image or create a link to another page.
<p>This is an example of a paragraph element used for longer sections of text, with an opening paragraph tag at the start and a closing paragraph tag at the end.</p>
<p class="example-class">This is an example of a paragraph element with a class attribute called example-class.</p>
The HTML file itself will start with a <head> element that contains additional elements with general information about the page, like its <title>, and any external resources it needs, like CSS or JS files. After the closing </head> tag you’ll usually find the <body> element, which will contain all of the content that is going to be rendered visible on the webpage, like the text and any images or tables.
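Putting those pieces together, a minimal HTML file with the structure described above might look like this (the title, text and style sheet name are made up for the sketch):

```html
<!DOCTYPE html>
<html>
  <head>
    <title>My example page</title>
    <!-- an external CSS file the page depends on -->
    <link rel="stylesheet" href="styles.css">
  </head>
  <body>
    <h1>Welcome!</h1>
    <p>All of the visible content lives inside the body element.</p>
  </body>
</html>
```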
Cascading Style Sheets (CSS)
CSS files tell the browser how a web page’s content should be styled. They contain lists of rules that are targeted to different elements of the HTML.
As an example, if you wanted to style the main heading text on the page you would target the H1 tags:
<h1>This is our heading text.</h1>
With CSS similar to this:
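```css
h1 {
  font-size: 2em;
  font-weight: 800;
  color: #999999;
}
```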
The CSS above is saying “for any H1 elements on the page, make the font size 2em, the font weight 800, and the color #999999”.
We can also use CSS to target specific classes of each element. Using our example from the HTML section above, if we wanted to change the size of the text for the paragraph element with the class name “example-class” then we could target it like this:
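```css
p.example-class {
  font-size: 40px;
}
```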
In this case the CSS is saying “for any paragraph elements that are also classes of “example-class”, make the font size 40px”.
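JavaScript (JS)

JavaScript is the programming language that makes web pages interactive – it can react to clicks, change the HTML and CSS on the page, and fetch new data from the server without reloading the whole page. As a very rough sketch of the kind of logic it adds (a plain object stands in for an HTML element so the example also runs outside a browser; in a real page the element would be found with document.querySelector() and the function attached to a button's click event):

```javascript
// A stand-in for an HTML element that displays some text on the page.
const label = { textContent: "Clicked 0 times" };

let clicks = 0;

// In a browser this would run each time the user clicks a button.
function handleClick() {
  clicks += 1;
  label.textContent = `Clicked ${clicks} times`;
}

// Simulate the user clicking twice.
handleClick();
handleClick();
console.log(label.textContent); // "Clicked 2 times"
```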
Cookies

A cookie is a small file that a web server sends to a user's web browser to help identify the browser in future requests. A common use for cookies is session management, which is a technical way of describing how a web server remembers that a user is signed in to a service. A session management cookie contains information that identifies the user to the web server with each request, allowing the user to remain signed in as they use the site (otherwise they would have to re-authenticate with the web server on every request, which would be very annoying!).
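That round trip can be sketched as plain HTTP headers (the session id here is made up):

```javascript
// The server sets a cookie once, in its response headers...
const setCookie = "Set-Cookie: session_id=abc123; HttpOnly";

// ...and the browser then attaches it to every later request to that site,
// which is how the server recognises the same signed-in user each time.
const cookie = "Cookie: session_id=abc123";

console.log(setCookie);
console.log(cookie);
```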
When our web browser has received all of the HTML, CSS, JS, cookies, images and videos then it can render the final completed page for us to browse.
I know I can get frustrated when I click a link and it takes a while to process but the reality of what is going on in the background is really quite amazing.
What started as a research project attempting to deal with nuclear strikes during the Cold War now allows me to sit in Sweden and request files from a server in the USA. The request travels across undersea fiber optic cables at close to the speed of light, then a server in the USA receives my request and responds by sending files back to my web browser to process and render.
And all of this happens in less than a second.
So to the creators of ARPANET, Tim Berners-Lee, Marc Andreessen and everyone else involved with the development of the internet and web: thank you!