Website Loading

(Credits: Flickr, patriziasoliani)

Whenever a person types www.google.com into the address bar, a lot of work happens behind the scenes to load Google's website. The very act of loading a website requires many elements of the technology stack to function properly. The DNS system helps the browser find and connect to the server. Several lower-level protocols are needed to actually transmit the data. The browser must also download the images and all the other assets the page requires.

Because the internet is a very complex project, it was split into independent layers so that technologists could build each of its aspects separately. Together, these layers are called the “Internet Protocol Stack”. A protocol is simply a set of rules that the software implementing it must follow. The upper-layer protocols work independently of the lower-layer protocols. Each layer is given a predefined set of responsibilities. The layers of the stack and their responsibilities are listed below.

Layers of Internet Protocol Stack:

  1. Application layer: This is the topmost layer of the internet protocol stack. It is tasked with interacting with the user. A web browser works in this layer. The Domain Name System (a helper system for name resolution) is also an application layer protocol. Services such as web browsing, e-mail, and file sharing are provided by protocols of this layer.
  2. Transport layer: The transport layer provides its services to the application layer via ports. It abstracts host-to-host communication. (Server and client computers are both called hosts.) It offers connection-oriented or connectionless delivery. For reliable delivery it splits the data into smaller pieces for transmission and puts them back in sequence at the receiving host before presenting them to the application layer. It also performs congestion control so that traffic between the hosts does not overload the network.
  3. Internet layer: The internet layer provides end-to-end routing services to the transport layer. Each computer or router is identified by a unique IP address to help with routing. To transmit packets efficiently, routers share their data with other routers via routing protocols such as OSPF.
  4. Link layer: The link layer’s job is to transmit a packet from one node on a network to another node on the same network. (The nodes are network devices such as routers, switches, and computers’ network cards.) Here another addressing scheme, the Media Access Control (MAC) address, is used for transmitting data between 2 network nodes. Physical transmission technologies such as WiFi and Ethernet belong to this layer. Protocols such as ARP, RARP, and NDP are used to map IP addresses to MAC addresses. This layer is tasked with the actual transmission of data between 2 directly connected nodes.

These are the 4 layers of the TCP/IP stack.
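The layering described above can be pictured as each layer wrapping the data handed down by the layer above it. The following is only a toy sketch of that idea: the bracketed "headers" are invented for readability and are not real wire formats, and the port numbers, IP addresses, and MAC addresses are made-up example values.

```python
# Toy illustration of layering: every layer prepends its own header to
# whatever the layer above handed down. The bracketed headers are invented
# for readability; real headers are binary structures.

def application_layer(path: str, host: str) -> str:
    # The application layer produces the HTTP request text.
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"

def transport_layer(message: str, src_port: int, dst_port: int) -> str:
    # The transport layer addresses applications by port number.
    return f"[TCP {src_port}->{dst_port}]{message}"

def internet_layer(segment: str, src_ip: str, dst_ip: str) -> str:
    # The internet layer addresses hosts by IP address.
    return f"[IP {src_ip}->{dst_ip}]{segment}"

def link_layer(packet: str, src_mac: str, dst_mac: str) -> str:
    # The link layer addresses neighbouring nodes by MAC address.
    return f"[ETH {src_mac}->{dst_mac}]{packet}"

def encapsulate(path: str, host: str, dst_ip: str) -> str:
    # All addresses below are example values chosen for this sketch.
    message = application_layer(path, host)
    segment = transport_layer(message, src_port=50000, dst_port=80)
    packet = internet_layer(segment, src_ip="192.0.2.1", dst_ip=dst_ip)
    return link_layer(packet, src_mac="02:00:00:00:00:01",
                      dst_mac="02:00:00:00:00:02")

if __name__ == "__main__":
    print(repr(encapsulate("/index.html", "www.google.com", "198.51.100.7")))
```

Reading the printed frame from left to right shows the link layer header outermost and the HTTP request innermost, which is the order in which a receiving host unwraps it.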

Website Loading: Players of Ecosystem

The World Wide Web is built on a protocol called HTTP, which stands for Hyper Text Transfer Protocol. That is the main reason website addresses begin with http://. HTTP is an application layer protocol designed to send the HTML (Hyper Text Markup Language) documents that make up a web page. Computers that understand HTTP requests are called servers. The client is the computer that requests an HTML resource by sending an HTTP request. The browser is the program that interprets the HTML document and displays it. The URL (Uniform Resource Locator) is the addressing scheme used to identify a web resource.
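To make the URL addressing scheme concrete, here is a small sketch that splits a URL into the parts a browser needs: the scheme (which protocol to speak), the host (which server to contact), and the path (which resource to ask for). It uses Python's standard `urllib.parse.urlsplit`; the `describe_url` helper and the default-port table are names made up for this illustration.

```python
from urllib.parse import urlsplit

# Well-known default ports, used when the URL does not name a port itself.
DEFAULT_PORTS = {"http": 80, "https": 443}

def describe_url(url: str) -> dict:
    """Split a URL into the pieces a browser needs (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,                 # which protocol to speak
        "host": parts.hostname,                 # which server to contact
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/",              # which resource to request
    }

if __name__ == "__main__":
    print(describe_url("http://info.cern.ch/hypertext/WWW/TheProject.html"))
```

The host part is what gets handed to DNS, and the path part is what ends up in the HTTP request line.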

When Sir Tim Berners-Lee first introduced the web, he designed all the components of the ecosystem: the browser program, the server program, the HTTP protocol, the HTML markup language, and the URL addressing scheme. Below are some facts about the WWW ecosystem.

  • The first browser was called WorldWideWeb. It was later renamed Nexus.
  • The first server was called CERN HTTPd (CERN Hyper Text Transfer Protocol Daemon).
  • The first website was info.cern.ch.
  • The first URL was http://info.cern.ch/hypertext/WWW/TheProject.html.

Website Loading: Work done at Application Layer

When you type the site name in the browser’s address bar, the browser first establishes a connection with the server. The server’s address is obtained by querying the DNS. The address obtained via DNS becomes the destination of the connection: it is carried in the destination address field of the internet layer’s packets, while the transport layer carries the destination port. The HTTP request is then prepared and handed to the transport layer as its data. (Note: HTTP uses the transport layer protocol called TCP, the Transmission Control Protocol, for its communications.)

Website Loading Request:

The HTTP request begins with a request line, which has 3 main parts. The request line looks like this.

<Method> <URL Path> <HTTP Version>

Ex: GET /index.html HTTP/1.1

GET is the method: it asks the server to return the resource identified by the given path. The request line ends with the HTTP version the client is using. Below the request line, additional parameters are sent. These additional parameters are called header fields. Some header fields are mandatory and others are optional. (Refer to this wiki for details on header fields.) Note that the browser type is also sent in one of the header fields, named User-Agent.
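As a rough sketch of what the browser hands to the transport layer, the function below assembles the request line and a few header fields into the text of an HTTP/1.1 request. The `build_request` helper and the `example-browser/0.1` User-Agent value are invented for this illustration; a real browser sends many more headers.

```python
def build_request(method: str, path: str, host: str,
                  user_agent: str = "example-browser/0.1") -> str:
    """Assemble a minimal HTTP/1.1 request as text (hypothetical helper)."""
    lines = [
        f"{method} {path} HTTP/1.1",   # request line: method, path, version
        f"Host: {host}",               # mandatory header in HTTP/1.1
        f"User-Agent: {user_agent}",   # identifies the browser type
        "Connection: close",           # ask the server to close afterwards
    ]
    # Lines end with CRLF, and an empty line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"

if __name__ == "__main__":
    print(build_request("GET", "/index.html", "www.google.com"))
```

The Host header is the one field HTTP/1.1 always requires, because a single server address may host many websites.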

Server Response:

Once the request reaches the server, the server looks up the resource and sends back a response. Just as the request starts with a request line, the HTTP response starts with a status line. The usual headers follow below the status line. Note that the server also identifies itself, in a header field named Server. After a blank line the response body begins, containing the HTML code.

Like the request line, the status line of the HTTP response has 3 main parts. The status line looks like this.

<HTTP Version> <Status Code> <Response Phrase>

Ex: HTTP/1.1 200 OK

The headers follow the status line, followed by the body containing the requested resource. The status codes are subdivided into several series. (Refer to this wiki article for a list of all the status codes.) Remember that 400 series statuses mean the client, i.e. the browser, made a mistake. 500 series errors are caused by problems on the server. 300 series statuses require the client browser to take additional action. The famous 404 error means the client requested a resource that doesn’t exist, hence it is a client-side mistake. Error 500, which bloggers like me encounter a lot, means the server has gone kaput, often because of some misconfiguration, so the mistake is on the server side.
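The response layout described above (status line, headers, blank line, body) can be taken apart with a few string operations. The sketch below assumes a well-formed response held entirely in one string; `parse_response`, `classify`, and the sample `example-httpd` server name are made up for this illustration.

```python
def parse_response(raw: str):
    """Split a raw HTTP response into its parts (hypothetical helper).

    Assumes a well-formed response whose status line includes a phrase.
    """
    # The first blank line separates the head from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # Status line: <HTTP Version> <Status Code> <Response Phrase>
    version, code, phrase = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), phrase, headers, body

def classify(code: int) -> str:
    """Name the series a status code belongs to, based on its first digit."""
    series = {1: "informational", 2: "success", 3: "redirection",
              4: "client error", 5: "server error"}
    return series.get(code // 100, "unknown")

if __name__ == "__main__":
    sample = ("HTTP/1.1 404 Not Found\r\n"
              "Server: example-httpd\r\n"
              "Content-Type: text/html\r\n"
              "\r\n"
              "<html>missing</html>")
    version, code, phrase, headers, body = parse_response(sample)
    print(code, phrase, "->", classify(code))
```

Because the series is decided by the first digit alone, a client can react sensibly even to a status code it has never seen before.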

DNS resolution:

The job of DNS is to fetch the IP address of the server. Only after this can the browser continue loading the website. (Note: DNS mainly uses the transport layer protocol called UDP, the User Datagram Protocol, for its communications.)

Whenever you browse a website, its IP address (stored in what is called an A record) is kept by your operating system in its DNS cache for later use. When a website’s A record is not available in the DNS cache of the OS, a DNS query is automatically sent to the recursive resolver of your ISP (Internet Service Provider). If the recursive resolver doesn’t have the A record (PS: it often does), it keeps you waiting and asks a root nameserver. (PS: there are only 13 root nameserver addresses. They know where to find the nameservers of every TLD.) The root nameserver refers the resolver to the appropriate TLD nameservers. (E.g. a query for www.google.com is referred to the .com TLD nameservers.) The TLD nameserver in turn refers the resolver to the authoritative nameserver, which gives the A record. Once the recursive resolver has fetched the A record, it keeps a copy for itself and sends the record to you.
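The lookup chain above can be sketched as a small in-memory simulation. Nothing here touches the network: the three tables stand in for the root, TLD, and authoritative nameservers, the server labels are invented, and 192.0.2.10 is a documentation address used as a placeholder, not Google's real IP address.

```python
# In-memory stand-ins for the three kinds of nameserver (all values invented).
ROOT = {"com": "tld-com"}                                # TLD -> TLD server
TLD = {"tld-com": {"google.com": "ns-google"}}           # domain -> authoritative server
AUTHORITATIVE = {"ns-google": {"www.google.com": "192.0.2.10"}}  # name -> A record

class RecursiveResolver:
    """Toy recursive resolver that caches the answers it fetches."""

    def __init__(self):
        self.cache = {}
        self.upstream_lookups = 0   # how often we had to walk the hierarchy

    def resolve(self, name: str) -> str:
        if name in self.cache:                # answer already known
            return self.cache[name]
        self.upstream_lookups += 1
        labels = name.split(".")
        tld_server = ROOT[labels[-1]]         # root refers us to the TLD server
        domain = ".".join(labels[-2:])
        auth_server = TLD[tld_server][domain] # TLD refers us to the authoritative server
        address = AUTHORITATIVE[auth_server][name]  # authoritative gives the A record
        self.cache[name] = address            # keep a copy for next time
        return address

if __name__ == "__main__":
    resolver = RecursiveResolver()
    print(resolver.resolve("www.google.com"))
    print(resolver.resolve("www.google.com"), resolver.upstream_lookups)
```

The second call is answered from the cache, which is why a popular site rarely requires the full trip through the root and TLD nameservers.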

Conclusion:

The steps described above all happen while a website loads. All of these protocols do their work at the application layer, which sits atop the transport, internet, and link layers, and those layers in turn do a lot more work to keep the internet running. So it is worthwhile to think of the WWW as a public web with decent gentlemen doing the background work. If you noted the header fields, you will have seen that servers receive plenty of information with which to identify a computer. It is because of that information that efficient communication happens. If you want to take a cue about privacy from the above explanation of headers, understand that the WWW is public. Only what is stored on your own computer, or content that is encrypted, is private.