Do you know what happens when you type a URL?
A common activity when we sit in front of our computers is to open a browser and type in a URL. If you are reading this blog, you have probably asked yourself what happens between typing a URL, pressing Enter, and seeing the web page you were looking for.
Before explaining in general terms what makes up a web infrastructure, let’s start by explaining what a URL is and what its elements are.
A URL (Uniform Resource Locator) is a unique web address that specifies the location of a resource on the Internet. URLs are a subset of URIs (Uniform Resource Identifiers): a URL both identifies a web resource and provides the means to locate it. Yes, this explanation is quite technical, so take a look at the following URL: https://www.holbertonschool.com
Now, what are its main elements?
The first part of this URL is the protocol, HTTP (Hypertext Transfer Protocol) or HTTPS (Hypertext Transfer Protocol Secure), which appears at the beginning of a web URL. These protocols transfer information between client and server. In short, the protocol lets the client/user request information such as HTML documents, text, images, and videos, and exchange data with the web page.
The subdomain is the second element. It appears between the protocol and the first dot of the URL. The best known is www, but there are others such as blog., email., and es. Some URLs omit the subdomain; this is known as the “naked domain”.
The next element is the Domain, which is composed of:
1. The domain name. In our example it is holbertonschool, but it can be Facebook, Amazon, Netflix, or any of the millions that exist, each with a unique name. It is defined as a string of letters because a name is easier for people to remember than the numbers of an IP address.
2. The TLD (top-level domain), .com in our example: the part after the last dot in the domain. There are many TLDs; among the best known are .com, .org, .red, and .es.
If you notice, each dot in the URL separates a different segment. The URL taken as an example is a basic one; a URL can contain more elements, such as the path, the parameters (query string), and the fragment (the part after #) … but for now, let’s leave it here. If you want to go deeper, read this link …
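To make the elements above concrete, here is a minimal sketch that splits a URL with Python's standard `urllib.parse` module. The path, query string, and fragment added to the example domain are made up for illustration, and the rule "the TLD is the last label" is a simplifying assumption that does not hold for suffixes such as co.uk.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url: str) -> dict:
    """Split a URL into the elements described in this post."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    # Naive assumption: the TLD is the last label and the domain name is the
    # label right before it. Multi-part suffixes need the Public Suffix List,
    # which this sketch does not consult.
    tld = labels[-1] if len(labels) >= 2 else ""
    domain_name = labels[-2] if len(labels) >= 2 else host
    subdomain = ".".join(labels[:-2])
    return {
        "protocol": parts.scheme,
        "subdomain": subdomain,
        "domain_name": domain_name,
        "tld": tld,
        "path": parts.path,
        "query": parse_qs(parts.query),
        "fragment": parts.fragment,
    }


# The path, query string, and fragment here are hypothetical.
info = describe_url("https://www.holbertonschool.com/courses?track=web#intro")
for key, value in info.items():
    print(f"{key}: {value}")
```

Calling `describe_url` on a naked domain such as https://holbertonschool.com returns an empty string for the subdomain.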
Now let’s get into the subject: what is the DNS (Domain Name System)? That’s where it all starts. It takes the domain name and translates it to an IP address in order to locate the server that hosts the website. It sounds simple, doesn’t it? But there is much more behind it.
The main task of the DNS is to resolve names into addresses. When you type the URL and press Enter, the browser first checks whether it already knows the IP address of the requested domain in its own cache. If it does, it uses it. If not, the browser asks the operating system (OS), through the so-called resolver, whether the domain is in the OS cache of IP addresses. If the OS cannot resolve the request either, the query goes to the configured DNS server, which is usually the Internet provider’s DNS server. That server looks in its own records, and if the lookup succeeds, it sends back the corresponding IP address. Yes!
But wait, the journey can be longer. If the Internet provider’s DNS server does not have the answer, it works its way down the DNS hierarchy: a root server points it to the servers for the TLD (.com for our example URL), the TLD server knows which name servers are responsible for the domain, and those name servers return the IP address you are looking for. After all this, the answer travels back to where the process started: the resolver hands the IP address to the browser.
Along the way, the answers to these queries are cached by the OS, so if you visit this URL again the whole process does not have to be repeated. Phew, what a trip! The best part is that it happens so quickly that you will hardly notice it. Great, isn’t it?
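The lookup order described above (browser cache, then OS cache, then the provider's DNS server) can be sketched as follows. This is a simulation, not a real resolver: the caches are plain dictionaries, `ask_isp_dns` is a hypothetical stand-in for the provider's server, and 192.0.2.10 is an address reserved for documentation rather than the site's real IP.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-ins for the caches and the provider's DNS records.
browser_cache: Dict[str, str] = {}
os_cache: Dict[str, str] = {}
ISP_RECORDS = {"www.holbertonschool.com": "192.0.2.10"}  # documentation-only IP


def ask_isp_dns(hostname: str) -> Optional[str]:
    """Pretend to query the Internet provider's DNS server."""
    return ISP_RECORDS.get(hostname)


def resolve(
    hostname: str,
    upstream: Callable[[str], Optional[str]] = ask_isp_dns,
) -> Tuple[Optional[str], str]:
    """Return (ip, source), checking the closest cache first."""
    if hostname in browser_cache:
        return browser_cache[hostname], "browser cache"
    if hostname in os_cache:
        ip = os_cache[hostname]
        browser_cache[hostname] = ip
        return ip, "OS cache"
    ip = upstream(hostname)
    if ip is None:
        return None, "not found"
    # Store the answer so the next lookup skips the trip to the DNS server.
    os_cache[hostname] = ip
    browser_cache[hostname] = ip
    return ip, "DNS server"


first = resolve("www.holbertonschool.com")
second = resolve("www.holbertonschool.com")
print(first)   # answered by the simulated DNS server
print(second)  # answered from the browser cache
```

For a real lookup, Python's standard library offers `socket.getaddrinfo(hostname, 443)`, which asks the operating system's resolver.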
TCP/IP (Transmission Control Protocol/Internet Protocol)
Let’s move on to the next segment of the infrastructure. Once the IP address is known, in this case that of https://www.holbertonschool.com, the process of connecting to the website starts. TCP/IP is a set of communication protocols used to interconnect network devices on the Internet or on any private network. HTTP and HTTPS are part of this set of protocols and run on top of TCP.
TCP/IP specifies how data is exchanged over the Internet. It divides the message to be sent to the server into packets, which can travel over different paths on the Internet and are reassembled at the other end (the destination of the message). The message is split into packets because a single large unit would depend on one path, and if that path were unavailable or congested the whole message could take a long time to arrive or never arrive at all. With small packets, only the ones that get lost need to be sent again.
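The idea of splitting and reassembling can be illustrated with a toy sketch. It only numbers the chunks and sorts them back into order; real TCP also handles acknowledgements, retransmission, and congestion, none of which is modeled here. The packet size of 8 bytes and the sample message are arbitrary choices for the example.

```python
import random
from typing import List, Tuple

Packet = Tuple[int, bytes]  # (sequence number, payload)


def split_into_packets(message: bytes, size: int = 8) -> List[Packet]:
    """Cut the message into numbered chunks of at most `size` bytes."""
    return [
        (seq, message[start:start + size])
        for seq, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets: List[Packet]) -> bytes:
    """Put the chunks back in order using their sequence numbers."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)


message = b"GET / HTTP/1.1 for www.holbertonschool.com"
packets = split_into_packets(message)

# Simulate packets arriving out of order after taking different paths.
arrived = packets[:]
random.Random(42).shuffle(arrived)

rebuilt = reassemble(arrived)
print(rebuilt == message)
```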
How secure is the application?
Have you ever wondered about the difference between the HTTP and HTTPS protocols? The main difference is security. HTTPS combines two protocols (HTTP + SSL/TLS) so that the information exchanged between the browser and the server is encrypted and cannot be read by anyone who intercepts it on the network.
You can identify a page as secure by checking that its URL starts with https or by looking for the lock icon in the browser’s address bar.
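As a rough illustration of the SSL/TLS part, the sketch below uses Python's standard `ssl` module. The helper `tls_version_for` is a name invented for this example; it needs network access, so it is defined here but not called. The default context shown is what makes Python check the server's certificate and hostname, much as a browser does before showing the lock.

```python
import socket
import ssl
from typing import Optional

# A default client context verifies the server certificate and its hostname.
default_context = ssl.create_default_context()
print(default_context.verify_mode == ssl.CERT_REQUIRED)
print(default_context.check_hostname)


def tls_version_for(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Open an encrypted connection and report the negotiated TLS version.

    Hypothetical helper for illustration; it requires network access.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.version()
```

With a working connection, `tls_version_for("www.holbertonschool.com")` would return a string naming the protocol version agreed with the server.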
The firewall is the first line of security in the network. In technical terms, a firewall is a network security system that monitors incoming and outgoing traffic. Its main purpose is to allow the traffic that complies with the security rules defined for accessing a website’s servers and to block the traffic that does not.
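A rule check of this kind might look like the sketch below. The rule format, the sample rules, and the "first match wins, block by default" policy are assumptions made for the example, not how any particular firewall product works. The IP ranges are reserved for documentation.

```python
import ipaddress
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rule:
    action: str  # "allow" or "block"
    source: str  # source network in CIDR notation
    port: int    # destination port the rule applies to


# Hypothetical rule set: block one network, allow web traffic from the rest.
RULES = (
    Rule("block", "203.0.113.0/24", 443),
    Rule("allow", "0.0.0.0/0", 443),
    Rule("allow", "0.0.0.0/0", 80),
)


def decide(source_ip: str, port: int, rules: Sequence[Rule] = RULES) -> str:
    """Return the action of the first matching rule, or "block" if none match."""
    address = ipaddress.ip_address(source_ip)
    for rule in rules:
        if port == rule.port and address in ipaddress.ip_network(rule.source):
            return rule.action
    return "block"  # default deny: anything not explicitly allowed is blocked


print(decide("198.51.100.7", 443))
print(decide("203.0.113.9", 443))
print(decide("198.51.100.7", 22))
```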
If there are many applications, how are they distributed?
Imagine a site like Netflix or Amazon. How do they handle such a high volume of traffic? The load balancer does it: it distributes the requests made by clients/users among the servers where the website is hosted. Sites like these have a very large infrastructure made up of many servers.
How does it do this? Through algorithms. There are several of them, such as:
Random: The simplest approach; each request is sent to a server picked at random from the server pool.
Round Robin: It works like a circular queue: requests are handed to the servers in order, and once the last server in the list has received one, the next request goes back to the first server.
Weighted Round Robin: Each server in the circular queue is given a weight that reflects how many requests it can handle, so the balancing is proportional to the individual capacity of each server.
Dynamic Round Robin: It also takes into account the individual capacity of each server, but it monitors the servers to know how busy each one actually is and adjusts the balancing accordingly.
Fastest: Balances the load of requests based on the servers’ response times to the client.
Least Connections: Requests are sent to the server that currently has the fewest active connections.
Observed: A combination of Fastest and Least Connections; it balances based on both how busy each server is and its response times.
Predictive: It uses the same algorithm as Observed and also analyzes how request loads and response times have changed over time, so it can anticipate when the requests should be balanced in a different order.
The optimization and efficiency of resources depend very much on the proper use of the algorithms.
In short, the load balancer is responsible for spreading the workload across the servers, providing faster responses, optimizing server resources, and increasing reliability.
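Three of the algorithms listed above are simple enough to sketch in a few lines. The server names, weights, and connection counts below are invented for the example, and a production load balancer would also track server health, which is not modeled here.

```python
import itertools
from typing import Dict, Iterator, List

SERVERS = ["web-01", "web-02", "web-03"]  # hypothetical server pool


def round_robin(servers: List[str]) -> Iterator[str]:
    """Hand out servers in order, starting over after the last one."""
    return itertools.cycle(servers)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Repeat each server as many times per cycle as its weight."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the server with the fewest active connections."""
    return min(active, key=lambda name: active[name])


rr = round_robin(SERVERS)
rr_order = [next(rr) for _ in range(4)]

wrr = weighted_round_robin({"web-01": 2, "web-02": 1})
wrr_order = [next(wrr) for _ in range(6)]

least_busy = least_connections({"web-01": 12, "web-02": 3, "web-03": 7})

print(rr_order)
print(wrr_order)
print(least_busy)
```

In the weighted case, web-01 receives two out of every three requests because its weight is twice that of web-02.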
Where are the websites stored?
Web server and Application server
This is where the software that manages and delivers the content you see on your screen comes in. The web server receives the HTTP requests and returns static content such as HTML pages and images.
Applications are hosted on the application server, and this is where a website goes from static to dynamic. An application server is software that runs the web applications and connects them to the database; it is where the business logic lives and where the client/user’s requests are processed.
A database is an organized collection of data stored on a physical or virtual server and administered by a database management system (DBMS). The DBMS is the software that manages the data; through it, you can organize, update, and retrieve the stored data. Examples include MySQL, PostgreSQL, MSSQL, Oracle Database, and Microsoft Access.
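To show what "dynamic" means in practice, here is a minimal sketch of a web application written against WSGI, the standard Python interface between servers and applications. The page content and the helper `call_app` are made up for illustration; the point is only that the response is built at request time instead of being read from a fixed file.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable, List, Tuple


def application(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """A minimal WSGI application: the body is generated for each request."""
    path = environ.get("PATH_INFO", "/")
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    body = f"<h1>Hello from {path}</h1><p>Generated at {now}</p>".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path: str) -> Tuple[str, str]:
    """Invoke the application directly, the way a WSGI server would."""
    captured: List[str] = []
    chunks = application(
        {"PATH_INFO": path},
        lambda status, headers: captured.append(status),
    )
    return captured[0], b"".join(chunks).decode("utf-8")


status, html = call_app("/courses")
print(status)
print(html)
```

To serve it over HTTP for local experiments, the standard library's `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` is enough.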
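Organizing, updating, and retrieving data can be tried out without installing anything by using SQLite, which ships with Python. SQLite is not one of the systems listed above; it is swapped in here only because it runs in memory, and the table and rows are invented for the example.

```python
import sqlite3

# An in-memory database that disappears when the connection is closed.
connection = sqlite3.connect(":memory:")

# Organize: define a table (hypothetical schema for illustration).
connection.execute(
    "CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT NOT NULL, weeks INTEGER)"
)
connection.executemany(
    "INSERT INTO courses (name, weeks) VALUES (?, ?)",
    [("Web infrastructure", 4), ("Databases", 6)],
)

# Update: change a stored value.
connection.execute(
    "UPDATE courses SET weeks = ? WHERE name = ?",
    (5, "Web infrastructure"),
)

# Retrieve: read the data back, sorted by name.
rows = connection.execute(
    "SELECT name, weeks FROM courses ORDER BY name"
).fetchall()
print(rows)

connection.close()
```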
Network monitoring software
Monitoring matters because it helps optimize resources and reduce network errors. “What is not measured is not controlled, and what is not controlled cannot be improved.” Monitoring software collects information about what has happened on our servers and compiles it into metrics that can be tracked. Some of the most popular monitoring tools are New Relic, Datadog, Uptime Robot, Nagios, and Wavefront.
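Compiling raw information into metrics can be sketched as follows. The log format, the sample entries, and the choice of metrics (request count, average response time, share of 5xx errors) are assumptions for illustration and do not reflect how any of the tools named above work internally.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class RequestLog:
    server: str
    status: int         # HTTP status code of the response
    response_ms: float  # time taken to answer, in milliseconds


def summarize(logs: List[RequestLog]) -> Dict[str, Dict[str, float]]:
    """Group log entries by server and compute a few basic metrics."""
    by_server: Dict[str, List[RequestLog]] = {}
    for entry in logs:
        by_server.setdefault(entry.server, []).append(entry)

    metrics: Dict[str, Dict[str, float]] = {}
    for server, entries in by_server.items():
        errors = sum(1 for entry in entries if entry.status >= 500)
        metrics[server] = {
            "requests": len(entries),
            "avg_response_ms": mean(entry.response_ms for entry in entries),
            "error_rate": errors / len(entries),
        }
    return metrics


# Hypothetical log entries.
logs = [
    RequestLog("web-01", 200, 120.0),
    RequestLog("web-01", 500, 300.0),
    RequestLog("web-02", 200, 80.0),
]
metrics = summarize(logs)
for server, values in metrics.items():
    print(server, values)
```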
The following image tries to put together each segment of this structure step by step:
Web server and Application server: https://whatis.techtarget.com/definition/Web-server