
What exactly happens when a user types google.com in the address bar? Part 1
This is a translation of the first part of the GitHub material that explains in detail how the Internet works: what exactly happens when a user types google.com in the address bar?
The Enter key returns to its original position
As a zero point, we take the moment when the Enter key is pressed all the way down. At that moment, the electrical circuit for that key closes. A small current flows into the keyboard's logic circuitry, which scans the state of every key switch, filters out stray electrical pulses (debouncing), and converts the keystroke into a keycode, here the integer 13. The keyboard controller then encodes the keycode for transmission to the computer. Today this almost always happens over USB or Bluetooth; in the past it went over PS/2 or ADB.
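The "filter out stray pulses" step is commonly called debouncing. Below is a minimal sketch, assuming a simple counter-based scheme in which a raw switch reading is accepted only after it has stayed the same for a few consecutive scans. The function name and the threshold of three scans are illustrative assumptions, not taken from any real keyboard firmware.

```python
def debounce(samples, stable_scans=3):
    """Return the debounced switch state after each raw sample.

    A raw value is accepted only once it has been observed for
    `stable_scans` consecutive scans (an assumed threshold).
    """
    state = 0       # last accepted state: 0 = open, 1 = closed
    candidate = 0   # raw value currently being counted
    run = 0         # consecutive scans that showed `candidate`
    out = []
    for raw in samples:
        if raw == candidate:
            run += 1
        else:
            candidate, run = raw, 1
        if run >= stable_scans:
            state = candidate
        out.append(state)
    return out


# A bouncy key press: the contact chatters before settling closed.
raw_scans = [0, 1, 0, 1, 1, 1, 1, 0, 1, 1]
debounced = debounce(raw_scans)
```

With this input, the key is reported as pressed only from the sixth scan onward, and the single glitch back to 0 near the end is ignored.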
If it is a USB keyboard
The keyboard's USB circuitry is powered by the 5 V supply provided on pin 1 of the computer's USB host controller. The generated keycode is stored in the keyboard's internal memory, in a register called the "endpoint". Roughly every 10 milliseconds, the host USB controller polls that endpoint and so receives the stored keycode. The value then goes to the USB SIE (Serial Interface Engine), which converts it into one or more packets of the low-level USB protocol. The packets are sent as a differential electrical signal over the D+ and D- pins at a maximum rate of 1.5 Mbit/s, because an HID (Human Interface Device) keyboard is declared as a low-speed device.
The serial signal is then decoded by the computer's host USB controller and interpreted by the keyboard's HID driver. The key value is passed on to the operating system's hardware abstraction layer.
If it is a virtual keyboard (touch screen)
When the user puts a finger on a capacitive touch screen, a tiny current is transferred to the finger. This completes a circuit through the electrostatic field of the conductive layer and creates a voltage drop at that point on the screen. The screen controller then raises an interrupt that reports the coordinates of the touch.
The mobile OS notifies the currently focused application of a press event on one of its GUI elements (here, a key of the virtual keyboard). The virtual keyboard, in turn, raises an interrupt to send a key-press message back to the OS.
An interrupt fires (not for USB keyboards)
The keyboard sends a signal on its interrupt request line (IRQ), which the interrupt controller maps to an interrupt vector. The CPU uses the Interrupt Descriptor Table (IDT) to map interrupt vectors to the handler functions supplied by the kernel. When an interrupt arrives, the CPU looks up its vector in the IDT and runs the corresponding handler. This is how control enters the kernel.
(Windows) A WM_KEYDOWN message is sent to the application
The HID transport passes the key-down event to the KBDHID.sys driver, which converts it into a scan code; for the Enter key, the resulting virtual-key code is VK_RETURN (0x0D). KBDHID.sys communicates with KBDCLASS.sys, the keyboard class driver, which is responsible for handling all keyboard input securely. KBDCLASS.sys then calls into Win32K.sys (possibly after passing the message through installed keyboard filters). All of this happens in kernel mode.
Win32K.sys determines which window is active through the GetForegroundWindow() API, which provides the window handle of the browser's address box. The Windows message processing system then calls SendMessage(hWnd, WM_KEYDOWN, VK_RETURN, lParam). lParam is a bitmask carrying additional information about the keypress: the repeat count, the scan code, whether extended keys were also pressed, and so on.
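To make the lParam bitmask concrete, here is a small sketch that unpacks some of the fields Windows documents for WM_KEYDOWN (bits 0-15 repeat count, bits 16-23 scan code, bit 24 extended-key flag, bit 30 previous key state). The helper name and the sample value are assumptions chosen for illustration; 0x1C is the usual scan code of the main Enter key.

```python
def decode_keydown_lparam(lparam):
    """Split a WM_KEYDOWN lParam into a few of its documented bit fields."""
    return {
        "repeat_count": lparam & 0xFFFF,               # bits 0-15
        "scan_code": (lparam >> 16) & 0xFF,            # bits 16-23
        "extended_key": bool((lparam >> 24) & 1),      # bit 24
        "previously_down": bool((lparam >> 30) & 1),   # bit 30
    }


# Illustrative value: one press of Enter (scan code 0x1C), not auto-repeated.
info = decode_keydown_lparam(0x001C0001)
```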
The SendMessage API adds the message to a queue for the given window handle (hWnd). Later, the main message-processing function assigned to that hWnd (its WindowProc) is called to process each message in the queue.
The active window (hWnd) is in fact an edit control, and its WindowProc has a message handler for WM_KEYDOWN. Because the message carries the code VK_RETURN, the handler knows that the user pressed Enter.
(OS X) A KeyDown NSEvent is sent to the application
The interrupt signal triggers an interrupt event in the I/O Kit keyboard driver. The driver translates the signal into a key code and passes it to the WindowServer process, which dispatches an event to the active applications through their Mach ports, where it is placed in an event queue. Threads with sufficient privileges can then read events from this queue by calling the mach_ipc_dispatch function. Most commonly this is handled by the NSApplication main event loop, via an NSEvent of NSEventType KeyDown.
(GNU/Linux) The Xorg server listens for keycodes
When a graphical X server is in use, X acquires the keypress through the generic event driver evdev. Keycodes are then remapped to scancodes according to the X server's keymaps and rules. Once the mapping is done, the X server sends the character to the window manager (DWM, metacity, i3, etc.), which in turn passes it to the focused window. The graphical API of the window that receives the character displays the corresponding symbol in the focused field.
URL parsing
The browser now has the following information from the URL (Uniform Resource Locator):
Protocol "http": use the Hypertext Transfer Protocol
Resource "/": retrieve the main (index) page
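The same split can be reproduced with Python's standard library. This is only an illustration of the parsing result, not how any particular browser implements it; the fully written-out URL is an assumption about what the address bar input expands to.

```python
from urllib.parse import urlsplit

# Assumed expansion of what the user typed into a complete URL.
parts = urlsplit("http://google.com/")

scheme = parts.scheme      # protocol
host = parts.hostname      # server name
resource = parts.path      # requested resource
```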
Is this a URL or a search query?
If no protocol is specified and the string is not a valid domain name, the browser passes this text to the default search engine.
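A rough sketch of that decision follows. The regular expression, the function name, and the search URL template are all assumptions made for illustration; real browsers use considerably more elaborate heuristics.

```python
import re
from urllib.parse import quote_plus

# Simplified hostname pattern: dot-separated labels ending in an alphabetic TLD.
HOSTNAME_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)
SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def classify_input(text, search_template="https://search.example/?q={}"):
    """Return ("url", url) or ("search", search_url) for address bar text."""
    text = text.strip()
    if SCHEME_RE.match(text):
        return ("url", text)                  # a protocol was given
    host = text.split("/", 1)[0]
    if HOSTNAME_RE.match(host):
        return ("url", "http://" + text)      # looks like a domain name
    # Neither a protocol nor a valid domain name: treat it as a search.
    return ("search", search_template.format(quote_plus(text)))
```

For example, `classify_input("google.com")` is treated as a URL, while `classify_input("what is dns")` is handed to the (hypothetical) search engine.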
Check the HSTS list
The browser checks its preloaded HSTS (HTTP Strict Transport Security) list. This is a list of sites that have asked to be contacted only over HTTPS. If the site is on the list, the browser sends its request over HTTPS. Otherwise, the initial request is sent over HTTP.
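In its simplest form, the check is a set lookup. The sketch below assumes a tiny in-memory list with made-up host names; an actual HSTS list is far larger and has additional rules that are not modeled here.

```python
# Hypothetical list entries, for illustration only.
HSTS_HOSTS = {"secure.example", "bank.example"}


def pick_scheme(host, hsts_hosts=HSTS_HOSTS):
    """Choose the scheme for the first request to `host`."""
    return "https" if host.lower() in hsts_hosts else "http"
```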
Convert non-ASCII characters in the hostname
The browser checks the hostname for characters outside the ranges a-z, A-Z, 0-9, - (hyphen), and . (period).
Since the hostname is google.com, there are none. If there were, the browser would apply Punycode encoding to the hostname.
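A minimal sketch of this step using Python's built-in "idna" codec, which applies Punycode to each non-ASCII label. The function name and the second host name are illustrative assumptions.

```python
import re

# The characters the text lists as not needing conversion.
ASCII_HOSTNAME = re.compile(r"^[A-Za-z0-9.-]+$")


def to_ascii_hostname(hostname):
    """Return the hostname unchanged if it is plain ASCII, else Punycode it."""
    if ASCII_HOSTNAME.match(hostname):
        return hostname
    return hostname.encode("idna").decode("ascii")


plain = to_ascii_hostname("google.com")
encoded = to_ascii_hostname("bücher.example")
```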
DNS query
The browser first checks whether the domain is in its cache. If it is not, the browser calls the gethostbyname library function (the exact function varies by OS) to perform the lookup. gethostbyname checks whether the hostname can be resolved from the local hosts file before trying DNS. If that fails, it sends a request to the DNS server configured in the network settings, which is typically the local router or the ISP's DNS server. If the DNS server is on the same subnet, the network library follows the ARP process for the DNS server itself. Otherwise, it follows the ARP process for the IP address of the default gateway.
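The hosts-file step can be sketched as a small parser. The function name and the sample file contents are assumptions for illustration; the real lookup happens inside the system resolver library.

```python
def lookup_in_hosts(hosts_text, name):
    """Return the address mapped to `name` in hosts-file text, or None."""
    wanted = name.lower()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        address, *names = line.split()
        if wanted in (n.lower() for n in names):
            return address
    return None   # not found locally: fall through to a DNS query


SAMPLE_HOSTS = """
127.0.0.1   localhost
# a made-up entry for illustration
192.0.2.10  intranet.example intranet
"""
```

With this sample, `lookup_in_hosts(SAMPLE_HOSTS, "google.com")` returns `None`, which is the case where the resolver goes on to ask the configured DNS server.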
ARP (Address Resolution Protocol)
To send an ARP broadcast, the network stack needs the target IP address to look up and the MAC address of the interface it will use to send the broadcast.
First, the ARP cache is checked for an entry for the target IP. If one is found, the library function returns the result: target IP = MAC.
If there is no entry in the cache, the routing table is consulted to see whether the target IP address belongs to any of the local subnets. If it does, the library uses the interface associated with that subnet. If it does not, the library uses the interface of the subnet that contains the default gateway. The MAC address of the selected interface is then looked up, and a Layer 2 ARP request is sent:
Sender MAC: interface:mac:address:here
Sender IP: interface.ip.goes.here
Target MAC: FF:FF:FF:FF:FF:FF (Broadcast)
Target IP: target.ip.goes.here
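For a sense of what this looks like on the wire, here is a sketch that packs such a request into an Ethernet frame with the `struct` module. The addresses are placeholders. Note one detail of this sketch: the broadcast address goes into the Ethernet destination field, while the target hardware field inside the ARP payload is zero-filled, which is a common convention for requests.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
ETHERTYPE_IPV4 = 0x0800


def build_arp_request(sender_mac, sender_ip, target_ip):
    """Return a 42-byte Ethernet frame carrying an ARP request."""
    broadcast = b"\xff" * 6
    ethernet_header = broadcast + sender_mac + struct.pack("!H", ETHERTYPE_ARP)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                 # hardware type: Ethernet
        ETHERTYPE_IPV4,    # protocol type: IPv4
        6, 4,              # MAC length, IP length
        1,                 # operation: request
        sender_mac, socket.inet_aton(sender_ip),
        b"\x00" * 6,       # target MAC is unknown; that is the question
        socket.inet_aton(target_ip),
    )
    return ethernet_header + arp_payload


# Placeholder addresses from documentation ranges.
frame = build_arp_request(bytes.fromhex("020000000001"), "192.0.2.5", "192.0.2.1")
```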
What happens next depends on the hardware between the computer and the router. If the computer is connected directly to the router, the router responds with an ARP Reply. If it is connected to a hub, the hub sends the ARP request out of all its other ports, and a router on one of them responds with an ARP Reply. If it is connected to a switch, the switch checks its CAM/MAC table to see which port has the MAC address being looked up. If it has no entry for that address, it forwards the request to all other ports. If it does have an entry, it sends the request only to the port where that MAC address is.
The ARP Reply:
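The switch behavior described above amounts to a table lookup with a flooding fallback. A minimal sketch, with an invented function name and port numbering:

```python
def ports_to_forward(cam_table, dest_mac, ingress_port, all_ports):
    """Return the ports a switch would send a frame to.

    cam_table maps known MAC addresses to the port they were last seen on.
    """
    port = cam_table.get(dest_mac)
    if port is None:
        # Unknown destination: flood to every port except the incoming one.
        return sorted(p for p in all_ports if p != ingress_port)
    return [port]


cam = {"02:00:00:00:00:aa": 3}
flooded = ports_to_forward(cam, "ff:ff:ff:ff:ff:ff", 1, {1, 2, 3, 4})
direct = ports_to_forward(cam, "02:00:00:00:00:aa", 1, {1, 2, 3, 4})
```

The broadcast address never appears as a learned entry in the table, so a broadcast ARP request ends up on every other port.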
Sender MAC: target:mac:address:here
Sender IP: target.ip.goes.here
Target MAC: interface:mac:address:here
Target IP: interface.ip.goes.here
Now that the network library has the hardware (MAC) address of either the DNS server or the default gateway, it can resume the DNS lookup. The DNS client opens a socket and sends a UDP request to port 53 on the DNS server (if the response is too large for UDP, TCP is used instead). If the DNS server does not have the answer, a recursive search is requested, which works its way through the chain of DNS servers until the SOA (start of authority) is reached and the answer is returned.
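To show what is actually sent to port 53, the following sketch assembles a minimal DNS query for an A record by hand. The function name and the query ID are arbitrary assumptions, and nothing is sent over the network here.

```python
import struct


def build_dns_query(hostname, query_id=0x1234):
    """Return the bytes of a DNS query asking for the A record of hostname."""
    # Header: ID, flags (only "recursion desired" set), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)   # QTYPE=A, QCLASS=IN
    return header + question


query = build_dns_query("google.com")
```

Passing `query` to `sendto()` on a UDP socket addressed to the DNS server's port 53 would be the next step; that part is left out so the example stays offline.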
Opening a socket
Once the browser has the destination server's IP address, it takes that address together with the port number for the URL's scheme (80 by default for HTTP, 443 for HTTPS) and calls the system library function socket, requesting a TCP stream socket: AF_INET and SOCK_STREAM.
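In Python, the equivalent call looks like the sketch below. It only creates the socket and inspects it; no connection is attempted, and the helper name is an assumption.

```python
import socket


def open_tcp_socket():
    """Create an IPv4 (AF_INET) TCP stream (SOCK_STREAM) socket."""
    return socket.socket(socket.AF_INET, socket.SOCK_STREAM)


sock = open_tcp_socket()
family, kind = sock.family, sock.type
sock.close()
```

Calling `sock.connect((server_ip, 80))` before closing is what would actually start the exchange with the server.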
This request is first passed to the transport layer, where a TCP segment is created. The destination port is added to the header, and a source port is chosen from the kernel's dynamic port range (ip_local_port_range on Linux).
The segment is then sent to the network layer, which wraps it in an IP header containing the IP address of the destination server and that of the local machine, forming a packet. The packet next arrives at the link layer, where a frame header is added that contains the MAC address of the machine's network interface and the MAC address of the gateway (the local router). If the kernel does not know the gateway's MAC address, it broadcasts an ARP query to find it. At this point the packet is ready to be transmitted over Ethernet, Wi-Fi, or a cellular data network.
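As an illustration of the network-layer step, this sketch builds a bare 20-byte IPv4 header, including the header checksum. The addresses are documentation placeholders and the helper names are assumptions; in practice the kernel does all of this.

```python
import socket
import struct

IPV4_HEADER_FORMAT = "!BBHHHBBH4s4s"


def checksum16(data):
    """Internet checksum: one's complement of the one's-complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src_ip, dst_ip, payload_len, ttl=64):
    """Return a 20-byte IPv4 header (no options) for a TCP payload."""
    fields = [
        (4 << 4) | 5,          # version 4, header length 5 * 4 bytes
        0,                     # DSCP/ECN
        20 + payload_len,      # total length
        0, 0,                  # identification, flags/fragment offset
        ttl,
        socket.IPPROTO_TCP,    # payload protocol
        0,                     # checksum placeholder
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
    ]
    fields[7] = checksum16(struct.pack(IPV4_HEADER_FORMAT, *fields))
    return struct.pack(IPV4_HEADER_FORMAT, *fields)


header = build_ipv4_header("192.0.2.5", "198.51.100.7", payload_len=20)
```

A receiver can verify the header by running the same checksum over all 20 bytes; a valid header yields 0.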
In most cases, the packet passes from the computer through the local network and then through a modem (MOdulator/DEModulator), which converts the digital data into an analog signal suitable for transmission over a telephone, cable, or wireless connection. At the other end, another modem converts the signal back into digital data and hands it to the next network node, where the source and destination addresses are analyzed further.
Sometimes the packet is sent directly over Ethernet or fiber, in which case the data stays digital all the way to the next network node. Eventually the packet reaches the router that manages the local subnet. From there it travels through the border routers of its autonomous system (AS), through other ASes, and finally to the destination server. Each router along the way reads the destination address from the IP header and forwards the packet to the appropriate next hop, decrementing the TTL field in the header by one. The packet is dropped if the TTL reaches zero or if the current router has no space left in its queue. This sending and receiving of packets happens many times over the course of a TCP connection.
First, the client chooses an initial sequence number (ISN) and sends a packet to the server with the SYN bit set, to indicate that it is setting the ISN.
If the server receives the SYN and is in an agreeable mood, it chooses its own ISN, sets SYN to indicate that the packet carries that ISN, copies the client's ISN + 1 into its ACK field, and adds the ACK flag to confirm receipt of the first packet.
The client acknowledges the connection by sending a packet that increases its own sequence number, increases the acknowledgment number for the server, and has the ACK flag set.
Data is then transferred as follows: when one side sends N bytes, it increases its SEQ by that number. When the other side acknowledges receipt of that packet (or a string of packets), it sends an ACK packet whose ACK value equals the last sequence number received from the other side.
To close the connection, the closing side sends a FIN packet. The other side acknowledges it and sends its own FIN, and the closing side acknowledges that FIN in turn.
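The sequence and acknowledgment arithmetic of the handshake and of a data transfer can be traced with a small model. This is a sketch under simplifying assumptions: packets are plain dictionaries, the ISNs are fixed instead of random, and the acknowledgment number is modeled as the next byte the receiver expects. All names are invented for the example.

```python
SEQ_MOD = 2 ** 32   # sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYN-ACK and ACK packets of a connection setup."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    syn_ack = {
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": (client_isn + 1) % SEQ_MOD,   # client ISN + 1
    }
    ack = {
        "flags": {"ACK"},
        "seq": syn_ack["ack"],               # client's number, increased
        "ack": (server_isn + 1) % SEQ_MOD,   # server ISN + 1
    }
    return syn, syn_ack, ack


def send_data(sender_seq, payload):
    """Return (segment, acknowledgment, sender's next sequence number)."""
    segment = {"seq": sender_seq, "len": len(payload)}
    next_seq = (sender_seq + len(payload)) % SEQ_MOD   # SEQ grows by N bytes
    acknowledgment = {"flags": {"ACK"}, "ack": next_seq}
    return segment, acknowledgment, next_seq


syn, syn_ack, ack = three_way_handshake(client_isn=1000, server_isn=5000)
segment, data_ack, client_seq = send_data(ack["seq"], b"GET / HTTP/1.1\r\n\r\n")
```

Here the client's 18-byte request starts at sequence number 1001, so the server's acknowledgment carries 1019.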