A world in which IPv6 was a good design
A translation of an article by Avery Pennarun, a Google employee, about why the modern Internet is the way it is, the history and background of IPv6, how an ideal IPv6 world would work, why it doesn't work that way, and how we might get closer to that ideal.
Last November I went to the IETF for the first time. The IETF is an interesting place: it seems to be about one third grinding maintenance work, one third extending things that already exist, and one third blue-sky insanity. I went mainly because I wanted to see how people would react to TCP BBR, which was being presented there for the first time. (Answer: mostly positively, but with disbelief. It seemed too good to be true.)
Be that as it may, the IETF meetings included many presentations about IPv6, the protocol that is supposed to replace IPv4, the foundation of the Internet. (Some would say the replacement is already underway; some would say it has already happened.) Beyond those presentations, there are plenty of people who consider IPv6 the best and greatest thing ever, are sure it is about to finally arrive (Any Moment Now), and regard IPv4 as just a big pile of hacks destined to die so that the Internet can be beautiful again.
I thought it would be a good opportunity to actually figure out what was going on. Why is IPv6 so convoluted compared to IPv4? Wouldn't it have been better if it were just IPv4 with more address bits? But no, for heaven's sake, it had to be done differently. So I started asking everyone around, and here's what I found out.
Buses ruined everything
Once upon a time there was a telephone network that used physical circuit switching. In essence, this meant moving plugs around so that your phone literally ended up connected by one very long wire (OSI layer 1). A "leased line" was exactly that: a very long wire that you leased from the telephone company. You put bits in at one end of the wire, and after a fixed delay they came out the other end. You didn't need addresses, because there was only one machine at each end.
Eventually the telephone companies optimized this a bit. Time division multiplexing (TDM) and "virtual circuit switching" appeared. The phone companies could now transparently take low-speed bit streams from many lines, group them together with multiplexers and demultiplexers, and push them through the telephone system using fewer wires than before. It took more machinery than before, but for us modem users nothing changed: you put bits in one end, they pop out the other. Still no addresses needed.
The Internet (not yet called that) was built on top of these circuits. You had a bunch of wires you could push bits into and catch at the other end. If one computer had two or three network interfaces, then with the right instructions it could shuffle bits from one line to another, and you could do something much more efficient than running a separate line between every pair of computers. And so IP addresses ("layer 3"), subnets, and routing were born. Even then, with these point-to-point links, you didn't need MAC addresses, because once a packet was in the wire there was only one place it could come out. You only needed IP addresses to decide where it should go after that.
Meanwhile, local area networks (LANs) were being invented as an alternative. If you wanted to connect your computers (or terminals and a mainframe) in one building, the inconvenience of a star topology was that you needed a separate interface for every link. To keep the electronics cheap, people wanted a bus network (also known as a "broadcast domain," a concept that will matter later), where many stations could simply hang off one shared wire and talk to anyone else connected to it. These were not the same people who were building the Internet, so they didn't use IP addresses for this. They invented their own scheme ("layer 2").
One of the early bus-type LANs was arcnet, dear to my heart (I wrote the first Linux arcnet driver and arcnet poetry back in the nineties, long after arcnet was already obsolete). Arcnet layer 2 addresses were very simple: just 8 bits, set by jumpers or DIP switches on the back of the network card. As the owner of the network, it was your job to configure the addresses and make sure there were no duplicates, or who knows what might happen. It was a little painful, but arcnet networks were usually quite small, so it was only a little painful.
A few years later Ethernet came along and solved that problem once and for all by using many more bits (48, in fact) in its layer 2 addresses. That's enough bits to assign a unique, sharded-sequential address (the first three bytes identify the manufacturer, which then hands out the rest sequentially) to every device ever manufactured, with no collisions. And that's exactly what they did! That's how Ethernet MAC addresses work.
Various LAN technologies came and went, including one of my favorites, IPX (Internetwork Packet eXchange, even though it had nothing to do with the "real" Internet) and Netware, which worked great as long as all the clients and servers sat on a single bus network. You never had to configure any addresses, ever. It was beautiful, reliable, and efficient. Practically the golden age of networking.
Of course, someone had to ruin it: big company and university networks. They had so many computers that sharing 10 Mbit/s on a single bus became a bottleneck, so they needed a way to have many buses and to connect, or "internetwork," those buses together, if you like. You're probably thinking, "of course! Use the Internet Protocol (IP) for that," right? Haha, no. The Internet protocol, still not called that, wasn't mature or popular enough at the time, and nobody took it seriously. Netware-over-IPX (and numerous other LAN protocols of the era) was serious business, and like any serious business, they invented their own thing for extending the increasingly popular Ethernet. Ethernet devices already had addresses, MAC addresses, which were probably the only thing the various LAN protocol people could agree on, so they decided to use Ethernet addresses as the keys for their routing mechanisms. (Actually, instead of "routing," they called it bridging and switching.)
The problem with Ethernet addresses is that they're assigned sequentially at the factory, so they can't be hierarchical. That means a "bridging table" isn't as nice as a modern IP routing table, which can cover an entire subnet with a single route entry. To make bridging efficient, you had to remember which bus each MAC address could be found on. And people didn't want to configure each of those by hand, so it had to figure itself out automatically. If you had an intricate network of bridges, things got a little complicated. As I understand it, that's what led to the spanning tree poem, which I'll just leave here. Poetry is very important in networking.
Be that as it may, it mostly worked, although it was a bit confusing, there were broadcast floods here and there, the routes weren't always optimal, and it was nearly impossible to debug. (You definitely couldn't write something like traceroute for bridges, because none of the tools needed to make it work, such as the ability to give an intermediate bridge an address, exist in plain Ethernet.)
On the other hand, all these bridges were hardware-optimized. The hardware people essentially invented the whole scheme as a way to fool software that knew nothing about multiple buses and bridging into working on larger networks. Hardware bridging means a bridge can run really fast, as fast as Ethernet itself. That doesn't sound impressive now, but at the time it was a big deal. Ethernet was 10 Mbit/s, which you might be able to saturate with several computers transmitting at once, but no single computer could push out 10 Mbit/s. In those days that sounded crazy.
In any case, the point is: bridging was a mess that couldn't be debugged, but it was fast.
The Internet over buses
While all this was going on, the Internet people were hard at work too, and of course they didn't fail to notice these cool, cheap LAN technologies. I think this was around the time ARPANET was renamed to the Internet, though I'm not sure. Let's say it was, because the story sounds better told with confidence.
At some point, the goal shifted from connecting individual Internet computers over long-distance point-to-point links to connecting entire LANs together over those point-to-point links. In effect, people wanted "long-distance bridges."
You might think: "hey, no problem, why not just run a bridge over the long-distance link and be done with it?" Sounds nice, but it doesn't work. I won't go into the details, but in short the problem is congestion control. The terrible dark secret of Ethernet bridging is that it assumes all your links run at roughly the same speed and/or are mostly idle, because bridges have no mechanism for slowing down. You just spit out data as fast as you can and expect it to arrive. But when your Ethernet runs at 10 Mbit/s and your point-to-point link at 0.128 Mbit/s, that's completely hopeless. Another problem is that discovering routes by flooding packets out every link to see which one works, which is how bridging normally operates, is far too expensive on slow links. And suboptimal routing, which is merely annoying on a LAN with low latency and high throughput, is absolutely disgusting over slow, expensive long-distance links. It just doesn't scale.
Fortunately, the Internet people (if it was even called the Internet yet) had been working on exactly these problems. If we could just use the Internet's machinery to connect Ethernet buses together, we'd be in good shape.
And so they defined a "frame format" for carrying Internet packets over Ethernet (and arcnet, for that matter, and every other kind of LAN).
And here everything went awry.
The first problem to solve was that now, when you dropped a packet onto the wire, it was no longer clear which machine should "hear" it and perhaps forward it along. If several Internet routers sit on the same Ethernet segment, you can't have them all pick up the packet and try to forward it; that way lie packet storms and routing loops. No, you have to choose which router on the Ethernet bus should pick it up. We can't just use the destination IP address field for that, because that field already holds the final recipient's address, not the router's. Instead, we identify the router we want by putting its MAC address in the Ethernet frame.
So to configure your local IP routing table, you'd want to be able to say something like "send packets for 10.1.1.1 via the router with MAC 11:22:33:44:55:66." That's what you actually mean. Note: your packet is addressed by IP, but your router is identified by MAC. But if you've ever configured a routing table, you may have noticed that nobody writes it that way. Instead you write: "send packets for 10.1.1.1 via the router at 192.168.1.1."
Strictly speaking, that only adds a step. Now your operating system first has to look up the MAC address for 192.168.1.1, discover that it's 11:22:33:44:55:66, and only then build a packet with Ethernet destination address 11:22:33:44:55:66 and IP destination address 10.1.1.1. The address 192.168.1.1 appears nowhere in the packet; it's just an abstraction for humans.
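To make that two-step lookup concrete, here's a minimal sketch in Python of what the OS does: longest-prefix match against the routing table to pick a next hop, resolve that next hop's IP to a MAC via the ARP cache, and only then build a frame whose Ethernet destination is the router's MAC while the IP destination stays the final host. The table contents and the MAC are just the made-up values from the example above.

```python
import ipaddress

# Hypothetical routing table and ARP cache, just to illustrate the two lookups.
ROUTES = [
    # (prefix, next_hop); next_hop None means "directly attached"
    (ipaddress.ip_network("192.168.1.0/24"), None),
    (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_address("192.168.1.1")),
]
ARP_CACHE = {
    ipaddress.ip_address("192.168.1.1"): "11:22:33:44:55:66",
}

def build_frame(dst_ip_str):
    dst_ip = ipaddress.ip_address(dst_ip_str)
    # Longest-prefix match to pick the route and its next hop.
    _, next_hop = max(
        ((p, nh) for p, nh in ROUTES if dst_ip in p),
        key=lambda r: r[0].prefixlen,
    )
    neighbor = next_hop if next_hop is not None else dst_ip
    eth_dst = ARP_CACHE[neighbor]   # in real life: send an ARP request if missing
    return {
        "eth_dst": eth_dst,         # the router's MAC address...
        "ip_dst": str(dst_ip),      # ...but still the final host's IP address
    }

print(build_frame("10.1.1.1"))
# {'eth_dst': '11:22:33:44:55:66', 'ip_dst': '10.1.1.1'}
```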
To perform that extra indirection step, you need ARP (Address Resolution Protocol), a simple non-IP protocol whose job is to translate an IP address into an Ethernet address. It does this by broadcasting a request to everyone on the local Ethernet segment, asking who owns that IP address. If you have bridges, they have to forward all ARP packets out all of their interfaces, because they're broadcast packets, and that's literally what broadcasting means. On a big, busy Ethernet with many attached LANs, that flood of broadcasts becomes one of your nightmares. It's especially bad on WiFi. Over time, people built bridges/switches with special hacks to avoid forwarding ARP wherever technically possible. Some devices (especially WiFi access points) simply answer with fake ARP replies to help out. But it's all crutches, even if sometimes necessary ones.
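For reference, here's roughly what that broadcast question looks like on the wire: a minimal Python sketch of an ARP "who-has" request laid out per RFC 826. The MAC and IP values are placeholders, and the frame it would travel in (with Ethernet destination ff:ff:ff:ff:ff:ff) is omitted.

```python
import socket
import struct

def arp_request(sender_mac, sender_ip, target_ip):
    """Build the payload of an ARP 'who-has' request (RFC 826)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                       # hardware type: Ethernet
        0x0800,                  # protocol type: IPv4
        6, 4,                    # hardware / protocol address lengths
        1,                       # opcode 1 = request ("who has target_ip?")
        bytes.fromhex(sender_mac.replace(":", "")),
        socket.inet_aton(sender_ip),
        b"\x00" * 6,             # target MAC unknown; that's the question
        socket.inet_aton(target_ip),
    )

pkt = arp_request("aa:bb:cc:dd:ee:ff", "192.168.1.10", "192.168.1.1")
print(len(pkt))  # 28 bytes
```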
The death of legacy protocols
Time passed. Eventually (and it actually took quite a while), people pretty much stopped using non-IP protocols on Ethernet at all. So now essentially every network was a physical wire (layer 1) with multiple stations on a bus (layer 2), buses connected by bridges (gotcha! still layer 2), and those internetworked buses connected by IP routers (layer 3).
Some time later, people got tired of assigning IP addresses by hand, arcnet style, and wanted them to configure themselves, Ethernet style. Except it was too late to do it Ethernet style, because a) devices already shipped from the factory with Ethernet addresses, not IP addresses, b) IP addresses were only 32 bits, which isn't enough to hand them out forever without collisions, and c) assigning IP addresses sequentially instead of by subnet would put us right back where we started: it would just be Ethernet all over again, and we already have Ethernet.
And so bootp and DHCP appeared. These protocols are, by the way, special-cased like ARP (except they try hard not to be special by technically being IP packets). They have to be special, because an IP node must be able to send them before it has an IP address, which is of course impossible, so it just fills the IP headers with what is essentially nonsense (albeit nonsense carefully specified in the RFC) that can safely be ignored. (You can tell the headers are meaningless because the DHCP client has to open a raw socket and fill them in by hand; the kernel's IP layer can't do it.) But nobody wanted to invent yet another non-IP protocol, so they pretended it was IP and everyone was happy. Well, as happy as you can be when you're inventing DHCP.
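Concretely, the "nonsense" headers are fixed, well-known values: the client sends from 0.0.0.0 (it has no address yet) to the broadcast address 255.255.255.255, UDP port 68 to port 67. Here's a tiny Python sketch of just that addressing, not a working DHCP client; the payload is elided, and, as noted above, real clients typically resort to raw sockets because the reply is addressed to an IP the client doesn't yet own.

```python
import socket

def send_dhcp_discover(payload: bytes):
    # "My address" is nothing yet, so bind to 0.0.0.0, then shout at the
    # whole segment on the broadcast address. Only the addressing is shown;
    # building the actual DHCPDISCOVER payload is out of scope here.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    s.bind(("0.0.0.0", 68))                      # DHCP client port
    s.sendto(payload, ("255.255.255.255", 67))   # DHCP server port, broadcast
    s.close()
```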
But I digress. The key point here is this: unlike genuine IP services, bootp and DHCP need to know about Ethernet addresses, because in the end their whole job is to listen for your Ethernet address and hand you an IP address to go with it. It's essentially ARP in reverse, except we can't call it that, because there's already a protocol named RARP, which literally means "reverse ARP." To be fair, RARP worked quite well and did the same thing as bootp and DHCP while being much simpler, but let's not even get into that.
The point of all this is that Ethernet and IP kept getting more and more intertwined. By now they're practically inseparable. It's hard to imagine a network interface (other than ppp0) without a 48-bit MAC address, and it's hard to imagine that interface being useful without an IP address. You write your IP routing table in terms of IP addresses, but of course you know you're lying when you name the router by its IP address; you're really just saying, indirectly, that you want to route via its MAC address. And you have ARP, which sneaks across bridges even though it isn't really IP, and DHCP, which is nominally an IP protocol but is really all about Ethernet, and so on.
Moreover, we still have both bridging and routing, and both keep getting more complicated as LANs and the Internet keep getting more complicated. Bridging is still mostly done in hardware and is defined by the IEEE, the people who control the Ethernet standards. Routing is still mostly done in software and is defined by the IETF, the people who control the Internet standards. Each group still tries to pretend the other doesn't exist. Network operators simply choose bridging vs routing based on how fast they want it to go and how much they hate configuring DHCP servers, which they hate very much indeed, which means they bridge as much as possible and route only when they have to.
In fact, bridging got so out of hand that people decided to hoist the bridging decisions up to a higher layer (with the configuration exchanged between bridges over IP, of course!) so it could all be centrally managed. That's called software defined networking (SDN). It's a big improvement over letting switches and bridges do whatever they want, but it's also fundamentally silly, because you know what a software defined network is? IP. That is literally what IP is: the original SDN, the one you use to interconnect networks that have grown too big. But IPv4 was originally too awkward to accelerate in hardware, and for a long time it simply wasn't hardware accelerated, and configuring DHCP is hell, so network operators just learned to bridge together bigger and bigger things. Nowadays big data centers are largely built on SDN, and you could more or less skip IP inside a data center altogether, because nobody is routing the packets. It's all just one big virtual bus network.
This is, in short, a mess.
Now forget everything I just told you...
Good story, right? Well. Now let's pretend none of it happened, and go back to the early 1990s, when most of it actually had already happened, but the people at the IETF were still pretending it hadn't and that the "impending" disaster could be avoided. This is the good part!
I forgot to mention something in that long story above: somewhere along the way we completely stopped using bus networks. Ethernet isn't actually a bus anymore. It only pretends to be a bus. Simply put, we couldn't make the famous CSMA/CD keep working as speeds increased, so we went back to the good old star topology. We run big bundles of cable from a central switch, one cable per station. Walls, ceilings, and floors are stuffed with thick, expensive bundles of Ethernet cable, because we never did figure out how to make the bus work well... at layer 1. It's kind of funny if you think about it. Well, if you find sad things funny.
In fact, in a fit of madness, even WiFi, the ultimate case of a "bus" network, right? where literally everybody shares the same open airspace, is almost everywhere used in a mode called "infrastructure mode," which emulates a giant star topology. If two WiFi stations are connected to the same access point, they don't talk to each other directly, even when they can hear each other perfectly well. They send a packet to the access point, addressed to the MAC address of the other station, and the access point bounces it back toward the destination.
HOLD YOUR HORSES, LET ME EXPLAIN. There's a catch here. When station X wants to send a packet to Internet host Z, via IP router Y, via WiFi access point A, what does the packet look like? Let's draw what we want:
X -> [wifi] -> A -> [wifi] -> Y -> [internet] -> Z
Z is the final destination, so obviously the IP destination field has to be Z. Y is the router, which, as we learned above, we designate by putting its Ethernet MAC address in the Ethernet destination field. But on WiFi, X can't just send the packet straight to Y, for various reasons (including that they don't know each other's WPA2 encryption keys). The packet has to go to A first. So where do we put A's address?
No problem! 802.11 has a thing called three-address mode. They added a third Ethernet MAC address to every frame, so you can express both the real Ethernet destination and the intermediate one. On top of that there are bit fields called "to-AP" and "from-AP," which tell you whether the packet is going from a station to an access point or from an access point to a station, respectively. It turns out both can be true at the same time, because that's what WiFi repeaters do (one AP sends packets to another AP).
Speaking of repeaters! If A is a repeater, it has to forward the packet onward to the base station B, along a path that looks like this:
X -> [wifi] -> A -> [wifi-repeater] -> B -> [wifi] -> Y -> [internet] -> Z
X-> A uses a three-address mode, but on A-> B there is a problem: the source Ethernet is X, and the destination Ethernet is Y, but the packet is sent over the air from A to B; X and Y are not involved at all. Suffice it to say that there is such a thing as a four-address mode, and it works exactly as you might think.
(There is a mode called six-address mode in 802.11s mesh networks, and at about this point I gave up trying to understand.)
Avery, I was promised IPv6, and you haven't even mentioned it yet.
Uh oh. This post has gone a little off the rails, hasn't it?
But that's exactly the point of the whole story. When the people at the IETF were designing IPv6, they looked at this whole mess, and perhaps foresaw even more of the mess that was still to come, though I doubt they could have predicted SDN and WiFi repeater modes, and said: hold on, wait a minute. We don't need any of this junk! What if instead the world worked like this:
- No more physical bus networks (done!)
- No more layer 2 internetworks (that's what layer 3 is for)
- No more broadcasts (layer 2 would always be point-to-point, so where would you even send a broadcast? Replace them with multicast)
- No more MAC addresses (on a point-to-point link it's obvious who the sender and receiver are, and multicast can be done with IP addresses)
- No more ARP and DHCP (no MAC addresses means no IP-to-MAC mapping to maintain)
- No more complexity in the IP header (so routing can be accelerated in hardware)
- No more IP address shortages (so we can go back to routing big subnets again)
- No more manual IP address configuration except at the core (and there are so many IP addresses that subnets can simply be delegated recursively down the tree from there)
Imagine living in that world: WiFi repeaters would just be IPv6 routers. So would access points. And Ethernet switches. And SDN. ARP storms would be gone. Every routing problem would be traceroute-able. And best of all, we could drop 12 bytes (source and destination MAC addresses) from every Ethernet packet, and 18 bytes (source/destination/access point) from every WiFi packet. Sure, IPv6 adds an extra 24 bytes of address compared to IPv4, but you'd be dropping 12 bytes of Ethernet, so the net overhead would be only 12 bytes, about the same as if you had used two 64-bit IP addresses and kept the Ethernet header. The idea that we could one day throw away Ethernet addresses helped justify the bloat of IPv6 addresses.
That would be beautiful. Except for one problem: this did not happen.
Requiem for a Dream
A colleague of mine at work put it best: "layers are only ever added, never removed."
All those miracles require the chance to start over and throw away the legacy that had already been built. And that, unfortunately, is mostly impossible. Even if IPv6 reached 99% penetration, it wouldn't mean we'd gotten rid of IPv4. And if we haven't gotten rid of IPv4, we haven't gotten rid of Ethernet addresses or WiFi addresses. And as long as we have to follow the IEEE 802.3 and 802.11 frame standards, we can never throw those bytes away. Which means we will always need the IPv6 neighbor discovery protocol, which is just a more complicated ARP. And even though we no longer use bus networks, we will always need some kind of broadcast, because that's how ARP-style discovery works. We will have to keep a local DHCP server running at home so that our legacy IPv4 light bulbs keep working. And we will still need NAT.
And that's not even the worst of it. Worst of all, we'll still be stuck with the endless abomination that is layer 2 bridging, because of another mistake the IPv6 team forgot to fix. Unfortunately, when they were designing IPv6 in the 1990s, the plan was to deploy IPv6 first (it was only supposed to take a few years) and deal with that problem later, once IPv4 and MAC addresses had gone away and the problem would be easier to solve; and at the time, nobody really had any truly "mobile IP devices." I mean, what would that even mean, carrying your laptop around and plugging it into one Ethernet port after another while an FTP transfer is running? Sounds dumb.
Killer App: Mobile IP
Of course, with a couple of decades of hindsight, we now have a perfect example of such a tiny laptop, your phone, being "plugged into" the virtual Ethernet ports of one wireless access point after another. We do it all the time. And with LTE it even basically works! With WiFi, it only works sometimes. Not bad, right?
Actually no, because of the Internet's shameful secret: all of that works only thanks to layer 2 bridging. Internet routing simply doesn't handle mobility, at all. If you move across an IP network, your IP address changes, and that breaks every connection you have open.
Corporate WiFi networks fake it by bridging the whole LAN together at layer 2, so that a giant central DHCP server always hands you the same IP address no matter which corporate access point you connect to, and then delivers your packets to wherever you are, with at most a few seconds of interruption while the bridge reconfigures. The newfangled home WiFi systems with multiple repeaters/extenders do the same thing. But if you roam from one WiFi network to another as you walk down the street, say if every shop in a row had its own public WiFi, it all falls apart. Each network gives you a new IP address, and every time your IP address changes, all your connections break.
LTE tries even harder. You keep your IP address (usually an IPv6 address, in the case of mobile networks) even as you travel for kilometers and get handed from one cell tower to the next. How? Well... they usually just tunnel all your traffic back to a central point, where it all gets bridged (albeit with heavy firewall filtering) into one gigantic virtual layer 2 network. And your connections stay alive. At the cost of a lot of complexity and a truly disheartening amount of extra latency, which they would dearly love to get rid of, but that's nearly impossible.
How to make mobile networks work
Footnote 1
It turns out that nothing in this section actually requires IPv6. It would all work with IPv4 behind NAT as well, even when roaming across several NATs.
Well, that was a long story, but it's the one I managed to drag out of the people at the IETF. Once we got to this point, the problem of mobile IP, I couldn't help but ask: what went wrong? Why can't we make this work?
It turns out the answer is surprisingly simple. The big mistake is in how the famous 4-tuple (source IP, source port, destination IP, destination port) was defined. We use the 4-tuple to identify a given TCP or UDP session; if a packet carries the same four fields, it belongs to that session, and we can deliver it to the socket serving that session. But the 4-tuple spans two layers: network (layer 3) and transport (layer 4). If instead we identified sessions using only layer 4 data, mobile clients would work perfectly.
Here's a quick example. Client X, port 1111, talks to server Y, port 80, so it sends packets tagged (X, 1111, Y, 80). Replies come back tagged (Y, 80, X, 1111), and the kernel delivers them to the socket that sent the first packet. When X sends more packets tagged (X, 1111, Y, 80), Y delivers them to the same server socket, and so on.
Then X changes its IP address to, say, Q. Now it starts sending packets tagged (Q, 1111, Y, 80). Y has no idea what that means and throws them away. Meanwhile, any packets Y sends tagged (Y, 80, X, 1111) get lost, because there's no longer an X listening for them.
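As a sketch, the kernel-side demultiplexing looks roughly like this (illustrative Python, with the hosts and ports from the example standing in for real addresses), and it shows exactly why the renumbered client falls out of the table:

```python
# Demultiplexing by 4-tuple, and why renumbering breaks it.
sockets = {}   # (src_ip, src_port, dst_ip, dst_port) -> socket

def register(src_ip, src_port, dst_ip, dst_port, sock):
    sockets[(src_ip, src_port, dst_ip, dst_port)] = sock

def deliver(src_ip, src_port, dst_ip, dst_port, data):
    sock = sockets.get((src_ip, src_port, dst_ip, dst_port))
    if sock is None:
        return "dropped"             # no matching session: Y throws it away
    return f"delivered to {sock}"

register("X", 1111, "Y", 80, "socket-1")
print(deliver("X", 1111, "Y", 80, b"hello"))   # delivered to socket-1
print(deliver("Q", 1111, "Y", 80, b"hello"))   # dropped: X renumbered to Q
```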
Now imagine we identified sockets without reference to IP addresses at all. For that to work we'd need much bigger port numbers (they're currently 16 bits). Let's make them, say, 128 or 256 bits long: essentially a unique hash.
Now X sends Y a packet tagged (uuid, 80). Note that the packets themselves still carry the IP addresses (X, Y) at layer 3; that's how they get routed to the right machine. But the kernel doesn't use the layer 3 information to decide which socket gets the packet; it only uses the uuid. The destination port (80 in this case) is only needed when starting a new session, to say which service you want to talk to, and can be ignored or dropped after that.
For the return direction, Y's kernel simply caches the fact that packets for (uuid) should go to IP address X, that being the most recent address that packets for (uuid) arrived from.
Now imagine X changes its address to Q. It keeps sending packets tagged (uuid, 80) to IP address Y, but now they arrive from address Q. Machine Y receives such a packet, matches it to the socket associated with (uuid), notices that packets for that socket now come from address Q, and updates its cache. Return packets tagged (uuid) then get sent toward Q instead of X. Everything works! (Modulo the precautions needed to prevent connection-hijacking attacks; see footnote 2.)
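Here's a minimal sketch of that idea in Python: sessions keyed only by a big random ID, with the peer's address treated as a mutable cache that follows the most recent packet. The names are illustrative, and the anti-hijacking check from footnote 2 is deliberately left out here.

```python
import uuid

sessions = {}   # session_id -> {"peer_addr": ..., "sock": ...}

def open_session(peer_addr, sock):
    sid = uuid.uuid4()
    sessions[sid] = {"peer_addr": peer_addr, "sock": sock}
    return sid

def on_packet(sid, src_addr, data):
    sess = sessions.get(sid)
    if sess is None:
        return "dropped"
    if sess["peer_addr"] != src_addr:
        sess["peer_addr"] = src_addr        # the client roamed: follow it
    return f"delivered to {sess['sock']}"

def send(sid, data):
    sess = sessions[sid]
    return f"send {data!r} to {sess['peer_addr']}"   # always the latest address

sid = open_session("X", "socket-1")
print(on_packet(sid, "X", b"hello"))   # delivered to socket-1
print(on_packet(sid, "Q", b"hello"))   # still delivered; cache now points at Q
print(send(sid, b"reply"))             # send b'reply' to Q
```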
Footnote 2
Some people asked what those "precautions against connection hijacking" might look like. There are various ways to do it, but the simplest is something like the SYN / SYN-ACK / ACK exchange TCP does at startup. If Y just trusted the first packet from host Q, it would be too easy for an attacker to take over the X->Y connection by sending a packet to Y from anywhere on the Internet (though guessing which 256-bit uuid to forge is already fairly hard). But if Y sends back a cookie that Q has to receive, process, and return, that proves Q is at least a man-in-the-middle rather than just some outside attacker (which is all TCP guarantees you anyway). And if you're using an encrypted protocol (like QUIC), the handshake can additionally be protected by the session key.
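As an illustration only (not any standardized mechanism, though it's similar in spirit to QUIC's path validation), the cookie check could look something like this, building on the session table sketched above:

```python
import secrets

# Before trusting a new source address for an existing session, send a random
# cookie to that address and only switch the cached peer address if the same
# cookie comes back.
pending = {}   # (session_id, new_addr) -> cookie

def challenge(session_id, new_addr):
    cookie = secrets.token_bytes(16)
    pending[(session_id, new_addr)] = cookie
    return ("PATH_CHALLENGE", new_addr, cookie)       # to be sent to new_addr

def on_response(session_id, new_addr, cookie, sessions):
    if pending.get((session_id, new_addr)) == cookie:
        del pending[(session_id, new_addr)]
        sessions[session_id]["peer_addr"] = new_addr  # now it's safe to follow
        return True
    return False    # wrong or missing cookie: keep the old address
```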
There's just one catch: UDP and TCP don't work this way, and it's too late to update them. Updating UDP and TCP would be about as big a project as upgrading from IPv4 to IPv6: something that looked simple back in the 1990s but, decades later, still isn't even half done (and the first half was the easy half; the rest is much harder).
The good news is that maybe we can hack around it with yet another layering violation. If we ditch TCP, which is getting pretty old anyway, and use QUIC over UDP instead, then we can simply stop using the UDP 4-tuple as the connection identifier. Instead, if the UDP port number equals some agreed value meaning "the mobility layer," we unwrap the contents, which can be another packet carrying the proper uuid tag, match it to the right session, and deliver it to the right socket.
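A toy sketch of such a "mobility layer" dispatcher: one UDP socket, with incoming datagrams matched to sessions by a connection ID carried inside the packet rather than by the sender's address. The port number and the 16-byte ID prefix here are made up purely for illustration.

```python
import socket

MOBILITY_PORT = 7843   # made-up port meaning "the mobility layer"
CONN_ID_LEN = 16       # assume a 16-byte connection ID prefixes each datagram
connections = {}       # conn_id -> handler callable

def serve_once(sock):
    data, src_addr = sock.recvfrom(65535)
    conn_id, payload = data[:CONN_ID_LEN], data[CONN_ID_LEN:]
    handler = connections.get(conn_id)
    if handler:
        # Dispatch by connection ID, not by src_addr; src_addr is only kept
        # as the (possibly new) return address for replies.
        handler(payload, src_addr)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", MOBILITY_PORT))
# serve_once(sock)   # would block here waiting for a datagram
```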
Even better news: the experimental QUIC protocol already has, at least in theory, the right packet structure to work this way. It turns out you need unique session identifiers (keys) anyway if you want stateless encryption and authentication, which QUIC does. So, maybe with only minor tweaks, QUIC could support transparent roaming. What a world that would be!
At that point, all we'd have to do is eliminate every remaining use of UDP and TCP on the Internet, and then we really, truly wouldn't need layer 2 bridging anymore, and then we could get rid of broadcasts, and MAC addresses, and SDN, and DHCP, and all the rest of it.
Then the Internet would become elegant again.