Mapping the Internet with Hilbert Curves

Original author: Ben Cox
  • Translation
The Internet is big. Really big. You just won’t believe how vast it is. I mean, you may think that the range of /22 blocks you got as a local Internet registry (LIR) is a lot, but on the scale of the whole Internet it is tiny.

Of course, it turned out not to be that big after all, otherwise we would not need IPv6. But that is another story.

The fact is that IPv4 (the most widely used version of the IP protocol) limits the address space to 2³² addresses. That gives you roughly 4.3 billion IP addresses to work with, although that is not quite true either, since large sections are not available for use:
IP range          Application
0.0.0.0/8         Local system
10.0.0.0/8        Local LAN
127.0.0.0/8       Loopback
169.254.0.0/16    Link Local
172.16.0.0/12     Local LAN
224.0.0.0/4       Multicast
240.0.0.0/4       Reserved for future use
The address ranges listed above (written in classless inter-domain routing, or CIDR, notation) are off limits to us. Together they account for 588,316,672 addresses, or about 13.7% of the total.

That still leaves 3,706,650,624 addresses, which really is not that many, and it is perfectly feasible to send a packet to each of them.
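The arithmetic above can be checked with a few lines of Python. This is only a sketch: it uses the standard `ipaddress` module and exactly the ranges from the table, and the names `RESERVED` and `count_addresses` are made up for illustration.

```python
import ipaddress

# Reserved ranges, copied from the table above.
RESERVED = [
    "0.0.0.0/8",
    "10.0.0.0/8",
    "127.0.0.0/8",
    "169.254.0.0/16",
    "172.16.0.0/12",
    "224.0.0.0/4",
    "240.0.0.0/4",
]


def count_addresses(cidrs):
    """Sum the number of addresses in a list of non-overlapping CIDR blocks."""
    return sum(ipaddress.ip_network(cidr).num_addresses for cidr in cidrs)


TOTAL = 2 ** 32                      # size of the whole IPv4 address space
reserved = count_addresses(RESERVED)
usable = TOTAL - reserved

print(f"reserved: {reserved:,} ({reserved / TOTAL:.1%})")
print(f"usable:   {usable:,}")
```

A simple sum is enough here because none of the listed blocks overlap.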

This is certainly not the first time someone has tried this. The Internet has plenty of “background noise” (unsolicited packets), most of it generated by systems that try to break into other systems.

image

Here we can see that port 23 receives far more traffic than any other port (note the logarithmic scale). This is the telnet port, which is often left exposed on unprotected routers and other IoT devices.

Knowing this, I set out to send an ICMP ping to every host on the Internet to see how much of the Internet would reply (and thereby show whether a computer is online at each address).

A day later, I had sent 3.7 billion packets and ended up with an enormous text file. Now we just have to find a way to draw this map!

Meet the Hilbert Curves


The problem with displaying IP addresses is that they are one-dimensional: they only go up or down, and people are not very good at making sense of a large number of points on a line. So we need a way to lay them out so that they fill a two-dimensional space, which will also give us more useful plots.

Fortunately, mathematics comes to the rescue, this time in the form of space-filling curves (also known as Peano curves):

image

It never clicked for me how to use this until I numbered the nodes that the curve passes through.

image

It took me even longer to realize that the same animation can be shown in one dimension again by “untangling” the curve:

image

Now that we have figured out how these curves work, we can apply them to IP addresses.

Fortunately, tools already exist that build such plots from collected IP address data, so all we have to do is feed our data to one of them and wait for the result:
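The node numbering described above can be sketched in Python. The function `d2xy` below is the widely used iterative conversion from a position along the curve to grid coordinates; it is not taken from the original post. The helper `ip_to_cell` is hypothetical and assumes one grid cell per /24, which gives 2²⁴ cells, that is, a 4096 × 4096 grid (a curve of order 12). The orientation of the curve in a real plotting tool may differ from this sketch.

```python
import ipaddress


def d2xy(order, d):
    """Convert distance d along a Hilbert curve of the given order to (x, y).

    The curve covers a 2**order by 2**order grid; d runs from 0 to 4**order - 1.
    """
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so that sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def ip_to_cell(ip, order=12):
    """Map an IPv4 address to the grid cell of its /24 (assumed layout)."""
    slash24_index = int(ipaddress.IPv4Address(ip)) >> 8  # drop the last octet
    return d2xy(order, slash24_index)


# Walk the smallest curve (a 2 x 2 grid) and print each numbered node.
for d in range(4):
    print(d, d2xy(1, d))
```

The useful property is that consecutive numbers always land in neighbouring cells, so addresses that are close numerically stay close on the map.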

cat ping.txt | pcregrep -o1 ': (\d+\.\d+\.\d+\.\d+)' | ./ipv4-heatmap -a ./labels/iana/iana-labels.txt -o out.png

This command draws the Hilbert curve with a color gradient that shows how many systems are online in each /24.

And so, let me introduce you to the IPv4 Internet map as of April 16, 2018. You can click on the image to open the uncompressed, full-resolution version; just keep in mind that it weighs 9 MB. The last public scan that I know of was done in 2012 by the Carna botnet, which consisted of about 420 thousand devices. Comparing against the data obtained by the botnet, we can clearly see some changes.
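To make the per-/24 counting concrete, here is a rough Python sketch of the aggregation step. It reuses the same regular expression as the `pcregrep` call above, but everything else is an assumption: the exact line format of `ping.txt` is not shown in this post, so the sample lines are invented, and `count_per_slash24` is a hypothetical helper rather than part of any tool.

```python
import ipaddress
import re
from collections import Counter

# Same pattern as in the pcregrep call: an IPv4 address after ": ".
IP_AFTER_COLON = re.compile(r": (\d+\.\d+\.\d+\.\d+)")


def count_per_slash24(lines):
    """Count how many responding addresses fall into each /24."""
    counts = Counter()
    for line in lines:
        match = IP_AFTER_COLON.search(line)
        if match is None:
            continue  # line without an address, e.g. a timeout
        try:
            addr = ipaddress.IPv4Address(match.group(1))
        except ipaddress.AddressValueError:
            continue  # matches the pattern but is not a valid address
        base = int(addr) & 0xFFFFFF00  # zero the last octet
        counts[f"{ipaddress.IPv4Address(base)}/24"] += 1
    return counts


# Invented sample lines; the real file format may differ.
sample = [
    "reply from: 192.0.2.10",
    "reply from: 192.0.2.77",
    "reply from: 198.51.100.5",
    "timeout",
]
counts = count_per_slash24(sample)
for network, hits in sorted(counts.items()):
    print(network, hits)
```

Each count can then be turned into a color, with one pixel per /24.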

image

image

In 2012, RIPE had not even touched 185.0.0.0/8. Later it became the range used for its final allocations, with each new RIPE member receiving only a /22 of IP space. As a result, 185.0.0.0/8 stands out from the other ranges: it contains no massive allocations, so it looks very “healthy” compared with all the others.

RIPE is not the only registry to have used up its ranges over time. Below we see three other Regional Internet Registries (RIRs) that have consumed their ranges over the past six years:

image

On top of all this, I also scanned several IP ranges belonging to APNIC (Asia-Pacific Network Information Centre) every 30 minutes for 24 hours. The data from this experiment lets you see how the Internet “breathes” as customers come online in the morning and go offline at night:

image

The most interesting thing in this GIF, to me, is what looks like an ISP's dynamic IP pool: customers come online for a short period of time, then reconnect and get a new IP address (which is why the more active IP addresses appear to “move” over the course of the day):
image

Oh yes, and if you are wondering what IPv6 looks like in this format and how much of it we already use, here is an instructive graph:

image
