Mapping the Internet with Hilbert Curves

Original author: Ben Cox
The internet is big. Really big. You just won't believe how mind-bogglingly big it is. I mean, you might think the /22 block you got as a Local Internet Registry (LIR) is a lot, but on the scale of the whole Internet it is peanuts.

Of course, it is not actually that big; there is a reason we needed IPv6. But that is another story.

The fact is that IPv4 (the most widely used version of the IP protocol) limits the address space to 2³² addresses. That gives you approximately 4.3 billion IP addresses to work with, although not quite, since large sections are not available for general use:
[Table: IP range / Use. Surviving entries in the "Use" column: local system, LAN, local LAN, "for future use".]
The address ranges listed above (written in classless inter-domain routing, CIDR, notation) are off limits to us. Together they add up to 588,316,672 addresses, or about 13.7% of the total.

However, even the remaining 3,706,650,624 addresses do not seem like that many, and sending a packet to each one of them looks perfectly achievable.
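The arithmetic behind these figures can be checked with a short Python sketch. The reserved total is the number quoted in the text; the helper name `prefix_size` and the 10.0.0.0/8 example range are only illustrations chosen here.

```python
import ipaddress

TOTAL = 2 ** 32            # size of the whole IPv4 address space
RESERVED = 588_316_672     # reserved total quoted in the text


def prefix_size(prefix_len: int) -> int:
    """Number of addresses in an IPv4 block with the given CIDR prefix length."""
    return 2 ** (32 - prefix_len)


usable = TOTAL - RESERVED
reserved_share = RESERVED / TOTAL

print(TOTAL)                      # 4294967296
print(usable)                     # 3706650624
print(f"{reserved_share:.1%}")    # 13.7%
print(prefix_size(22))            # 1024 addresses in a /22
print(prefix_size(24))            # 256 addresses in a /24

# The standard library agrees with the formula (10.0.0.0/8 is just an example).
print(ipaddress.ip_network("10.0.0.0/8").num_addresses)  # 16777216
```

A /22, in other words, is 1,024 addresses out of roughly 3.7 billion usable ones.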

So... I am certainly not the first to try this. The Internet has plenty of "background noise" (unsolicited packets), most of it generated by systems trying to break into other systems.


Here we can see that port 23 stands far above all other ports (on a logarithmic scale). This is the telnet port, which is often left exposed on poorly secured routers and other IoT devices.

Knowing this, I went ahead and sent an ICMP ping to every host on the Internet to see how much of it would respond (and so tell me whether there was a machine online at the other end).

A day later, I had sent 3.7 billion packets and was left with an enormous text file. Now we just have to find a way to draw this map!
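For a sense of scale, here is a back-of-the-envelope estimate of the send rate such a scan implies. It assumes the scan took exactly 24 hours and sent one packet per usable address; both are simplifications, since the text only says "a day".

```python
PACKETS = 3_706_650_624          # one ping per usable IPv4 address
SECONDS_PER_DAY = 24 * 60 * 60   # assumption: the scan took exactly one day

rate = PACKETS / SECONDS_PER_DAY
print(f"{rate:,.0f} packets per second")
```

That works out to roughly 43 thousand packets per second, sustained for the whole day.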

Meet the Hilbert Curves

The problem with displaying IP addresses is that they are one-dimensional: they only go up or down, and people are not very good at taking in a large number of points along a single line. So we need a way to lay them out so that they fill a two-dimensional space, which also gives us more useful plots.

Fortunately, mathematics comes to the rescue, this time in the form of space-filling curves (also known as Peano curves):


I never understood how to make use of this until I numbered the nodes that the curve passes through.


It took me even longer to realize that the same animation can be shown in one dimension again by "untangling" the curve:


Now that we have figured out how these curves work, we can apply them to IP addresses.

Fortunately, there are tools that build such plots from collected IP address data, so all we have to do is feed one of them our data and wait for the result:
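The numbering idea can be sketched in a few lines of Python: a position along the curve goes in and a grid cell comes out. This is the widely used iterative distance-to-coordinate conversion for the Hilbert curve, not code from any tool mentioned in this article. The function names are illustrative, and the choice of a 4096 × 4096 grid with one cell per /24 is an assumption; the orientation a real heatmap tool uses may differ.

```python
import ipaddress


def d2xy(order: int, d: int) -> tuple[int, int]:
    """Map distance d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so that sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def ip_to_xy(ip: str, order: int = 12) -> tuple[int, int]:
    """Place the /24 containing `ip` on the grid (assumes one cell per /24)."""
    block_index = int(ipaddress.IPv4Address(ip)) >> 8  # drop the last octet
    return d2xy(order, block_index)


# The four nodes of the smallest curve, in the order the curve visits them.
print([d2xy(1, d) for d in range(4)])   # [(0, 0), (0, 1), (1, 1), (1, 0)]

# Neighbouring /24 blocks land on neighbouring cells.
print(ip_to_xy("10.0.0.1"), ip_to_xy("10.0.1.1"))
```

The property that matters for a map is locality: consecutive numbers always land in adjacent cells, so a contiguous address range shows up as a compact patch rather than a thin stripe.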

cat ping.txt | pcregrep -o1 ': (\d+\.\d+\.\d+\.\d+)' | ./ipv4-heatmap -a ./labels/iana/iana-labels.txt -o out.png

This command draws the Hilbert curve with a color gradient that shows how many systems are online in each /24.
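To make the counting step concrete, here is a minimal Python sketch of the kind of per-/24 tally such a gradient is built from. It is not how ipv4-heatmap is implemented. The regular expression is the one from the command above; the function name and the sample input lines are hypothetical, since the exact format of ping.txt is not shown.

```python
import ipaddress
import re
from collections import Counter

# Same pattern as in the pcregrep call: an IPv4 address following ": ".
IP_RE = re.compile(r": (\d+\.\d+\.\d+\.\d+)")


def count_per_slash24(lines):
    """Count responding addresses per /24 block."""
    counts = Counter()
    for line in lines:
        match = IP_RE.search(line)
        if match is None:
            continue  # line carries no address
        try:
            addr = ipaddress.IPv4Address(match.group(1))
        except ValueError:
            continue  # looked like an address but is not valid (e.g. 999.1.1.1)
        block = ipaddress.ip_network(f"{addr}/24", strict=False)
        counts[str(block)] += 1
    return counts


# Hypothetical sample lines; the real scan output may look different.
sample = [
    "reply: 192.0.2.10",
    "reply: 192.0.2.77",
    "reply: 198.51.100.5",
    "no address on this line",
]
print(count_per_slash24(sample))
```

Each /24 can score between 0 and 256 responders, which is the value the color gradient encodes.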

And so, let me introduce the IPv4 Internet map as of April 16, 2018. You can click on the image to open the uncompressed full-resolution version; just keep in mind that it is 9 MB. The last public scan that I know of was done in 2012 by the Carna botnet, which consisted of about 420 thousand devices. Comparing against the data that botnet collected, we can clearly see some changes.



In 2012, RIPE had not even touched this range. It would later become the range used for its final allocations, with each new RIPE member given only a /22 of IP space. That makes it an odd range to look at: there are no massive allocations in it, so it looks very "healthy" compared with all the others.

RIPE is not the only registry that has used up whole ranges over time. Below we see three other Regional Internet Registries (RIRs) that have consumed their ranges over the past 6 years:


In addition to all this, I also scanned several IP ranges belonging to APNIC (Asia-Pacific Network Information Centre) every 30 minutes for 24 hours. The data from this experiment lets you see how the Internet "breathes" as customers come online in the morning and go offline at night:


The most interesting thing in this GIF is what an ISP's dynamic IP pool looks like: customers go online for a short period, then reconnect and get a new IP address (which is why the active IP addresses "move" over the course of the day):

Oh yes, and if you are wondering what IPv6 looks like in this format and how many addresses we already use, here is an instructive chart:

