How many .com domain names are not used?
When searching for available names in the .com zone, I was unpleasantly surprised by how many domains are already taken but unused. Apparently every pronounceable combination of letters in every major language of the world is registered, and so are short unpronounceable combinations. Is the domain market really that large, or do I simply come up with the same names as everyone else? Let's look at the raw statistics.

There are currently about 137 million registered .com domain names. According to Verisign, there were 137,756,106 .com domains in the “active zone” as of 01/27/2019. Before that, I had verified this figure against the DNS zone file.

Of these, about one third are used (businesses, personal websites, email, etc.). Another third, apparently, is not used, and the last third is used for various speculative purposes.

Here is how domains are used (in a sample of 2,188 domains):

How I got these numbers

I started by crawling a random sample of domains from the DNS zone file (the file was downloaded on 01/21/2019, and the crawl ran until 01/23/2019) until I reached 100,000 domains. Not all of the entries are valid: some act as traps for catching people who illegally distribute zone files, and about 1% are name servers. After excluding these, 98,854 valid domains remained.
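
One way to draw such a random sample in a single pass over a very large zone file is reservoir sampling. The following is a minimal sketch of that technique, not the script used for the crawl. It assumes each delegation appears as a whitespace-separated line whose second-to-last field is the record type `NS`, and that records for the same name are adjacent; the function names are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def delegated_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield each delegated name once (assumes records for a name are adjacent)."""
    previous: Optional[str] = None
    for line in lines:
        fields = line.split()
        # Assumed layout: "<owner> [ttl] [class] NS <nameserver>"; skip anything else.
        if len(fields) < 3 or fields[-2].upper() != "NS":
            continue
        name = fields[0].rstrip(".").lower()
        if name != previous:
            previous = name
            yield name


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return k items chosen uniformly at random from a stream of unknown length."""
    rng = random.Random(seed)
    sample: List[T] = []
    for index, item in enumerate(items):
        if index < k:
            sample.append(item)           # fill the reservoir first
        else:
            slot = rng.randint(0, index)  # inclusive on both ends
            if slot < k:
                sample[slot] = item       # replace with probability k / (index + 1)
    return sample
```

With a zone file opened as `zone`, `reservoir_sample(delegated_names(zone), 100_000)` would return the sample without holding the whole file in memory.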

For each domain, I collected the following:

  • WHOIS record;
  • all DNS records for the apex domain and the www subdomain (via a DNS ANY query sent directly to the name servers listed in the WHOIS record);
  • HTTP and HTTPS responses (status code, headers, and body) for the home page of the apex domain and of the www subdomain (an invalid SSL certificate put the domain in the Error category);
  • a screenshot of the home page in Mozilla Firefox 64.0 on Linux.
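
The HTTP and HTTPS part of this collection can be sketched with the Python standard library alone. This is an illustrative sketch, not the crawler itself: the helper names `probe_urls` and `fetch`, the 10-second timeout, and the shape of the returned dictionary are all assumptions. WHOIS, DNS, and screenshots are left out.

```python
import ssl
import urllib.error
import urllib.request
from typing import Dict, List


def probe_urls(domain: str) -> List[str]:
    """Home-page URLs for the bare domain and its www subdomain, over both schemes."""
    hosts = [domain, "www." + domain]
    return [f"{scheme}://{host}/" for host in hosts for scheme in ("http", "https")]


def fetch(url: str, timeout: float = 10.0) -> Dict[str, object]:
    """Fetch one URL and record status, headers, and body, or the reason it failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return {"url": url, "status": resp.status,
                    "headers": dict(resp.headers.items()),
                    "body": resp.read(), "error": None}
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code, headers, and a body.
        return {"url": url, "status": err.code,
                "headers": dict(err.headers.items()),
                "body": err.read(), "error": None}
    except urllib.error.URLError as err:
        # Certificate failures arrive wrapped in URLError.reason.
        bad_cert = isinstance(err.reason, ssl.SSLCertVerificationError)
        reason = "invalid certificate" if bad_cert else str(err.reason)
        return {"url": url, "status": None, "headers": {}, "body": b"", "error": reason}
    except OSError as err:
        # Timeouts and resets that happen while reading the body.
        return {"url": url, "status": None, "headers": {}, "body": b"", "error": str(err)}
```

Calling `[fetch(u) for u in probe_urls("example.com")]` would produce four records per domain. Note that `urlopen` follows redirects by default, so detecting the Redirect category would need extra handling.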

The scan took just over 48 hours from a single server in a Singapore data center. I then ran a second crawl pass over all domains that I could not reach via HTTP or HTTPS, in case the errors were temporary. Finally, for 2,188 domains from the sample, I manually checked every error in case the crawler had timed out or JavaScript had blocked the DOM events.

Then I wrote a helper script to speed up the manual classification of sites based on their screenshot and content.

The script presents the possible categories as a list of buttons, with Content as the default.

Using this script, I categorized the sites in two days. Not every site had to be classified manually: in some cases the category was obvious from the <title> element, so I applied regular expressions. In other cases the screenshot was not enough, and I had to open the domain in a browser to verify it.
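
A title-based pre-classification of this kind could look like the sketch below. The patterns are hypothetical examples chosen for illustration, not the expressions actually used; anything that matches no rule is left for manual review.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Hypothetical rules: (category, pattern matched against the page title).
TITLE_RULES: List[Tuple[str, Pattern[str]]] = [
    ("For sale", re.compile(r"\bfor sale\b|\bbuy this domain\b", re.I)),
    ("Parked", re.compile(r"\bparked\b|\bdomain parking\b", re.I)),
    ("Empty", re.compile(r"\bjust another wordpress site\b", re.I)),
]

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.I | re.S)


def classify_by_title(html: str) -> Optional[str]:
    """Return a category if the <title> matches a rule, else None (manual review)."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    title = " ".join(match.group(1).split())  # collapse whitespace and newlines
    for category, pattern in TITLE_RULES:
        if pattern.search(title):
            return category
    return None
```

The rules are checked in order, so more specific patterns should come first.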

Summary statistics and conclusions

Top-10 .com registrars from a sample of 100,000 domains

  • GoDaddy has registered a third of all domain names, or approximately 45 million domains. Every third one of them shows a parking page. In other words, more than 10% of all .com domains on the Internet display GoDaddy ads.
  • Although the sample contains 1,851 registrars, they are controlled by a small number of operators. For example, a single operator controls more than a thousand registrars: 1000 LLC, 1001 LLC, 1002, and so on. Other operators use similarly numbered registrars, but for less obvious purposes.
  • 25% of the domains were registered within the past year.
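
The “more than 10%” figure follows from multiplying the two thirds together. A quick check, taking “a third” literally (the measured shares are evidently a little different, so the numbers are approximate):

```python
from fractions import Fraction

ZONE_SIZE = 137_756_106                  # .com domains reported by Verisign

godaddy_share = Fraction(1, 3)           # share of .com domains registered via GoDaddy
parked_within_godaddy = Fraction(1, 3)   # share of those showing a parking page

# Share of the whole zone that is a GoDaddy parking page: 1/3 * 1/3 = 1/9.
parked_godaddy_share = godaddy_share * parked_within_godaddy

print(f"GoDaddy domains: about {float(ZONE_SIZE * godaddy_share) / 1e6:.0f} million")
print(f"GoDaddy parking pages: {float(parked_godaddy_share):.1%} of the zone")
```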

Age of domains in the sample of 100,000 (in years)

Domain Categories

The list of categories grew as the work progressed. For example, I did not expect so many gambling domains (operating under aliases).

For most categories, a random selection of screenshots is shown.
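
Each category heading below gives a share of the sample and the corresponding number of domains in the whole zone. The sketch below shows how such an extrapolation can be computed, together with a rough 95% margin of error. It assumes the shares come from the 2,188 manually classified domains and that this subset is a simple random sample; the interval is the standard normal approximation and is not part of the original analysis.

```python
import math
from typing import Tuple

ZONE_SIZE = 137_756_106   # .com domains reported by Verisign on 01/27/2019
SAMPLE_SIZE = 2188        # manually classified domains (assumed basis of the shares)


def extrapolate(share: float, n: int = SAMPLE_SIZE,
                total: int = ZONE_SIZE, z: float = 1.96) -> Tuple[int, int, int]:
    """Return (estimate, low, high) domain counts for a share observed in the sample."""
    margin = z * math.sqrt(share * (1.0 - share) / n)  # normal-approximation interval
    low = max(0.0, share - margin)
    high = min(1.0, share + margin)
    return round(share * total), round(low * total), round(high * total)


estimate, low, high = extrapolate(0.31)  # the "Content" share
print(f"{estimate / 1e6:.1f}M domains (95% interval {low / 1e6:.1f}M to {high / 1e6:.1f}M)")
```

Because the published percentages are rounded, the counts computed this way can differ slightly from the figures in the headings.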

Content (31% or ~ 43 million)

A Content domain is one with any unique content. This is the default category, where I put any site I was unsure about.

Advertising (23% or ~ 31 million)

Note that half of the domains in this category are GoDaddy parking pages, on which GoDaddy places Google ads for keywords related to the domain name.

No web server (11% or ~ 16 million)

If I could not connect or get a valid response on port 80 or 443 for either the apex domain or the www subdomain, and the domain has no MX record, I placed it in this category. Some of these domains are probably used in other ways, for example as FTP or game servers, but I believe they are a minority. IPv6-only sites also ended up here, because the crawler server was configured for IPv4 only.

Empty (9.2% or ~ 13 million)

An empty domain is one for which the web server responds to requests, but returns blank pages, 404 errors, or blank templates (for example, the default WordPress installation).

The difference between an empty and a parked domain is that an empty domain has presumably been configured by its owner, but no content has been added yet.

For sale (7.1% or ~ 9.8 million)

Many domains are offered for sale through various brokers and marketplaces. Nearly half of them seem to be sold by HugeDomains, although its website only mentions “more than 200,000” domains available for purchase. I only counted domains listed on well-known sites, or ads that did not include contact details, because ad networks and brokers often falsely claim to represent the domain owner (I classified all such domains as advertising instead).

Error (5.7% or ~ 7.9 million)

If the domain returned an error of any type, be it an HTTP error or an error on the page, I attributed it to this category.

Note that some private domains may have ended up here by accident if they use HTTP Basic authentication, since I did not distinguish 403 Forbidden (returned because no Basic authentication credentials were supplied) from other errors.
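
One way such responses could be separated, which the crawl did not do, is to look for an authentication challenge. Servers that request HTTP authentication normally answer 401 Unauthorized with a WWW-Authenticate header, whereas a bare 403 carries no such signal. The function name below is hypothetical, and a match is only a hint that the domain might belong in the Private category.

```python
from typing import Mapping


def asks_for_credentials(status: int, headers: Mapping[str, str]) -> bool:
    """True if the response is an HTTP authentication challenge (401 + WWW-Authenticate)."""
    header_names = {name.lower() for name in headers}  # header names are case-insensitive
    return status == 401 and "www-authenticate" in header_names
```

For example, `asks_for_credentials(401, {"WWW-Authenticate": 'Basic realm="intranet"'})` is true, while a plain 403 or 404 response is not flagged.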

Parked (4.8% or ~ 6.5 million)

Parked domains display the registrar's page or a notice that the domain has not been configured yet. To fall into this category, the domain must serve a page without third-party advertising. The registrar may advertise its own services, but may not show ads from an advertising network.

Gambling (3.0% or ~ 4 million)

Almost all sites in this category are in Chinese and operate under aliases, often short strings of digits or consonants (for example, 17770012 or tdwhtr). They follow common patterns and contain similar images, often with automatically generated logos. My guess is that their goal is to lure people who hope to get lucky.
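
The naming pattern described above (short strings made only of digits or only of consonants) can be expressed as a simple heuristic. This is only an illustration of the pattern: the sites were classified by their content and screenshots, not by name, and the length limit of 10 characters is an arbitrary assumption.

```python
import re

# Only digits, or only consonants (y is treated as a consonant here).
ALIAS_RE = re.compile(r"(?:\d+|[b-df-hj-np-tv-z]+)")


def looks_like_alias(domain: str, max_len: int = 10) -> bool:
    """True if the name before .com is a short digits-only or consonants-only string."""
    label = domain.lower().rstrip(".")
    if label.endswith(".com"):
        label = label[: -len(".com")]
    return len(label) <= max_len and ALIAS_RE.fullmatch(label) is not None
```

Both examples from the text, `17770012` and `tdwhtr`, match, while an ordinary word such as `example` does not. A name that matches is not necessarily a gambling site.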

Mail (2.6% or ~ 3.5 million)

If a domain did not fall into any other category but has an MX record in DNS (for email), I assigned it to the “Mail” category. I did not check whether the mail server or mail delivery actually works, so it is possible that many of these domains are not used for email.
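
Read together with the “No web server” rule, the fallback for domains without a usable web response can be sketched as a small decision function. This is an interpretation of the two descriptions rather than the crawler's actual code; the function and parameter names are hypothetical.

```python
from typing import Optional


def fallback_category(valid_web_responses: int, has_mx: bool) -> Optional[str]:
    """Category for a domain based only on web reachability and MX presence.

    valid_web_responses counts valid answers on port 80/443 for the bare
    domain or its www subdomain. Returns None when the page content decides.
    """
    if valid_web_responses > 0:
        return None            # classified from the page itself (Content, Parked, ...)
    if has_mx:
        return "Mail"          # no website, but the domain can receive email
    return "No web server"     # neither a website nor an MX record
```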

Redirect (1.1% or ~ 1.6 million)

These include “vanity domains” that redirect to Facebook pages, alternative company names, and so on.

Private (0.64% or ~ 0.9 million)

These are sites where no content is available without authorization (or, in some cases, registration).

Porn (0.59% or ~ 0.8 million)

Like gambling sites, many porn sites work under different aliases. Web sites are predominantly in Chinese, and domains follow similar naming patterns. Since many sites display pornographic material directly (without warning), I did not take screenshots.

