
Typos bring Google $500 million a year
It's very simple: so-called typosquatters register domains that are misspellings of popular site names in order to collect stray traffic, and place contextual advertising there, usually from Google AdWords. At the Financial Cryptography and Data Security conference, Harvard researchers presented a study (PDF) in which they tried to estimate the size of this market. The authors also suggest that Google provides technical assistance to domainers and shares profit with them.
According to their estimates, there are at least 938,000 domains on the Web that are misspellings of the 3,264 largest sites in the .com zone (only names of at least five letters were considered). Each popular site has an average of 281 typo domains. These “erroneous” domains make up about 1.16% of all domains in the .com zone.
A bit about the research methodology. Typos were generated according to the Damerau-Levenshtein model: each substitution of a letter, omission of a letter, extra letter, or swap of two adjacent letters counts as a new word one step away from the original. For the study, a list of domains at distance one from the originals was generated. In addition, characteristic web typos were added (for example, the letters www glued to the beginning of a site's name). For the 3,264 largest sites, this produced 1,910,738 candidates. The researchers then drew a random sample of 2,195 sites and checked it manually to determine how accurate the candidate list was. Based on the results of this check, the estimate of the number of typosquatter domains was reduced to 937,918.
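The candidate generation described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' actual tool: the function names, the character set (lowercase letters, digits, and the hyphen), and the restriction to a single "www" prefix typo are all assumptions made for the example.

```python
import string

# Assumed character set for a domain label; the study's exact alphabet is not stated.
ALPHABET = string.ascii_lowercase + string.digits + "-"


def distance_one_typos(label):
    """Return every string at Damerau-Levenshtein distance 1 from `label`."""
    variants = set()
    for i in range(len(label) + 1):
        head, tail = label[:i], label[i:]
        # Extra letter: insert one character at position i.
        for ch in ALPHABET:
            variants.add(head + ch + tail)
        if tail:
            # Missing letter: drop the character at position i.
            variants.add(head + tail[1:])
            # Wrong letter: replace the character at position i.
            for ch in ALPHABET:
                variants.add(head + ch + tail[1:])
        if len(tail) > 1:
            # Swapped letters: exchange the characters at positions i and i+1.
            variants.add(head + tail[1] + tail[0] + tail[2:])
    variants.discard(label)  # substitutions and swaps can reproduce the original
    # Drop empty labels and labels that start or end with a hyphen.
    return {v for v in variants
            if v and not v.startswith("-") and not v.endswith("-")}


def candidate_domains(domain):
    """Build typo candidates for a two-part domain such as 'sportsbook.com'."""
    label, _, tld = domain.partition(".")
    candidates = {v + "." + tld for v in distance_one_typos(label)}
    # Characteristic web typo: the dot after "www" is omitted.
    candidates.add("www" + label + "." + tld)
    return candidates


if __name__ == "__main__":
    typos = candidate_domains("sportsbook.com")
    print(len(typos), "candidates")
    print("saportsbook.com" in typos, "sxportsbook.com" in typos)
```

A generator like this only produces candidates. Whether a candidate is actually registered and used for typosquatting still has to be checked separately, which is what the manual sample in the study was for.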
As part of the study, a crawler visited 284,914 domains from the list of suspected typosquatters. It turned out that contextual advertising was placed on 80% of the reachable sites, and the remaining 20% redirected elsewhere.

The large share of blocked requests is explained by the fact that some servers host tens of thousands of typosquatter domains, so the crawler was cut off by ordinary DDoS protection. The vast majority of these domains open normally from other IP addresses. As for the “unclassified” domains, these are mainly sites that rely on JavaScript, which the crawler could not process properly.
What contextual advertising is placed on typosquatter domains? In 36% of cases it is advertising for the original site, the one with the correct spelling. Most of the rest links to its competitors.
The study also identified 1,250 Google partner identifiers under which ads are served on these domains. The identifiers can be seen in the ad URL after the “client=” parameter. It turned out that some of these identifiers occur far more often than others.

The five largest Google partners cover 63% of the market, and the top 10 cover 76% of the market.
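As a rough illustration of how such concentration figures can be computed, the sketch below pulls the value of the “client=” parameter out of ad URLs and calculates the share held by the most frequent identifiers. The URLs, identifier values, and function names are made up for the example, and the study's own tooling may have worked differently.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def client_id(ad_url):
    """Return the value of the 'client' query parameter, or None if absent."""
    values = parse_qs(urlparse(ad_url).query).get("client")
    return values[0] if values else None


def top_share(counts, n):
    """Fraction of all observations that belong to the n most common keys."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in counts.most_common(n)) / total


if __name__ == "__main__":
    # Hypothetical ad URLs as a crawler might record them.
    sample_urls = [
        "http://ads.example/serve?client=partner-a&q=shoes",
        "http://ads.example/serve?client=partner-a&q=books",
        "http://ads.example/serve?client=partner-a&q=cars",
        "http://ads.example/serve?client=partner-b&q=games",
        "http://ads.example/serve?q=no-identifier",
    ]
    ids = [cid for cid in map(client_id, sample_urls) if cid is not None]
    counts = Counter(ids)
    print(counts.most_common())
    print("share of the largest partner:", top_share(counts, 1))
```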
Among the affiliate programs, the most popular are Commission Junction (905 domains from the sample), LinkShare (652), and Performics (Google Affiliate Network, 290).
As for redirects, the study identified 75 legitimate websites that collect traffic from typosquatter domains. For example, the image hosting service Pict.com receives traffic from 128 domains that are misspellings of competitors' names. The well-known casino Bet365.com collects traffic from domains that misspell the name of its competitor Sportsbook (saportsbook.com, sxportsbook.com, and another 326 variants).
via New Scientist