OpenStack, Docker, and the web terminal, or how we do interactive exercises for learning Linux

    In the article about the “Introduction to Linux” online course on the Stepic educational platform, we promised to describe the technical implementation of the new type of interactive assignment that debuted in that course. These assignments create on-the-fly virtual Linux servers that students work with through a web terminal directly in the browser window, while an automatic checking system verifies that the tasks are completed correctly.

    A sample assignment from the course:



    In this article I want to talk about the project that became the foundation for this new assignment type on Stepic: which components the system consists of, how they interact with each other, how and where the remote servers are created, and how the web terminal and the automatic checking system work.

    Inspiration


    I am one of the many people who, when looking for a job, hate writing resumes and dozens of cover letters to IT companies just to get through the filter of HR specialists and finally receive the coveted interview invitation. It is much more natural, instead of writing flattering words about yourself, to demonstrate your real experience and skills by solving everyday problems.

    Back in 2012, when I was a student at the Faculty of Mathematics and Mechanics of St. Petersburg State University, I was inspired by the InterviewStreet project, which later evolved into HackerRank. The guys built a platform on which IT companies can hold online programming competitions; based on the results, companies invite the best participants to an interview. Beyond recruiting, HackerRank aimed to create a site where anyone can develop their programming skills by solving problems from different areas of Computer Science, such as algorithm theory, artificial intelligence, machine learning and others. Even then there were plenty of other platforms hosting competitions for programmers, and interactive online learning platforms such as Codecademy and Code School were rapidly gaining popularity. By that time I had solid experience working as a Linux system administrator, and I wanted to see similar resources for system engineers: sites to simplify their hiring, to host Linux administration competitions, and to teach system administration through solving real problems from that field.

    After a long search for similar projects, I found only LinuxZoo, built for academic purposes at Edinburgh Napier University. I also came across the promotional video of the very successful and ambitious Coderloop project, already abandoned after its purchase by Gild. In that video I saw exactly what I had been dreaming about. Unfortunately, the technology Coderloop developed for creating interactive system administration exercises was never released. In correspondence with one of Coderloop's founders, I received many kind words and encouragement to develop the idea further. My inspiration knew no bounds, so I started building the Root'n'Roll platform, to which I began to devote almost all my free time.

    Goals of Root'n'Roll


    The main goal of the project was not to create one specific resource for training, competitions, or recruiting system engineers, but something more: a basic technical platform on which any of these directions could be built and developed.

    The key requirements of the base platform are the following:
    • Run hundreds or even thousands of virtual machines with a minimal Linux system as quickly and cheaply as possible, so that users with root rights can safely tear them to pieces. “Cheaply” means minimizing the system resources consumed per virtual machine. The absence of any substantial budget ruled out cloud hosting such as Amazon EC2 or Rackspace on the principle of “one virtual machine = one cloud instance”.
    • Let users work with a virtual machine through a web terminal directly in a browser window, with no external programs required: only a web browser.
    • Finally, provide an interface for testing the configuration of any virtual machine. Configuration testing can include checking the state of the file system (whether the necessary files are in place and their contents are correct) as well as checking services and network activity (whether particular services are running, whether they are configured correctly, whether they respond correctly to specific network requests, and so on).

    All of these requirements have already been implemented to some extent. First things first.

    Virtual Machine Launch


    To launch virtual machines, I first had to decide on a virtualization technology. The hypervisor options, both hardware virtualization (KVM, VMware ESXi) and paravirtualization (Xen), were ruled out fairly quickly. These approaches carry substantial overhead in system resources, particularly memory, because each machine boots its own separate kernel and starts various OS subsystems. Virtualization of this type also requires a dedicated physical server, and given the stated desire to run hundreds of virtual machines, a server with very good hardware. For perspective, take a look at the specifications of the server infrastructure used by the LinuxZoo project.

    Next, my attention turned to operating-system-level virtualization: OpenVZ and Linux Containers (LXC). Container virtualization can be very roughly compared to running processes in an isolated chroot environment. A huge number of articles have already been written comparing the technical characteristics of the various virtualization systems, including here on Habr, so I will not dwell on the details of their implementation. Containerization avoids the overhead of full virtualization because all containers share the host system's single kernel.

    Right around the time I was choosing a virtualization technology, the Docker project went open source and made a name for itself, providing tools and an ecosystem for creating and managing LXC containers. Launching an LXC container under Docker (hereinafter a docker container, or simply a container) is comparable to launching an ordinary process in Linux. Like a regular Linux process, a container does not need RAM reserved for it in advance: memory is allocated and freed as it is used, and if necessary you can set flexible limits on the maximum amount of memory a container may consume. A huge advantage of containers is that they rely on the host's common memory-page management subsystem (including the copy-on-write mechanism and shared pages). This increases container density: you can run far more Linux machine instances than with hypervisor virtualization, which makes for efficient utilization of server resources. Even a micro instance in the Amazon EC2 cloud can easily handle launching several hundred docker containers with a bash process inside each. Another nice bonus: launching a container takes mere milliseconds.

    Thus, at first glance, Docker cheaply solved the problem of launching a large number of machines (containers), so I decided to settle on it for the first proof-of-concept. The security question deserves a separate discussion; let's set it aside for now. Incidentally, the Coderloop guys also used LXC containers to create the virtual environments for their exercises.

    Container management


    Docker provides a REST API for creating and running containers. Through this interface, however, you can only manage containers located on the same server where the docker service is running.
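
    To give a feel for this interface, here is a minimal sketch of how a create-container request to the Docker Engine REST API can be assembled (the endpoint path and JSON field names follow the Engine API; the host and port are assumptions, since Docker typically listens on a unix socket):

```python
import json

# Assumption: a docker daemon exposing its REST API over TCP.
DOCKER_API = "http://127.0.0.1:2375"

def create_container_request(image, command):
    """Build the URL and JSON body for a POST /containers/create call."""
    url = "%s/containers/create" % DOCKER_API
    body = json.dumps({
        "Image": image,        # image to instantiate
        "Cmd": command,        # process to run inside, e.g. ["/bin/bash"]
        "Tty": True,           # allocate a pseudo-terminal
        "OpenStdin": True,     # keep stdin open so a terminal can attach
    })
    return url, body
```

    A real client would POST this body, read the container id from the response, and then start the container with POST /containers/{id}/start.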

    Looking a step ahead, it would be good to be able to scale horizontally, that is, to distribute containers across several servers rather than launch them all on one. This requires centralized management of the docker hosts and a scheduler that balances the load between servers. A scheduler is also very useful during server maintenance, for example when installing updates that require a reboot: the server is marked as “in maintenance”, and no new containers are created on it.

    Additional requirements for a centralized control system are configuring the network in containers and managing quotas on system resources (CPU, memory, disk). But all these requirements are exactly the tasks that clouds (IaaS) solve successfully. Right on time, in the summer of 2013, the Docker developers published a post about integrating Docker with the OpenStack cloud platform. The new nova-docker driver makes it possible to use an openstack cloud to manage a fleet of docker hosts: launch containers, bring up the network, and control and balance the consumption of system resources. Exactly what we need!

    Unfortunately, even today the nova-docker driver is still pretty raw. Changes often land that are incompatible even with the latest stable release of openstack, so I have to maintain a separate stable branch of the driver myself. I also wrote a few patches to improve performance. For example, to obtain the status of a single docker container, the driver used to request the status of all running containers, sending N HTTP requests to the docker host, where N is the total number of containers. With several hundred containers running, this put needless load on the docker hosts.

    Despite these inconveniences, choosing OpenStack as the container orchestrator was still worth it in my case: I got a centralized system for managing virtual machines (containers) and computing resources through a single API. A big bonus of the unified interface is that adding support for full-fledged KVM-based virtual machines to Root'n'Roll will not require any significant changes to the platform's architecture or code.

    Among OpenStack's drawbacks, the only one worth noting is the rather high complexity of deploying and administering a private cloud. Quite recently Virtkick, a sort of simplified alternative to OpenStack, announced itself; I look forward to its successful development.

    Web terminal selection


    From the earliest stage of drawing up requirements for the Root'n'Roll platform, the main feature I wanted to see was the ability to work with a remote Linux server through a web terminal window right in the browser. Development began with the choice of the terminal, or rather with studying and selecting technical solutions for it. The web terminal is practically the only entry point into the whole system for the user: it is the first thing he sees and the thing he works with.

    One of the few online projects using a web terminal at the time was PythonAnywhere. It became the benchmark that I periodically looked to. Since then, a huge number of web projects and cloud development environments featuring terminals have appeared: Koding, Nitrous, Codebox, Runnable, etc.

    Any web terminal consists of two main parts:
    • The client: a dynamic JavaScript application that intercepts keystrokes and sends them to the server side, then receives data from the server side and renders it in the user's web browser.
    • The server side: a web service that receives keystroke messages from the client and writes them to the controlling terminal device (pseudo-terminal, or pty) attached to a terminal process such as bash. The raw terminal output from the pty device is either sent unchanged to the client or processed on the server side, in which case it reaches the client already converted, for example into HTML.

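    The server side's core job can be sketched in a few lines of Python: attach a process to a pseudo-terminal, feed it the client's keystrokes, and read back the raw terminal output. This is only an illustrative stand-in, not the actual tty.js code:

```python
import os
import pty
import select
import subprocess

def run_in_pty(argv, input_bytes=b"", timeout=5.0):
    """Run a command attached to a pty and collect its raw terminal output,
    the way a web-terminal backend would before forwarding it to the client."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(argv, stdin=slave_fd, stdout=slave_fd,
                            stderr=slave_fd, close_fds=True)
    os.close(slave_fd)                    # only the child keeps the slave end
    if input_bytes:
        os.write(master_fd, input_bytes)  # "keystrokes" from the client
    output = b""
    while True:
        ready, _, _ = select.select([master_fd], [], [], timeout)
        if not ready:
            break
        try:
            chunk = os.read(master_fd, 1024)
        except OSError:                   # EIO: the child closed its end
            break
        if not chunk:
            break
        output += chunk                   # raw output, control sequences and all
    proc.wait()
    os.close(master_fd)
    return output
```

    In a real terminal server the read loop runs asynchronously, and each chunk is pushed to the browser over a websocket-like transport instead of being accumulated.
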
    I reviewed a number of terminal emulators: Anyterm, Ajaxterm, Shellinabox (used by PythonAnywhere), Secure Shell, GateOne and tty.js. The last two turned out to be the most functional and the most actively developed. tty.js won out thanks to its more permissive MIT license. Tty.js is a client-side terminal emulator: parsing of the raw terminal output, including control sequences, is done in the browser with JavaScript.

    The server side of tty.js, written in Node.js, was mercilessly torn apart and rewritten in Python, and the Socket.IO transport was replaced with SockJS; a similar successful experience has been described on the PythonAnywhere blog.

    Flight of the "Firefly"


    We have finally reached the engine of the Root'n'Roll platform. The project is built on the principles of microservice architecture, and the main server-side development language is Python.

    Root'n'Roll platform service connection diagram:

    The microservices are named after the main characters of the science-fiction television series Firefly: the crew of the Firefly-class interplanetary spacecraft Serenity. Each character's role and place on the ship reflects, to some extent, the purpose and functionality of the service named after him.

    Mal - backend API


    Mal is the owner and captain of the ship. On our ship, Mal is the key service that coordinates the work of all the others. It is a Django application that implements the business logic of the Root'n'Roll platform. Mal acts as an API client of the private OpenStack cloud and, in turn, provides a high-level REST interface for the following actions:
    • Create or delete a virtual machine (container); the request is translated and delegated to the cloud.
    • Create a terminal and attach it to a container.
    • Run a test scenario to verify the configuration of a virtual machine; the request is delegated to the checking-system microservice.
    • Fetch the results of a configuration check.
    • Authenticate clients and authorize the various actions.
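
    As a purely hypothetical illustration of what this interface might look like from the client side (the endpoint paths, payload fields and token scheme below are my assumptions for illustration, not Mal's actual API), a thin client could be sketched like this:

```python
import json

class MalClient:
    """Hypothetical client for a Mal-like REST API; it only builds requests."""

    def __init__(self, base_url, token):
        self.base_url = base_url.rstrip("/")
        self.token = token

    def _request(self, method, path, payload=None):
        # A real client would perform an authenticated HTTP call here,
        # e.g. with the requests library; this sketch just assembles it.
        url = self.base_url + path
        headers = {"Authorization": "Token %s" % self.token,
                   "Content-Type": "application/json"}
        body = json.dumps(payload) if payload is not None else None
        return method, url, headers, body

    def create_machine(self, profile):
        # Delegated by Mal to the cloud (OpenStack) behind the scenes.
        return self._request("POST", "/machines/", {"profile": profile})

    def run_checker(self, machine_id, scenario_id):
        # Delegated by Mal to the checking-system microservice.
        return self._request("POST", "/machines/%d/check/" % machine_id,
                             {"scenario": scenario_id})
```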

    Kaylee - terminal multiplexer


    Kaylee is the ship's mechanic. The Kaylee service is the engine that drives communication between the web terminal and a remote virtual machine. It is an asynchronous web server written in Python on Tornado that implements the server side of tty.js.

    On one side, the tty.js client (the terminal window) establishes a connection with Kaylee over the SockJS protocol. On the other, Kaylee establishes a connection with the virtual machine's terminal device. In the case of docker containers, the connection is made over HTTPS to the controlling pty device of the process running in the container, which as a rule is a bash process. After that, Kaylee simply proxies data between the two established connections.

    To authenticate the client and receive data about the virtual machine, Kaylee communicates with Mal via the REST API.

    Zoe - check system


    Zoe is the captain's first mate, whom he trusts unconditionally. Zoe is the automated checking system that tests virtual machine configurations. The Zoe service receives a task from Mal, in the form of a celery task, to run a test scenario; when the tests finish, it reports the results back to Mal via the REST API. As a rule, Zoe does not forgive mistakes (many participants of the Linux course on Stepic have already seen this for themselves).

    A test scenario is nothing more than a Python script with a set of tests written with the py.test testing framework. A special py.test plugin was developed that processes the test results and sends them to Mal via the REST API.
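
    A plugin along these lines can be sketched with py.test's standard reporting hook (the hook name and report attributes are py.test's own; the result format and the idea of serializing it for Mal are assumptions for illustration):

```python
import json

class ResultCollector:
    """Collects per-test outcomes via py.test's reporting hook."""

    def __init__(self):
        self.results = []

    def pytest_runtest_logreport(self, report):
        # py.test calls this hook for the setup, call and teardown phases;
        # the "call" phase carries the actual assertion outcome.
        if report.when == "call":
            self.results.append({
                "test": report.nodeid,
                "passed": report.passed,
            })

    def as_payload(self):
        # The JSON that would be POSTed back to Mal's REST API.
        return json.dumps({"results": self.results})
```

    Registered as a plugin, such a collector sees every test result and can ship the aggregated payload to Mal once the run finishes.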

    Here is an example scenario for an exercise in which you need to write and run a simple one-page website on the Django web framework:

    import re

    import requests


    def test_connection(s):
        # "s" runs shell commands on the machine being checked
        assert s.run('true').succeeded, "Could not connect to server"


    def test_is_django_installed(s):
        assert s.run('python -c "import django"').succeeded, "Django is not installed"


    def test_is_project_created(s):
        assert s.run('[ -x /home/quiz/llama/manage.py ]').succeeded, "Project is not created"


    def test_hello_lama(s):
        try:
            r = requests.get("http://%s:8080/" % s.ip)
        except requests.RequestException:
            assert False, "Service is not running"
        if r.status_code != 200 or not re.match(".*Hello, lama.*", r.text):
            assert False, "Incorrect response"

    To execute commands remotely on the virtual machine under test, Zoe uses the Fabric library.
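
    To illustrate the interface that the scenario above relies on, here is a rough, stdlib-only stand-in for the s helper. In Root'n'Roll it wraps Fabric's remote run(); here commands run locally via subprocess, purely to show the .run(...).succeeded and .ip contract:

```python
import subprocess

class RunResult(str):
    """Command output plus a .succeeded flag, mimicking Fabric's result."""
    succeeded = False

class LocalServer:
    ip = "127.0.0.1"   # assumption: checks probe services at this address

    def run(self, command):
        # Run the shell command locally instead of over SSH.
        proc = subprocess.run(["/bin/sh", "-c", command],
                              capture_output=True, text=True)
        result = RunResult(proc.stdout.strip())
        result.succeeded = (proc.returncode == 0)
        return result
```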

    Conclusion


    Soon anyone will be able to create their own courses and interactive Linux exercises on the Stepic platform using all the technologies described in this article.

    In the next article, I will write about my successful participation in the JetBrains EdTech hackathon and about the features and pitfalls of integrating Root'n'Roll into Stepic; if readers wish, I will also cover any of the topics above in more depth.
