Puppet under load

Published on November 01, 2012

    Puppet is a rather convenient configuration management tool: a system that lets you automate the configuration and management of a large fleet of machines and services.

    There is plenty of introductory information about the system itself, including on Habr (here, here and here). In this article we have collected several "recipes" for using Puppet under really heavy load, in the "combat conditions" of Badoo.

    What will be discussed:

    • Puppet: a quick primer;
    • clustering and scaling;
    • asynchronous Storeconfigs;
    • collecting reports;
    • analyzing the data.


    Let us say right away that this article grew out of Anton Turetsky's talk at the HighLoad++ 2012 conference. Within a few days, our "recipes" acquired additional details and examples.

    Returning to load: at Badoo it really is high:
    • more than 2000 servers;
    • more than 30,000 lines in manifests (a manifest is, in this case, a configuration file for the management server);
    • more than 200 servers requesting their configuration from the puppet master every 3 minutes;
    • more than 200 servers sending reports over the same period.


    How it works

    Puppet itself is a client-server application, and in Puppet's case the connection is initiated by the client: the node on which the configuration is to be deployed.

    Step 1: facts in exchange for configuration

    The client collects facts about itself with the facter utility (a hard dependency of Puppet), sends them to the server in an ordinary HTTP POST request and waits for a response.
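
    For example, simply running facter on a node prints all the facts as key/value pairs (output abbreviated; the values below are made-up examples):

      $ facter
      architecture => x86_64
      fqdn => srv1.example.local
      operatingsystem => CentOS
      processorcount => 24
      uptime => 42 days
      ...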

    Step 2: processing and response

    The server receives the dump of facts from the client and compiles a catalog, based on the manifests available on the server but also taking into account the facts received from the client. The catalog is then sent back to the client.
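
    For reference, manifests are declarative descriptions of the desired state of a node. A minimal, purely hypothetical node definition might look like this:

      node 'srv1.example.local' {
          package { 'nginx':
              ensure => installed,
          }

          service { 'nginx':
              ensure  => running,
              enable  => true,
              require => Package['nginx'],
          }
      }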

    Step 3: applying the catalog and reporting the results

    Having received the catalog from the server, the client applies the changes to the system and reports the results of the run back to the server with an HTTP POST request.
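
    The same cycle is easy to trigger by hand when debugging, using standard agent flags:

      puppet agent --test          # one-off foreground run with verbose output
      puppet agent --test --noop   # dry run: show pending changes without applying them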

    Step 4: collecting and storing reports

    After the client has applied all the rules, the report needs to be stored. For this, Puppet offers both its own facilities and third-party collectors. Let's try!
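
    Reporting itself is switched on by a couple of standard puppet.conf settings; a minimal sketch (the exact set of report processors is up to you):

      [agent]
      report = true          # send a report to the master after every run

      [master]
      reports = store, log   # report processors to run on the master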

    Install the basic package and get the following picture:

    At first glance everything is fine: install it and it works. However, over time the picture becomes less joyful. The number of clients increases and system administrators keep writing manifests, so the time spent compiling the catalog and processing each client grows.

    And at a certain moment the picture becomes very sad.

    The first thing that comes to mind is to increase the number of puppet master processes on the server, precisely because out of the box Puppet cannot do this itself and does not spread the load across CPU cores.

    Are there solutions offered by the vendor? Of course, but for various reasons they were not applicable in our conditions.

    Why not Apache + mod_passenger? Our company does not use the Apache web server at all.

    Why not nginx + passenger? To avoid carrying an extra module in the nginx build that we use.

    What then?

    Meet Unicorn

    Why Unicorn?

    Here, in our opinion, are its advantages:
    • load balancing at the Linux kernel level;
    • each process starts in its own environment;
    • nginx-style upgrades without dropping connections;
    • the ability to listen on multiple interfaces;
    • it is similar to PHP-FPM, but for Ruby.

    Another reason for choosing Unicorn was the ease of installation and configuration.
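
    Assuming RubyGems is available on the server, installation comes down to one command:

      gem install unicorn

    And here is our unicorn.conf for the puppet master: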

    worker_processes 12
    working_directory "/etc/puppet"
    listen '0.0.0.0:3000', :backlog => 512
    preload_app true
    timeout 120
    pid "/var/run/puppet/puppetmaster_unicorn.pid"

    if GC.respond_to?(:copy_on_write_friendly=)
      GC.copy_on_write_friendly = true
    end

    before_fork do |server, worker|
      # nginx-style zero-downtime restart: once the new master is up,
      # politely ask the old one to quit
      old_pid = "#{server.config[:pid]}.oldbin"
      if File.exists?(old_pid) && server.pid != old_pid
        begin
          Process.kill("QUIT", File.read(old_pid).to_i)
        rescue Errno::ENOENT, Errno::ESRCH
        end
      end
    end
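
    Note that for Unicorn (or any other Rack server) to run the puppet master, a config.ru must exist in the working_directory. A minimal sketch, modeled on the sample shipped in Puppet's ext/rack directory (details vary between Puppet versions):

      # /etc/puppet/config.ru
      $0 = "master"
      ARGV << "--rack"
      require 'puppet/util/command_line'
      run Puppet::Util::CommandLine.new.execute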
    

    So, the good news: we can now run several processes. But there is also bad news: managing those processes has become noticeably harder. No matter: there is a "recipe" for this situation too.

    "In God We Trust"





    God is a process monitoring framework. It is easy to set up and is written in Ruby, just like Puppet itself.

    In our case, God manages various puppet master process instances:
    • production environment;
    • testing environment;
    • Puppet CA.

    There are no particular problems with configuration either: it is enough to create a configuration file in the /etc/god/ directory, from which the *.god files are processed.

    God.watch do |w|
      w.name = "puppetmaster"
      w.interval = 30.seconds
      w.pid_file = "/var/run/puppet/puppetmaster_unicorn.pid"
      w.start = "cd /etc/puppet && /usr/bin/unicorn -c /etc/puppet/unicorn.conf -D"
      w.stop = "kill -QUIT `cat #{w.pid_file}`"
      w.restart = "kill -USR2 `cat #{w.pid_file}`"
      w.start_grace = 10.seconds
      w.restart_grace = 10.seconds
      w.uid = "puppet"
      w.gid = "puppet"
      w.behavior(:clean_pid_file)

      # start the process whenever it is found not running
      w.start_if do |start|
        start.condition(:process_running) do |c|
          c.interval = 5.seconds
          c.running = false
        end
      end
    end
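
    Day-to-day management then happens through the god CLI; assuming a standard gem installation, invocations look something like this:

      god -c /etc/god/puppetmaster.god   # load the config (god daemonizes itself)
      god status                         # show the state of all watches
      god restart puppetmaster           # graceful restart via USR2, per the watch above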
    

    Note that we run the Puppet CA as a separate instance. This is done deliberately so that all clients use a single source to verify and obtain their certificates. We will explain how to achieve this a little later.

    Balancing

    As already mentioned, the entire exchange between client and server happens over HTTP, which means nothing prevents us from setting up simple HTTP load balancing with nginx.

    1. Create upstream:

      # for the main puppet master processes (for example, the production environment)
      upstream puppetmaster_unicorn {
          server 127.0.0.1:3000 fail_timeout=0;
          server server2:3000 fail_timeout=0;
      }

      # for Puppet CA
      upstream puppetca {
          server 127.0.0.1:3000 fail_timeout=0;
      }

    2. Redirect requests to the appropriate recipients:

      • Puppet CA

        location ^~ /production/certificate/ca {
            proxy_pass http://puppetca;
        }
        location ^~ /production/certificate {
            proxy_pass http://puppetca;
        }
        location ^~ /production/certificate_revocation_list/ca {
            proxy_pass http://puppetca;
        }
        

      • Puppet master

        location / {
            proxy_pass http://puppetmaster_unicorn;
            proxy_redirect off;
        }
        



    Let's summarize the interim results. The steps above allow us to:
    1. run multiple processes;
    2. manage the launch of those processes;
    3. balance the load.

    What about scaling?

    The technical capabilities of a puppet master server are not unlimited, and when the load approaches the maximum it can bear, the question of scaling arises.

    This problem is solved as follows:
    • The RPM package for our puppet server is stored in our repository;
    • all manifests, as well as configurations for God and Unicorn, are in our Git repository.

    To bring up another server, we only need to:
    1. install the base system;
    2. install puppet-server, Unicorn and God;
    3. clone the Git repository;
    4. add the machine to the nginx upstream.

    Our "tuning" does not end there, so again back to theory.

    Storeconfigs: what and why?

    If a client sends us a report and facts about itself, why not store that information?

    Storeconfigs will help us with this: a puppet server option that saves the relevant client information to a database. The system compares the latest data received from the client with what is already stored. Storeconfigs supports the following backends: SQLite, MySQL and PostgreSQL. We use MySQL.

    In our case, many clients pick up their configuration every three minutes, and roughly the same number of reports is sent back. As a result, we get long queues of writes to MySQL, while we also still need to read data from the same database.

    We solved this problem as follows:

    Apache ActiveMQ allows us not to send messages from clients directly to the database, but to pass them through a message queue instead.
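
    On the ActiveMQ side, the broker must accept STOMP connections. In a stock activemq.xml this comes down to a transport connector like the following (the port matches the queue_source in our config below):

      <transportConnectors>
          <transportConnector name="stomp" uri="stomp://0.0.0.0:61613"/>
      </transportConnectors>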

    As a result, we have:
    • faster puppet runs on the client: when it sends its report to the server, it gets an immediate "OK" (putting a message into the queue is cheaper than writing it to the database);
    • a lower load on MySQL (the puppet queue process writes the data to the database asynchronously).

    On the puppet server side, it was enough to add the following lines to the configuration:

    [main]
    async_storeconfigs = true
    queue_type = stomp
    queue_source = stomp://ACTIVEMQ_address:61613
    dbadapter = mysql
    dbuser = secretuser
    dbpassword = secretpassword
    dbserver = mysql-server

    And don't forget to start the puppet queue process.
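
    An illustrative way to get it going (the stomp Ruby gem is a prerequisite for queue_type = stomp):

      gem install stomp   # required by the stomp queue_type
      puppet queue        # drains queued storeconfigs data from ActiveMQ into MySQL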

    With the settings described above, Puppet runs quickly and smoothly on the server, but it would be good to keep a regular eye on its activity. Time to think about Puppet reports.

    Non-standard reporting


    What is offered by default:
    • http;
    • tagmail;
    • log;
    • rrdgraph;
    • store.

    Alas, none of these options suited us completely, for one single reason: the lack of a good visual component. Unfortunately, or perhaps fortunately, the standard Puppet Dashboard seemed too boring to us at the time.

    Therefore, we opted for Foreman, which pleased us with nice charts showing exactly what we need.

    The chart on the left shows how much time is spent on each resource type; the full 360 degrees represents the total run time on the client.

    The chart on the right shows the number of events by status. In this example only one service was started, and more than 10 events were skipped because their current state already matched the desired one.

    And one last recommendation: upgrade to version 3.0.0.

    The graph clearly shows the time gained after the upgrade. True, performance did not grow by the 50% the developers promised, but the increase was quite noticeable. After reworking the manifests (see message), we did reach the promised 50%, so our efforts fully paid off.

    In conclusion, we can say that with proper configuration Puppet copes with serious loads, and managing the configuration of 2000 servers is well within its reach.

    Anton [banuchka] Turetsky, system administrator
    Badoo