Mongoose: storage performance testing tool

Good day, Habr. This post is about Mongoose, a tool for testing the performance of storage systems. It was originally developed inside EMC for internal needs, but it was built with room to grow. By the way, literally "yesterday" Mongoose became an open-source project, which means it is time to say a few words about it. So what kind of beast is this?


Key Features


  1. Distributed mode
    This is a way to run load jobs simultaneously from many network nodes, with centralized control and metrics collection. It lets you significantly increase the load on a distributed storage system by emulating the requests of a large number of users.

    [Figure: simplified illustration of distributed mode, from the documentation]

  2. Reporting
    • Output files listing the processed items (files, ...), which can be fed back in as input
    • High-resolution timestamps (μs) for every individual operation

  3. CRUD (Create / Read / Update / Delete): the I/O operation types available for generating load
  4. Support for various types of storage:
    • Amazon S3 REST API
    • EMC Atmos REST API
    • OpenStack Swift REST API
    • File System (local, NFS mount, ...)

  5. Supported object types:
    • Containers (directories in the case of a file system, buckets in the case of S3)
    • Data items (files when working with a file system)

  6. Data verification on read
  7. Arbitrary data generation (incompressible uniform noise, text, or identical bytes)
  8. Scenario (scripting) language
  9. "Stub": an HTTP server that implements the functions of a cloud storage system. It does not store data, yet it can return data on reads. In effect, it is a storage mock for testing the functionality and performance of Mongoose itself. It is a natural candidate for becoming a distributed stub as well, and for gaining a file system driver.
  10. Web GUI
  11. And many other nice things that would take too much space to list.
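
Items 6, 7 and 9 fit together: if the content of an object can be regenerated from a small seed, it can be verified on read, and a stub can serve it back, without anyone storing the payload. The Python sketch below illustrates that general idea only. It is not Mongoose's implementation (Mongoose is written in Java), and the function names and the choice of seeding from the object name are assumptions made for illustration.

```python
import hashlib
import random


def seed_for(name: str) -> int:
    """Derive a stable 64-bit seed from an object name (hypothetical scheme)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def generate(name: str, size: int) -> bytes:
    """Regenerate the same pseudo-random bytes for a given name and size."""
    if size == 0:
        return b""
    rng = random.Random(seed_for(name))
    # Uniform pseudo-random bits compress poorly, similar to "uniform noise".
    return rng.getrandbits(8 * size).to_bytes(size, "big")


def verify(name: str, size: int, received: bytes) -> bool:
    """Check data read back from storage against the regenerated content."""
    return received == generate(name, size)


if __name__ == "__main__":
    payload = generate("object-0001", 4096)
    print(verify("object-0001", 4096, payload))      # intact data
    corrupted = bytes([payload[0] ^ 0xFF]) + payload[1:]
    print(verify("object-0001", 4096, corrupted))    # first byte flipped
```

The point of the design is that only the name and size have to be remembered; the expected bytes are recomputed whenever a read needs to be checked.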

Known analogues


  • Apache JMeter
    A very loose analogue: it has so little in common with Mongoose that a comparison is almost impossible.
  • COSBench (Intel)
    A closer analogue in functionality than JMeter.
    Advantages: a longer development history, more active developers, and support for a wider range of storage systems.

    Disadvantages: it falls short on a number of features (arbitrary data generation, for example) and on performance (it does not solve the so-called "C10K" problem).


A few words about high load


Since a performance testing tool has to generate a high I/O load, the tool itself must be very fast and must use the resources of its environment very efficiently.
  1. Solving the C10K problem
    In earlier versions of Mongoose, each thread was bound to its own connection. It quickly became clear that this approach was flawed: performance was especially poor when working with large objects over a large number of threads. After switching to event-driven asynchronous I/O, the results were impressive. The tool has been shown to work even with 1 million simultaneously open connections, and that is without distributed mode, which can multiply this number.

  2. Zero copy wherever possible
  3. Automatic sizing of I/O buffers based on the known sizes of the transferred data. Small objects get a smaller buffer, large objects a larger one. Writes get a larger output buffer, reads a larger input buffer. The buffers are allocated in direct memory to make zero copy possible.
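
To make the difference from the thread-per-connection model concrete, here is a minimal sketch of event-driven I/O in Python's asyncio: a single thread runs an event loop that serves many connections at once. It is not Mongoose code (Mongoose is Java), and the toy line protocol, the function names and the in-process stand-in server are all assumptions made for illustration.

```python
import asyncio


async def handle(reader, writer):
    """Stand-in storage endpoint: read one line, acknowledge it, close."""
    line = await reader.readline()
    writer.write(b"OK " + line)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def one_request(port: int, i: int) -> bytes:
    """One client connection: send a request line and wait for the reply."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(f"PUT object-{i}\n".encode())
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply


async def run_load(connections: int) -> int:
    """Open `connections` concurrent connections from one thread; count successes."""
    server = await asyncio.start_server(
        handle, "127.0.0.1", 0, backlog=max(connections, 1)
    )
    port = server.sockets[0].getsockname()[1]  # port 0 = let the OS pick one
    async with server:
        replies = await asyncio.gather(
            *(one_request(port, i) for i in range(connections))
        )
    return sum(reply.startswith(b"OK") for reply in replies)


if __name__ == "__main__":
    print(asyncio.run(run_load(200)))
```

No connection owns a thread here: a coroutine waiting on the network simply yields to the event loop, so the cost of an idle connection is a small object rather than a thread stack. In practice the number of connections is then bounded mainly by operating system limits such as the open file descriptor limit.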

What it looks like in practice


Once you have downloaded the tarball with the latest version and unpacked it, starting Mongoose is ridiculously simple:
java -jar mongoose-/mongoose.jar

Run this way, Mongoose does everything with its default settings:
  • Creates new objects (create) indefinitely, until the user interrupts it
  • Uses the S3 API (i.e. generates HTTP requests)
  • Uses 1 connection to the default address (127.0.0.1:9020)

To see something other than errors in this case, you can run a "stub" on the same machine, which will serve as a storage mock:
java -jar mongoose-/mongoose.jar wsmock


If you want to use the GUI, run the following command:
java -jar mongoose-/mongoose.jar webui

Then open 127.0.0.1:8080 in a browser.


Another important feature is custom scenarios. A scenario is written in JSON and can be specified at startup as follows:
java -jar mongoose-/mongoose.jar -f .json

One of the simplest scenarios is as follows:
{
    "type": "load"
}

Mongoose uses this scenario by default when no other scenario file is explicitly specified. A slightly more complex example:
{
   "type" : "for",
   "value" : "threads",
   "in" : [
      1, 10, 100, 1000, 10000, 100000
   ],
   "config" : {
      "load" : {
         "threads" : "${threads}"
      }
   },
   "jobs" : [
      {
         "type" : "load"
      }
   ]
}
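
One way to read the scenario above is as a loop that runs the nested load job once per value of "threads". The Python sketch below expands it into concrete jobs under that reading. The semantics are an assumption inferred from the example, not taken from the Mongoose documentation, and the helper names are hypothetical.

```python
import json

SCENARIO = json.loads("""
{
   "type" : "for",
   "value" : "threads",
   "in" : [1, 10, 100, 1000, 10000, 100000],
   "config" : { "load" : { "threads" : "${threads}" } },
   "jobs" : [ { "type" : "load" } ]
}
""")


def substitute(node, name, value):
    """Replace the "${name}" placeholder anywhere in a config tree."""
    placeholder = "${" + name + "}"
    if isinstance(node, dict):
        return {k: substitute(v, name, value) for k, v in node.items()}
    if isinstance(node, list):
        return [substitute(v, name, value) for v in node]
    return value if node == placeholder else node


def expand(job, config=None):
    """Yield (job type, effective config) for every concrete load job."""
    config = config or {}
    if job["type"] == "load":
        yield "load", config
    elif job["type"] == "for":
        for value in job["in"]:
            step = substitute(job.get("config", {}), job["value"], value)
            merged = {**config, **step}  # shallow merge only, for brevity
            for child in job["jobs"]:
                yield from expand(child, merged)
    else:
        raise ValueError("unknown job type: " + str(job["type"]))


if __name__ == "__main__":
    for kind, cfg in expand(SCENARIO):
        print(kind, cfg)
```

Under this reading, the scenario is a scaling test: the same load is repeated with 1, 10, 100 and so on up to 100000 threads, which makes it easy to see where throughput stops growing.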


More detailed usage information is available in the Documentation section of the Mongoose website.

What's next?


At the time of writing, the latest stable version is 2.4.1. Version 3 is under active development and will introduce a new architecture (monitor - generator - driver - monitor), which opens up new possibilities for distributed mode and for "weighted load" scenarios.



Future plans also include the following:
  • An improved Web GUI
  • Partial read operations (reading only part of an object's data)
  • Support for more storage types (Google Cloud Storage, EMC Centera, ...)
  • DBMS support (?)
