Memcached under OpenSolaris is faster than under Linux

Original author: Shanti Subramanyam (translation)
Drawing on our earlier experience of testing memcached performance on a SunFire X2270 (a server based on Intel Xeon (Nehalem) processors) running OpenSolaris, we decided to run the same tests on the same server, but under RHEL5. As we already noted in the post with the first results, to get the highest possible performance we used Intel Oplin 10GbE network cards. It turned out that to use this card under Linux we had to do some work on the driver and rebuild the kernel.
  • With the default ixgbe driver shipped in the RedHat distribution (version 1.3.30-k2 in the 2.6.18 kernel), the network interface simply hung as soon as the test started.
  • So we downloaded the driver from the Intel website (1.3.56.11-2-NAPI) and rebuilt it. With this driver everything worked, and the maximum throughput we got was 232K operations/sec on the same kernel, 2.6.18. However, this kernel version does not support multiple rings.
  • Kernel version 2.6.29 does include multiple-ring support, but still does not ship the latest ixgbe driver, 1.3.56-2-NAPI. So we downloaded, built, and installed the new kernel and driver versions (a sketch of the typical build steps follows this list). This worked and, after a little tuning, gave us a maximum throughput of 280K op/sec.
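For reference, building Intel's out-of-tree ixgbe driver from the source tarball usually looks roughly like this (the file name and paths are illustrative assumptions, not taken from the original post):

# unpack the driver source downloaded from intel.com (file name is illustrative)
tar xzf ixgbe-1.3.56.11-2.tar.gz
cd ixgbe-1.3.56.11-2/src
# build against the running kernel's headers and install the module
make install
# reload the driver so the new module is used
rmmod ixgbe
modprobe ixgbe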

Results

As we already reported, the system running OpenSolaris and memcached 1.3.2 gave us a maximum throughput of about 350K op/sec. On the same server with RHEL5 (kernel 2.6.29) and the same version of memcached, we got 280K op/sec. In other words, OpenSolaris outperforms Linux by 25% (350K / 280K = 1.25).

Linux tuning

The following sysctl settings were used to get maximum performance:
net.ipv4.tcp_timestamps = 0
net.core.wmem_default = 67108864
net.core.wmem_max = 67108864
net.core.optmem_max = 67108864
net.ipv4.tcp_dsack = 0
net.ipv4.tcp_sack = 0
net.ipv4.tcp_window_scaling = 0
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_max_syn_backlog = 200000
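These values can be applied at runtime with sysctl; the original post does not say how they were set, so the following is only a sketch:

# apply one setting immediately (repeat for each value above)
sysctl -w net.ipv4.tcp_timestamps=0
# or append all of the settings above to /etc/sysctl.conf and reload
sysctl -p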
The following ixgbe driver-specific module parameters were set (2 receive queues and 2 transmit queues):
RSS=2,2 InterruptThrottleRate=1600,1600
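These are passed to the ixgbe module at load time; a minimal sketch, assuming the module is reloaded by hand rather than through the distribution's modprobe configuration:

# reload the driver with 2+2 queues and a fixed interrupt throttle rate
rmmod ixgbe
modprobe ixgbe RSS=2,2 InterruptThrottleRate=1600,1600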

OpenSolaris tuning

In /etc/system we set the following MSI-X parameters (translator's note: the original says MSIX, but the parameters relate specifically to MSI-X):
set ddi_msix_alloc_limit = 4
set pcplusmp: apic_intr_policy = 1
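Changes to /etc/system take effect after a reboot; the resulting interrupt distribution across CPUs can then be observed with intrstat(1M) (a verification step we are adding here, not mentioned in the original post):

# sample per-CPU interrupt activity every 5 seconds
intrstat 5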
For the ixgbe interface, 4 transmit queues and 4 receive queues gave us the best performance:
tx_queue_number = 4, rx_queue_number = 4
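On OpenSolaris these driver properties are normally set in the driver configuration file, typically /kernel/drv/ixgbe.conf (the exact path is our assumption); in driver.conf syntax that would look like:

tx_queue_number = 4;
rx_queue_number = 4;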
In addition, we dedicated separate processor cores to the network interface:
dladm set-linkprop -p cpus=12,13,14,15 ixgbe0
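The binding can be checked afterwards with the following command (a usage note we are adding, not from the original post):

dladm show-linkprop -p cpus ixgbe0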

Update: Shanti's answer to why 2x2 was used: basically, we tried various settings to see which gives the best performance. On Linux, we got the best results with 2 + 2 rings. On OpenSolaris, it was 4 + 4. We suspect that the Linux implementation of multiple rings is probably not performing well yet (it is relatively new).
Short translation: just because.
