Memcached under OpenSolaris is faster than under Linux
- Translation
Building on our earlier tests of memcached performance on a SunFire X2270 (a server based on Intel Xeon (Nehalem) processors) running OpenSolaris, we decided to run the same tests on the same server, but with RHEL5. As we already noted in the post with the first results, to reach the highest possible throughput we used Intel Oplin 10GbE network cards. It turned out that using this card under Linux required some work on the drivers and a kernel rebuild.
- With the default ixgbe driver shipped in the Red Hat distribution (version 1.3.30-k2 in the 2.6.18 kernel), the network interface simply hung as soon as the test was started.
- Therefore, we had to download the driver from the Intel website (1.3.56.11-2-NAPI) and rebuild it. With it everything worked, and the maximum throughput we got was 232K operations/second on the same kernel version 2.6.18. However, this kernel version does not support multiple rings (translator's note: "multiple rings" in the original, i.e. multiple receive/transmit queues).
- Kernel version 2.6.29 does support multiple rings, but still does not include the latest ixgbe driver, version 1.3.56-2-NAPI. Therefore, we downloaded, compiled and installed new versions of both the kernel and the driver (see the sketch below). This worked, and after a little tuning gave us a maximum throughput of 280K ops/sec.
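For reference, a minimal sketch of the driver rebuild step, following the standard procedure for Intel's out-of-tree drivers (the archive name below is illustrative; the exact steps in the original tests may have differed):

# unpack the source downloaded from the Intel site (archive name is illustrative)
tar xzf ixgbe-1.3.56.11-2.tar.gz
cd ixgbe-1.3.56.11-2/src
# build and install the module against the running kernel
make install
# swap the in-tree driver for the freshly built one
rmmod ixgbe
modprobe ixgbe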
Results
As we already reported, a system running OpenSolaris and memcached 1.3.2 gave us a maximum throughput of about 350K ops/sec. On the same server with RHEL5 (kernel 2.6.29) and the same version of memcached, we got 280K ops/sec. In other words, OpenSolaris outperforms Linux by 25% (350K / 280K = 1.25).

Linux tuning
The following sysctl values were used to get maximum performance:

net.ipv4.tcp_timestamps = 0
net.core.wmem_default = 67108864
net.core.wmem_max = 67108864
net.core.optmem_max = 67108864
net.ipv4.tcp_dsack = 0
net.ipv4.tcp_sack = 0
net.ipv4.tcp_window_scaling = 0
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_max_syn_backlog = 200000

The following ixgbe driver-specific parameters were set (2 queues for receiving and 2 for transmitting):
RSS=2,2 InterruptThrottleRate=1600,1600
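One possible way to apply these settings (a sketch, not necessarily how it was done in the original tests): add the sysctl values to /etc/sysctl.conf and reload them, and pass the queue settings as ixgbe module options in /etc/modprobe.conf (RHEL5's module configuration file):

# reload sysctl values after adding them to /etc/sysctl.conf
sysctl -p
# pass the ring settings to the driver on its next load
echo "options ixgbe RSS=2,2 InterruptThrottleRate=1600,1600" >> /etc/modprobe.conf
rmmod ixgbe && modprobe ixgbe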
Tuning OpenSolaris
In /etc/system we set the following MSI-X parameters (translator's note: the original says MSIX, but the parameters relate specifically to MSI-X):

set ddi_msix_alloc_limit = 4
set pcplusmp:apic_intr_policy = 1

For the ixgbe interface, 4 transmit queues and 4 receive queues gave us the best performance:
tx_queue_number = 4, rx_queue_number = 4

In addition, we dedicated separate processor cores to the network interface:
dladm set-linkprop -p cpus=12,13,14,15 ixgbe0
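Put together, a sketch of where these OpenSolaris settings live (the /kernel/drv/ixgbe.conf path is my assumption about the driver configuration location; the /etc/system entries take effect only after a reboot):

# /etc/system (reboot required)
set ddi_msix_alloc_limit = 4
set pcplusmp:apic_intr_policy = 1
# /kernel/drv/ixgbe.conf (assumed location; driver reload required)
tx_queue_number = 4;
rx_queue_number = 4;
# bind interface processing to cores 12-15 (takes effect immediately)
dladm set-linkprop -p cpus=12,13,14,15 ixgbe0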
Upd: Shanti's answer to why they used 2x2 on Linux: basically, we tried various settings to see which gives the best performance. On Linux, we got the best results with 2 + 2 rings. On OpenSolaris, it was 4 + 4. We suspect that the Linux implementation of multiple rings is probably not performing well yet (it is relatively new).
Short translation: "because gladiolus" (in other words, that is simply what turned out to work best).