Load average
Anyone who has looked at the output of commands such as top, htop, uptime or w has probably noticed the load average line.
Continuing the discussion started in the General Overview of Standard System Monitoring Tools, let us try to understand what these numbers mean. Put simply, they reflect the number of blocked processes in the run queue, averaged over three time intervals: the last 1, 5 and 15 minutes, respectively. The notion of blocking has been discussed a lot lately in connection with nginx. :) Here, a blocked process is one that is waiting for a resource before it can continue working. Typically that resource is the CPU, disk I/O or network I/O. (On Linux, the figure counts processes that are runnable or in uninterruptible sleep.)
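To make the three numbers concrete, here is a minimal sketch that parses them from text in the format of /proc/loadavg on Linux. The file holds five fields: the three averages, a runnable/total pair of scheduling entities, and the most recently assigned PID. The names LoadAvg and parse_loadavg are ours, and the sample string is made up for illustration, not measured.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one: float      # average over the last 1 minute
    five: float     # average over the last 5 minutes
    fifteen: float  # average over the last 15 minutes
    runnable: int   # scheduling entities currently runnable
    total: int      # scheduling entities that currently exist
    last_pid: int   # PID most recently assigned by the kernel


def parse_loadavg(text: str) -> LoadAvg:
    """Parse a string in the format of Linux's /proc/loadavg."""
    one, five, fifteen, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return LoadAvg(float(one), float(five), float(fifteen),
                   int(runnable), int(total), int(last_pid))


# Illustrative sample only; on a Linux box you would read /proc/loadavg.
sample = "0.20 0.18 0.12 1/80 11206"
print(parse_loadavg(sample))
```

On a real system the input would come from open("/proc/loadavg").read(); this assumes a Linux host, since other systems expose the figures differently.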
High load average values, relative to the number of CPU cores, indicate that the system cannot cope with the load. On a server that is expected to run under high load, it usually helps to fine-tune the operating system (the network subsystem, the limit on simultaneously open files, and so on). High load can also be caused by hardware problems, for example a failing drive.
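What counts as "high" depends on how many cores the machine has, so a rough check divides the load by the core count. The sketch below does that; the threshold of 1.0 per core is a common rule of thumb that we assume here, not a hard limit, and the function names are ours.

```python
import os


def load_per_core(load: float, cores: int) -> float:
    """Return the load average divided by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


def is_overloaded(load: float, cores: int, threshold: float = 1.0) -> bool:
    """True if the per-core load exceeds the (assumed) threshold."""
    return load_per_core(load, cores) > threshold


if __name__ == "__main__":
    try:
        one, five, fifteen = os.getloadavg()  # available on Unix-like systems
    except (AttributeError, OSError):
        print("load average is not available on this platform")
    else:
        cores = os.cpu_count() or 1
        for label, value in (("1 min", one), ("5 min", five), ("15 min", fifteen)):
            flag = "overloaded" if is_overloaded(value, cores) else "ok"
            print(f"{label}: {value:.2f} "
                  f"({load_per_core(value, cores):.2f} per core, {flag})")
```

A load of 4.0 on a four-core machine comes out as 1.0 per core, which this sketch still treats as acceptable; the same 4.0 on a single core would be flagged.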
For diagnostics, we turn to other useful data in the top output. The Cpu(s) line shows how processor time is distributed. The first two values, us (user space) and sy (kernel space), directly reflect the time the CPU spends running processes.
Sustained high values (99-100%) point to the CPU as the bottleneck.
The wa value shows the share of time the CPU sits idle waiting for I/O.
A value above 80% is not normal and clearly tells us that the processor spends a lot of time waiting for I/O (usually this means that the HDD or NIC is failing).
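The percentages discussed above can be derived by hand from two samples of the aggregate cpu line in /proc/stat on Linux. The sketch below is our own illustration: it assumes a kernel that reports at least the first eight counters (user through steal), the names FIELDS, parse_cpu_line and cpu_shares are ours, and the two sample lines are invented. The user, system and iowait shares correspond to us, sy and wa in top.

```python
from typing import Dict

# First eight counters of the aggregate "cpu" line, in kernel order.
FIELDS = ("user", "nice", "system", "idle",
          "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Parse the aggregate 'cpu' line of /proc/stat into named counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    if len(values) < len(FIELDS):
        raise ValueError("expected at least eight counters")
    return dict(zip(FIELDS, values))


def cpu_shares(before: str, after: str) -> Dict[str, float]:
    """Percentage of CPU time per category between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {name: b[name] - a[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Invented samples: most of the interval was spent waiting for I/O.
before = "cpu 1000 0 500 8000 500 0 0 0 0 0"
after = "cpu 1100 0 550 8050 1300 0 0 0 0 0"
shares = cpu_shares(before, after)
print(f"us={shares['user']:.1f} sy={shares['system']:.1f} "
      f"wa={shares['iowait']:.1f}")
```

Only the first eight counters are summed because the guest counters that follow are already included in user time.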
If the hardware is fine and the CPU is fast enough, the software is most likely the problem. The offending application can be found with ps axfu. Its output lists the processes along with the necessary information: CPU and memory consumption, state, and the details that identify the process (PID and command). A word about process states. The typical ones are the following three (the complete list is in the ps man page; thanks, onix74):
- S - interruptible sleep (waiting for an event);
- R - running or runnable (on the run queue);
- D - uninterruptible sleep (usually waiting for I/O).
The last one is exactly what we are looking for. Further debugging can be done with iostat, systat (FreeBSD), strace and iperf, but that is a topic for another article.
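As a rough sketch of that hunt, the code below filters ps output for processes whose state starts with D. It assumes the output format of ps -eo pid,stat,comm, which procps on Linux and the BSD ps accept; the helper names are ours, and the sample output, including the backup-job process, is invented for illustration.

```python
import subprocess
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int
    stat: str     # state code, e.g. "Ss", "R+", "D"
    command: str


def parse_ps(output: str) -> List[Proc]:
    """Parse the output of `ps -eo pid,stat,comm`, header line included."""
    procs = []
    for line in output.splitlines()[1:]:  # skip the header
        fields = line.strip().split(None, 2)
        if len(fields) == 3:
            procs.append(Proc(int(fields[0]), fields[1], fields[2]))
    return procs


def uninterruptible(procs: List[Proc]) -> List[Proc]:
    """Keep only processes in uninterruptible sleep (state D)."""
    return [p for p in procs if p.stat.startswith("D")]


def list_uninterruptible() -> List[Proc]:
    """Run ps on the local machine and return its D-state processes."""
    result = subprocess.run(["ps", "-eo", "pid,stat,comm"],
                            capture_output=True, text=True, check=True)
    return uninterruptible(parse_ps(result.stdout))


# Invented sample output; "backup-job" is a hypothetical process name.
SAMPLE = """\
  PID STAT COMMAND
    1 Ss   systemd
  812 D    backup-job
  990 R+   ps
"""
for proc in uninterruptible(parse_ps(SAMPLE)):
    print(proc.pid, proc.stat, proc.command)
```

Calling list_uninterruptible() does the same against the live system. A process that shows up in state D on every run is a good candidate for a closer look with the tools mentioned above.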
High uptime, low load average, and of course good luck! :)