Measure and Troubleshoot Linux CPU Resource Usage

Tracking down CPU usage and identifying heavy processes on a Debian Linux server.

Install Performance Monitoring Tools

Install sysstat and htop:

# apt-get update && apt-get install sysstat htop

On RHEL/CentOS, run the following (the EPEL repository must be enabled for htop):

# yum install -y sysstat htop

The sysstat package contains the following system performance tools:

sar: collects and reports system activity information.
iostat: reports CPU utilization and disk I/O statistics.
mpstat: reports global and per-processor statistics.
pidstat: reports statistics for Linux tasks (processes).
sadf: displays data collected by sar in various formats.
nfsiostat: reports I/O statistics for network filesystems.
cifsiostat: reports I/O statistics for CIFS filesystems.

Make sure that data collection for sar is enabled in /etc/default/sysstat. If it is not, set ENABLED to "true".
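As a minimal sketch, assuming the stock Debian layout where /etc/default/sysstat already contains an ENABLED= line, the switch can be flipped with sed. The enable_sar helper name is made up for this example, and sed -i assumes GNU sed:

```shell
# Hypothetical helper: set ENABLED="true" in a sysstat defaults file.
# Assumes the file already has a line starting with ENABLED= (as on Debian).
enable_sar() {
    conf="${1:-/etc/default/sysstat}"
    # Rewrite the whole ENABLED= line, whatever its current value is.
    sed -i 's/^ENABLED=.*/ENABLED="true"/' "$conf"
    # Print the resulting line so the change can be checked by eye.
    grep '^ENABLED=' "$conf"
}
# Usage (as root): enable_sar
```

Called without an argument, the helper edits /etc/default/sysstat. Passing a path lets you try it on a copy of the file first.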

You may also want to change the HISTORY value in /etc/sysstat/sysstat from the default of 7 days to something longer:

HISTORY=60

Note that if the value is greater than 28, log files will be kept in multiple directories, one for each month.

By default, sysstat collects data every 10 minutes. You can change this by modifying the cron job in /etc/cron.d/sysstat.

Finally, restart the service:

# service sysstat restart

Measure CPU Usage

CPU Usage with top and htop

The top program provides a dynamic real-time view of a running system. It is very useful for finding the processes that use the most CPU (among other things) at the time of monitoring. To sort the output by CPU usage, press Shift+P.

As can be seen below, the dd command, run by user sandy, is using 66% of one CPU (by default, top reports per-process %CPU relative to a single CPU).

$ top
top - 20:53:16 up 3 days, 21:08,  2 users,  load average: 0.86, 0.66, 0.32
Tasks: 127 total,   2 running, 124 sleeping,   0 stopped,   1 zombie
%Cpu(s):  7.0 us, 24.7 sy, 0.0 ni, 48.9 id, 17.7 wa, 0.0 hi, 1.6 si, 0.0 st
KiB Mem:   1022744 total,  1008568 used,    14176 free,      584 buffers
KiB Swap:   991228 total,   105604 used,   885624 free,   662508 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
29390 sandy     20   0  3536  560  472 R  66.0  0.1   0:10.54 dd
   23 root      20   0     0    0    0 S   5.3  0.0   1:22.64 kswapd0
 2860 mysql     20   0  321m 164m 2992 S   0.7 16.5  68:55.36 mysqld
29370 sandy     20   0  4512 1348  952 R   0.7  0.1   0:02.09 top
 1923 root      20   0  279m  96m  984 S   0.3  9.7  53:45.61 nessusd
 1945 root      20   0  2052  424  352 S   0.3  0.0   1:19.69 vnstatd
 2642 zabbix    20   0  3036  572  508 S   0.3  0.1   7:45.23 zabbix_agentd
 3103 zabbix    20   0 59872 9512 9216 S   0.3  0.9   4:07.73 zabbix_server
 3147 www-data  20   0 24132 1080  896 S   0.3  0.1   6:15.05 zmdc.pl
29346 sandy     20   0  9452 1492  816 S   0.3  0.1   0:00.35 sshd

Line 3 of the output (starting with %Cpu(s)) shows CPU state percentages based on the interval since the last refresh. The values are as follows (as per the man page):

us: time running un-niced user processes.
sy: time running kernel processes.
ni: time running niced user processes.
id: time spent idle.
wa: time waiting for I/O completion.
hi: time spent servicing hardware interrupts.
si: time spent servicing software interrupts.
st: time stolen from this vm by the hypervisor.
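For scripting, the same line can be read from top in batch mode (-b). The sketch below is one way to turn it into a single "busy" percentage (100 minus id). The cpu_busy_from_top name is an assumption for illustration, and the parsing assumes the %Cpu(s) layout shown above with a dot as the decimal separator:

```shell
# Hypothetical helper: read top output on stdin and print 100 - idle
# for the last %Cpu(s) line seen.
cpu_busy_from_top() {
    awk '/^%Cpu\(s\)/ && match($0, /[0-9.]+ id/) {
             # The matched text looks like "48.9 id"; drop the trailing " id".
             idle = substr($0, RSTART, RLENGTH - 3)
         }
         END { if (idle != "") printf "%.1f\n", 100 - idle }'
}
# Usage: LC_ALL=C top -b -n 2 -d 1 | cpu_busy_from_top
```

Taking the last of two iterations is a cautious choice here, since the first frame printed by top may cover a longer averaging period than the refresh interval.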

htop is an interactive process viewer, similar to top.

CPU Usage with ps

The ps command displays information about a selection of the active processes. While top gives a dynamic real-time view of system resources, ps shows a snapshot of the currently running processes. It is usually the first tool to reach for when identifying the users and processes that consume the most CPU.

There are three different kinds of options which can be passed to ps:

UNIX. These are preceded by a dash (standard syntax).
BSD. These must not be used with a dash.
GNU long options. These are preceded by two dashes.

Get a snapshot of the 9 most CPU consuming processes:

$ ps -eo pid,user,s,pcpu,comm --sort -pcpu | head
  PID USER     S %CPU COMMAND
 3360 zabbix   S  1.3 zabbix_server
 2041 root     S  1.2 nessusd
 3059 mysql    S  1.2 mysqld
 2395 clamav   S  0.3 freshclam
 2159 root     S  0.1 /usr/bin/monitorix
 2876 zabbix   S  0.1 zabbix_agentd
 2877 zabbix   S  0.1 zabbix_agentd
 2878 zabbix   S  0.1 zabbix_agentd
 2879 zabbix   S  0.1 zabbix_agentd

Parameters used are as below:

-e: select all processes.
-o: specify user-defined format.
pid: process ID.
user: user name.
s: minimal state display (one character).
1. S for sleeping (idle).
2. R for running.
3. D for disk sleep (uninterruptible).
4. Z for zombie (waiting for parent to read its exit status).
5. T for traced or suspended (e.g. by SIGTSTP).
6. W for paging.
pcpu: CPU utilisation (CPU time used divided by the time the process has been running).
comm: command name (only the executable name).
--sort -pcpu: sort by pcpu in descending numerical order.
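To see which users (rather than which processes) account for the most CPU, the per-process figures can be summed per user. This is only a sketch: cpu_by_user is a made-up name, and it expects "user pcpu" pairs on stdin, which ps prints without a header when the column names are followed by an equals sign:

```shell
# Hypothetical helper: sum the pcpu column per user and list the
# heaviest users first. Input: one "user pcpu" pair per line.
cpu_by_user() {
    awk '{ cpu[$1] += $2 }
         END { for (u in cpu) printf "%s %.1f\n", u, cpu[u] }' |
        sort -k2,2 -nr
}
# Usage: ps -eo user=,pcpu= | cpu_by_user | head
```

Keep in mind that pcpu is averaged over each process's lifetime, so a long-running process with a short recent burst will rank lower here than it would in top.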

CPU Usage with mpstat

The mpstat command reports activities for each available processor as well as global CPU activities. In my personal opinion, mpstat gives the most informative output when troubleshooting CPU usage problems.

Display three CPU utilisation (-u) reports at one second intervals:

$ mpstat -u 1 3
Linux 3.2.0-4-686-pae (flames) 	20/02/14        _i686_	(2 CPU)

19:43:03     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
19:43:04     all    4.86    0.00   26.49   16.22    0.54    1.62    0.00    0.00   50.27
19:43:05     all    8.33    0.00   24.44   16.11    0.00    1.11    0.00    0.00   50.00
19:43:06     all    6.67    0.00   24.44   17.78    0.00    1.11    0.00    0.00   50.00
Average:     all    6.61    0.00   25.14   16.70    0.18    1.28    0.00    0.00   50.09

The following values are shown (as per man page):

CPU: processor number. The keyword all indicates that statistics are calculated as averages among all processors.
%usr: show the percentage of CPU utilisation that occurred while executing at the user level (application).
%nice: show the percentage of CPU utilisation that occurred while executing at the user level with nice priority.
%sys: show the percentage of CPU utilisation that occurred while executing at the system level (kernel). Note that this does not include time spent servicing hardware and software interrupts.
%iowait: show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
%irq: show the percentage of time spent by the CPU or CPUs to service hardware interrupts.
%soft: show the percentage of time spent by the CPU or CPUs to service software interrupts.
%steal: show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
%guest: show the percentage of time spent by the CPU or CPUs to run a virtual processor.
%idle: show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.

On multiprocessor systems, we can specify the processors we want to get the report for by passing a -P parameter and specifying a processor’s number.

Display three per-processor (-P) CPU reports for processors 0 and 1 at one second intervals:

$ mpstat -P 0,1 1 3
Linux 3.2.0-4-686-pae (flames) 	20/02/14        _i686_	(2 CPU)

19:43:16     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
19:43:17       0   21.98    0.00   62.64   14.29    0.00    1.10    0.00    0.00    0.00
19:43:17       1    0.00    0.00   11.11    0.00    0.00    0.00    0.00    0.00   88.89
19:43:18       0   20.65    0.00   61.96   15.22    0.00    1.09    0.00    0.00    1.09
19:43:18       1    2.04    0.00   10.20    7.14    0.00    0.00    0.00    0.00   80.61
19:43:19       0   18.89    0.00   66.67   14.44    0.00    0.00    0.00    0.00    0.00
19:43:19       1    1.02    0.00    8.16    0.00    0.00    0.00    0.00    0.00   90.82
Average:       0   20.51    0.00   63.74   14.65    0.00    0.73    0.00    0.00    0.37
Average:       1    1.02    0.00    9.83    2.37    0.00    0.00    0.00    0.00   86.78

So CPU 1 is chilling (about 87% idle on average) while CPU 0 is doing the work (under 1% idle).
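On machines with many processors, picking out the busy ones by eye gets tedious. A rough sketch of a filter over the Average: lines follows. The busy_cpus name and the default threshold of 20% idle are assumptions for illustration, and %idle is assumed to be the last column, as in the report above:

```shell
# Hypothetical helper: print the CPUs whose average %idle is below a
# threshold (default 20). Reads mpstat output on stdin.
busy_cpus() {
    threshold="${1:-20}"
    awk -v t="$threshold" '
        # Only per-processor averages: second field is a CPU number, not "all".
        $1 == "Average:" && $2 ~ /^[0-9]+$/ && $NF + 0 < t + 0 {
            printf "CPU %s: %s%% idle\n", $2, $NF
        }'
}
# Usage: LC_ALL=C mpstat -P ALL 1 3 | busy_cpus 20
```

Using the Average: lines keeps the filter independent of the timestamp format, which varies with the locale.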

CPU Usage with iostat

The iostat command can report CPU statistics as well as I/O statistics for devices and partitions. For multiprocessor systems, the CPU values are global averages among all processors, as iostat doesn’t report per-processor usage. I would describe iostat as an inferior, or perhaps simplified, version of mpstat when it comes to CPU resource monitoring.

Display three CPU reports (-c) at one second intervals:

$ iostat -c 1 3
Linux 3.2.0-4-686-pae (flames) 	20/02/14        _i686_	(2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.59    0.68    0.70    0.55    0.00   96.47

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.68    0.00   32.80   10.22    0.00   47.31

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.60    0.00   35.48    8.06    0.00   47.85

Note that the first report gives averages since the last reboot. The CPU report has the following format (as per the man page):

%user: show the percentage of CPU utilisation that occurred while executing at the user level (application).
%nice: show the percentage of CPU utilisation that occurred while executing at the user level with nice priority.
%system: show the percentage of CPU utilisation that occurred while executing at the system level (kernel).
%iowait: show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
%steal: show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
%idle: show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
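Because the first report is a since-boot average, a script usually wants the last one instead. The following sketch pulls %iowait out of the final avg-cpu report. The last_iowait name is made up for this example, and the column order is assumed to match the header shown above:

```shell
# Hypothetical helper: print %iowait from the last avg-cpu report.
# Reads iostat -c output on stdin.
last_iowait() {
    awk '/^avg-cpu:/ {
             # The values sit on the line right after the header;
             # %iowait is the fourth column.
             if ((getline) > 0) iowait = $4
         }
         END { if (iowait != "") print iowait }'
}
# Usage: LC_ALL=C iostat -c 1 3 | last_iowait
```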

CPU Usage with vmstat

The vmstat command reports information about many resource activities including CPU, processes, memory, paging, block I/O and disk activity. Note that the first reported line gives averages since the last reboot. Default output shows memory in KiB (1024 bytes).

The vmstat command gives the least detailed CPU reports compared with mpstat or iostat, but is usually the first tool to reach for to get an overview of what’s happening on a system right now.

Display three reports at one second intervals:

$ vmstat 1 3
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1  69188  13436  47032 589176    0    0    63   106  110   40  2  1 96  1
 1  0  69188  13700  47032 588840    0    0 113920     0 1642 2231 9 35 48  7
 1  0  69188  13204  47032 589516    0    0 112256     0 1633 2252 9 37 47  7

CPU activity is shown in the last four columns (us, sy, id, wa). Numbers are percentages of total CPU time. Time values are (as per the man page):

us: time spent running non-kernel code (user time, including nice time).
sy: time spent running kernel code (system time).
id: time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
wa: time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
st: time stolen from a virtual machine, if present. Prior to Linux 2.6.11, unknown.
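Since the first vmstat line is a since-boot average, it is best left out when summarising a short run. A minimal sketch that averages us + sy over the remaining samples is shown here. The vmstat_busy_avg name is an assumption, and us and sy are assumed to be columns 13 and 14, as in the output above:

```shell
# Hypothetical helper: average us + sy over all vmstat samples
# except the first (since-boot) one. Reads vmstat output on stdin.
vmstat_busy_avg() {
    awk '$1 ~ /^[0-9]+$/ {
             # Header lines start with text, so only data rows get here.
             # seen++ is 0 (false) for the first data row, which is skipped.
             if (seen++) { busy += $13 + $14; n++ }
         }
         END { if (n) printf "%.1f\n", busy / n }'
}
# Usage: vmstat 1 5 | vmstat_busy_avg
```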

CPU Usage with sar

The sar command gives the report of selected resource activity counters in the system.

As per the man page, sar can write information the specified number of times, spaced at the specified interval in seconds. If the interval parameter is set to zero, sar displays the average statistics for the time since the system was started. Together with the data that sysstat collects in the background, this makes sar extremely helpful for analysing historical data and for future resource capacity planning.

Display three real-time CPU utilisation (-u) reports at one second intervals:

$ sar -u 1 3
Linux 3.2.0-4-686-pae (flames) 	20/02/14        _i686_	(2 CPU)

19:18:02        CPU     %user     %nice   %system   %iowait    %steal     %idle
19:18:03        all      7.94      0.00     35.45      8.47      0.00     48.15
19:18:04        all      6.74      0.00     39.90      7.77      0.00     45.60
19:18:05        all      9.14      0.00     34.95      8.06      0.00     47.85
Average:        all      7.92      0.00     36.80      8.10      0.00     47.18

The report shows the following fields (as per man page):

%user: percentage of CPU utilisation that occurred while executing at the user level (application). Note that this field includes time spent running virtual processors.
%nice: percentage of CPU utilisation that occurred while executing at the user level with nice priority.
%system: percentage of CPU utilisation that occurred while executing at the system level (kernel). Note that this field includes time spent servicing hardware and software interrupts.
%iowait: percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
%steal: percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
%idle: percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.

The ALL keyword (-u ALL) can be used to display all CPU fields. For example, “sar -u ALL 1 3” should report the same CPU fields as “mpstat -u 1 3”.

Extract historical CPU utilisation (-u) records for the interval starting (-s) at 1 PM and ending (-e) at 2 PM:

$ sar -u -s 13:00:00 -e 14:00:00
Linux 3.2.0-4-686-pae (flames) 	20/02/14        _i686_	(2 CPU)

13:05:01        CPU     %user     %nice   %system   %iowait    %steal     %idle
13:15:01        all      0.69      0.00      0.53      0.47      0.00     98.30
13:25:01        all      0.71      0.00      0.54      0.36      0.00     98.39
13:35:01        all      1.49      0.00      0.53      0.30      0.00     97.68
13:45:01        all      0.64      0.00      0.54      0.30      0.00     98.53
13:55:01        all      0.65      0.00      0.51      0.28      0.00     98.56
Average:        all      0.84      0.00      0.53      0.34      0.00     98.29
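When the time window is long, it helps to let a script find the busiest sample. The sketch below prints the timestamp with the lowest %idle among the "all" rows. The sar_busiest name is made up for this example, %idle is assumed to be the last column, and both 24-hour and AM/PM timestamps are handled by checking where the "all" keyword sits:

```shell
# Hypothetical helper: print the timestamp and %idle of the busiest
# (lowest %idle) sample. Reads sar -u output on stdin.
sar_busiest() {
    awk '$1 != "Average:" && ($2 == "all" || $3 == "all") {
             # With AM/PM timestamps the time takes two fields.
             ts = ($2 == "all") ? $1 : $1 " " $2
             if (min == "" || $NF + 0 < min + 0) { min = $NF; at = ts }
         }
         END { if (at != "") print at, min }'
}
# Usage: LC_ALL=C sar -u -s 13:00:00 -e 14:00:00 | sar_busiest
```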

Extract historical per-processor (-P) statistics for processors 0 and 1 for the interval starting (-s) at 1 PM and ending (-e) at 2 PM:

$ sar -P 0,1 -s 13:00:00 -e 14:00:00
Linux 3.2.0-4-686-pae (flames) 	20/02/14        _i686_	(2 CPU)

13:05:01        CPU     %user     %nice   %system   %iowait    %steal     %idle
13:15:01          0      0.69      0.00      0.53      0.92      0.00     97.86
13:15:01          1      0.69      0.00      0.53      0.03      0.00     98.75
13:25:01          0      0.67      0.00      0.55      0.70      0.00     98.08
13:25:01          1      0.75      0.00      0.53      0.02      0.00     98.70
13:35:01          0      2.30      0.00      0.53      0.57      0.00     96.59
13:35:01          1      0.68      0.00      0.53      0.03      0.00     98.76
13:45:01          0      0.63      0.00      0.50      0.58      0.00     98.29
13:45:01          1      0.65      0.00      0.57      0.02      0.00     98.77
13:55:01          0      0.64      0.00      0.48      0.55      0.00     98.33
13:55:01          1      0.65      0.00      0.54      0.01      0.00     98.79
Average:          0      0.99      0.00      0.52      0.66      0.00     97.83
Average:          1      0.69      0.00      0.54      0.02      0.00     98.75

Stats for previous days can be displayed by extracting records from the log files (-f), which are stored under /var/log/sysstat/ on Debian (RHEL/CentOS uses /var/log/sa/ instead). For example, to get CPU stats for the 18th of February, we would do:

$ sar -u -f /var/log/sysstat/sa18
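The file name is just "sa" followed by the day of the month, so it can be built from a date instead of typed by hand. A small sketch, assuming GNU date (for -d) and the flat directory layout used when HISTORY is 28 or less, follows. The sa_file name and the SA_DIR variable are made up for this example:

```shell
# Hypothetical helper: print the sar data file for a given day
# (default: yesterday). Set SA_DIR=/var/log/sa on RHEL/CentOS.
sa_file() {
    dir="${SA_DIR:-/var/log/sysstat}"
    # %d is the zero-padded day of the month, matching the saDD naming.
    printf '%s/sa%s\n' "$dir" "$(date -d "${1:-yesterday}" +%d)"
}
# Usage: sar -u -f "$(sa_file)"
#        sar -u -f "$(sa_file 2014-02-18)"
```

This does not cover the per-month directories that appear when HISTORY is greater than 28.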

sar is an irreplaceable tool for capacity planning. Check sar’s man page for more info.

