How to Diagnose Slow Disk IO on Linux with iostat and iotop
Identify Linux disk bottlenecks using iostat and iotop. %util, await, r/s, w/s metrics explained with practical examples for sysadmins.
A slow application, a high load average, and CPU sitting at only 20%
usage almost always point to the same culprit: the disk. The classic
symptom is a MySQL query that normally runs in 50ms suddenly taking
8 seconds without any change in the execution plan, or a Node.js
deploy that used to finish in 30 seconds now stuck for 5 minutes on
npm install.
Before provisioning more CPU or RAM, you need to confirm that the
bottleneck is disk IO. Low CPU with high load is the first sign —
processes are in state D (uninterruptible sleep) waiting for disk
operations. This tutorial shows how to use iostat and iotop to
confirm this in 5 minutes and pinpoint exactly which process is
saturating your storage.
This tutorial is aimed at the Linux sysadmin who has already covered
the basics (top, htop, free) and needs to dig deeper. Estimated
execution time: 10-15 minutes to collect conclusive evidence.
Prerequisites
You need a Linux machine (Ubuntu 22.04+, Debian 12+, RHEL 9+ or
similar) with sudo or root access. The tools come in separate
packages that are not part of the default minimal install.
Requirements: the sysstat and iotop packages, root or CAP_NET_ADMIN
privileges, and kernel 5.4+.
Also confirm that CONFIG_TASK_IO_ACCOUNTING is enabled in the kernel;
without it, iotop shows zeroed values. Check with:
grep CONFIG_TASK_IO_ACCOUNTING /boot/config-$(uname -r)
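The output should be CONFIG_TASK_IO_ACCOUNTING=y. On custom kernels
or restricted containers it may appear as
# CONFIG_TASK_IO_ACCOUNTING is not set. In that case the per-process
counters behind iotop and pidstat -d are unavailable, and you will
need a tracing tool such as bpftrace to attribute IO to a process.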
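The check above can be wrapped in a small function so it also copes with a missing config file. This is a minimal sketch: io_accounting_status is a hypothetical helper name, and the /proc/config.gz fallback in the comments assumes a kernel built with that option.

```shell
# Sketch: report whether a kernel config file enables per-task IO accounting.
# io_accounting_status is a hypothetical helper, not part of sysstat or iotop.
io_accounting_status() {
    config_file="$1"
    if [ ! -r "$config_file" ]; then
        echo "unknown"        # file missing or unreadable
        return 0
    fi
    if grep -q '^CONFIG_TASK_IO_ACCOUNTING=y$' "$config_file"; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Usage on a live system (assumes the distro ships /boot/config-*):
#   io_accounting_status "/boot/config-$(uname -r)"
# Some kernels only expose the config as /proc/config.gz:
#   zcat /proc/config.gz > /tmp/kconfig && io_accounting_status /tmp/kconfig
```

An "unknown" result only means the config file could not be read; checking whether /proc/self/io exists is a reasonable second test.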
Installing the tools
Most distributions do not include sysstat or iotop in the base
install. They occupy less than 5 MB combined.
Install sysstat (which provides iostat, sar, pidstat, mpstat):
sudo apt update
sudo apt install -y sysstat
On RHEL/Rocky/Alma:
sudo dnf install -y sysstat
Install iotop:
sudo apt install -y iotop
On RHEL/Rocky/Alma:
sudo dnf install -y iotop
Confirm the installed versions:
iostat -V
iotop --version
Minimum recommended versions: sysstat 12.0+ and iotop 0.6+. Much
older versions have known bugs in %util calculation and in the
svctm column.
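To compare an installed version against these minimums in a script, one option is a version-aware sort. A minimal sketch, assuming GNU sort (for the -V flag); version_ge is a hypothetical helper name, and the version string has to be extracted from the tool's output by hand or with your own parsing.

```shell
# Sketch: succeed when version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort); version_ge is a hypothetical name.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage, with the version strings typed in after reading iostat -V:
#   version_ge 12.5.2 12.0 && echo "sysstat is recent enough"
```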
First reading with iostat
iostat shows aggregated statistics per block device. The first
execution shows averages since boot — discard those numbers and look
at the subsequent readings.
Run iostat in extended mode with a 2-second interval:
iostat -xz 2 5
Important flags:
- -x: extended, shows all relevant columns (await, %util)
- -z: omits devices with no activity (reduces noise)
- 2 5: 5 samples of 2 seconds each
The first sample is the average since boot and should be ignored. Analyze from the second one onward.
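When the output is saved or piped, the since-boot report can be dropped automatically. The filter below is a sketch (skip_first_report is a hypothetical name) that keys on the Device header line that starts each report. iostat also has a -y flag that omits the first report when an interval is given.

```shell
# Sketch: drop the first (since-boot) report from iostat output on stdin.
# Each report starts with a header line whose first field begins with "Device".
skip_first_report() {
    awk '$1 ~ /^Device/ { reports++ } reports >= 2 { print }'
}

# Usage (-d limits output to device reports, so no avg-cpu blocks get mixed in):
#   iostat -dxz 2 5 | skip_first_report
# Alternative without a filter:
#   iostat -dxzy 2 5
```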
Identify the device of interest in the Device column. On modern
VPS instances it will be something like vda, sda or nvme0n1.
Partitions (vda1, nvme0n1p1) also appear but you want the parent
device to see total activity.
Interpreting the metrics
Six columns matter for diagnosis. Ignore the rest until you master these. Recent sysstat versions print await split into r_await (reads) and w_await (writes); read both the same way.
| Metric | Meaning | Alert threshold |
|---|---|---|
| r/s | Reads per second (read IOPS) | Depends on the disk |
| w/s | Writes per second (write IOPS) | Depends on the disk |
| rkB/s | Read throughput in KB/s | Compare to disk spec |
| wkB/s | Write throughput in KB/s | Compare to disk spec |
| await | Average latency per request in ms | >20ms suspicious, >100ms critical |
| %util | % of time with pending requests | >80% saturation on HDD |
On rotational disks (HDD), %util near 100% indicates real
saturation. On SSDs and NVMes that process IO in parallel, %util at
100% can occur with only 10% of the actual bandwidth in use. Use
await and compare rkB/s + wkB/s against the disk spec to evaluate
real saturation on modern media.
The rule of thumb: if await is above 20ms on a SATA SSD, or more than
a few milliseconds on an NVMe, you have abnormal latency. On HDDs,
values of 50-100ms are not unusual under load, but anything above
200ms indicates overload.
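The generic thresholds from the table (20ms and 100ms) are easy to encode for use in scripts or alerts. A minimal sketch: classify_await is a hypothetical helper, and the cutoffs are the table's defaults, not values tuned for a specific disk.

```shell
# Sketch: classify an await value (in ms) using the generic table thresholds.
# awk does the comparison because the shell cannot compare decimals natively.
classify_await() {
    awk -v ms="$1" 'BEGIN {
        if (ms > 100)     print "critical"
        else if (ms > 20) print "suspicious"
        else              print "ok"
    }'
}

# Usage:
#   classify_await 0.42     # healthy NVMe example
#   classify_await 148.6    # bottleneck example
```

For NVMe you would likely want tighter cutoffs than these defaults.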
Example of healthy output
Device     r/s    w/s   rkB/s   wkB/s   await   %util
nvme0n1    142     89    8520    4310    0.42   12.30
0.42ms latency on an NVMe — normal. 12% util and ~12 MB/s combined — disk breathing comfortably.
Example of output with a bottleneck
Device     r/s    w/s   rkB/s   wkB/s   await   %util
vda        890   1240   45200   78900   148.6   99.80
148ms of await is critical: something is doing a lot of synchronous
IO or the disk is saturated. %util at 99.8% on a rotational disk
confirms saturation. 2,130 combined IOPS on a disk that normally
sustains 200 IOPS is evidence of a misbehaving batch job or
swap thrashing.
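The combined figures quoted for these examples are simple sums, which a short awk filter can compute. This sketch assumes the simplified six-column layout used in the two examples (Device, r/s, w/s, rkB/s, wkB/s, await, %util); real iostat -x output has more columns in a different order, so the field numbers would need adjusting. summarize_load is a hypothetical name.

```shell
# Sketch: sum read+write IOPS and throughput per device line on stdin.
# Assumed column order: Device r/s w/s rkB/s wkB/s await %util
summarize_load() {
    awk '$1 != "Device" && NF >= 5 {
        printf "%s iops=%d throughput_MiB_s=%.1f\n", $1, $2 + $3, ($4 + $5) / 1024
    }'
}

# Usage with the bottleneck example:
#   printf 'vda 890 1240 45200 78900 148.6 99.80\n' | summarize_load
```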
Finding the guilty process with iotop
iostat tells you THAT there is a bottleneck. iotop tells you WHO
is causing it.
Run iotop in interactive mode:
sudo iotop -o
Flags and keys:
- -o: shows only processes with active IO (hides the zeroed ones)
- r key: reverses sorting
- o key: toggles “only active” mode
- q key: quits
Identify the DISK READ and DISK WRITE columns. The processes at
the top of the list are the largest IO consumers at that moment.
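The IO> column shows the percentage of time the process spent
waiting for IO. Values above 30% sustained indicate an IO-bound
process, which will not get faster with more CPU. On kernels 5.14
and later this column stays at zero unless delay accounting is
enabled (sudo sysctl kernel.task_delayacct=1).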
To capture evidence without leaving iotop running indefinitely,
use batch mode:
sudo iotop -o -b -n 5 -d 2 > /tmp/iotop-snapshot.txt
This captures 5 snapshots of 2 seconds each and exits. Useful for collecting evidence during a reported slowness window and analyzing later.
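On containers where iotop cannot reach the kernel's taskstats
interface, use pidstat -d 2 5 (comes with sysstat). It reads the
per-process counters from /proc/<pid>/io instead, so it often works
where iotop does not. It still needs CONFIG_TASK_IO_ACCOUNTING: if
/proc/self/io is missing, only tracing tools such as bpftrace can
attribute IO to a process.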
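Per-process IO counters are also exposed as plain text in /proc/<pid>/io when the kernel provides them, which helps when neither tool is installed. A minimal sketch: proc_io_bytes is a hypothetical helper, and reading another user's file requires root.

```shell
# Sketch: print cumulative read_bytes and write_bytes from a /proc/<pid>/io file.
# These counters track bytes that actually hit the storage layer.
proc_io_bytes() {
    awk -F': *' '
        $1 == "read_bytes"  { r = $2 }
        $1 == "write_bytes" { w = $2 }
        END { printf "read_bytes=%s write_bytes=%s\n", r, w }
    ' "$1"
}

# Usage (1234 is a placeholder PID):
#   proc_io_bytes /proc/1234/io
# Run it twice a few seconds apart and subtract to get a rate.
```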
Verification
To confirm that your disk bottleneck hypothesis is correct, combine 3 pieces of evidence:
- iostat -xz 2 5 showing sustained high await and high %util
- iotop -o pointing to a specific process at the top of the list
- top or htop showing processes in state D (uninterruptible sleep), in the S or STATE column
ps -eo pid,state,comm | awk '$2 == "D"'
If this list contains processes other than the PID that iotop
showed, they are probably victims, waiting for the disk that the
culprit is keeping busy. If it is the culprit itself, it is doing
heavy synchronous IO.
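State D is transient, so a single ps call can miss it. One approach is to sample several times and count how often each command shows up. A minimal sketch: count_d_state is a hypothetical helper, and it keeps only the first word of the command name.

```shell
# Sketch: count how often each command appears in state D across samples.
# Input: lines of "pid state comm" as printed by ps -eo pid,state,comm.
count_d_state() {
    awk '$2 == "D" { seen[$3]++ } END { for (c in seen) print seen[c], c }' | sort -rn
}

# Usage: take 5 samples one second apart, then count.
#   for i in 1 2 3 4 5; do ps -eo pid,state,comm; sleep 1; done | count_d_state
```

Commands that appear in most samples are the ones consistently blocked on the disk.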
Troubleshooting
iotop returns “Could not run iotop as a non-root user”
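iotop needs privileges to read statistics from other processes.
Run with sudo. If you want non-root users to run it without sudo
every time, grant the capability to the binary. This works for the
compiled iotop-c; the classic Python iotop is a script, and file
capabilities set on a script are ignored:
sudo setcap 'cap_net_admin+eip' "$(readlink -f "$(which iotop)")"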
iostat shows %util at 100% but the application is not slow
Classic case of SSD/NVMe, where %util has lost its meaning. Look at
rkB/s + wkB/s and compare with the disk’s nominal bandwidth. If it
is below 50% of the spec bandwidth, it is not real saturation.
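The comparison against the spec is a one-line calculation. A minimal sketch: bandwidth_pct is a hypothetical helper, and the spec figures in the usage lines are illustrative, not taken from a real datasheet. It treats iostat's kB as 1024 bytes and the spec as MiB/s, which is close enough for a 50% rule.

```shell
# Sketch: percentage of nominal bandwidth in use.
# Arguments: rkB/s, wkB/s (from iostat), nominal bandwidth in MiB/s (assumed).
bandwidth_pct() {
    awk -v r="$1" -v w="$2" -v spec="$3" 'BEGIN {
        printf "%.1f\n", (r + w) / 1024 / spec * 100
    }'
}

# Usage, with illustrative spec values:
#   bandwidth_pct 8520 4310 3000     # NVMe assumed to be rated at 3000 MiB/s
#   bandwidth_pct 45200 78900 150    # disk assumed to be rated at 150 MiB/s
```

For small random IO, also compare r/s + w/s against the rated IOPS, since a disk can run out of IOPS well before it runs out of bandwidth.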
Intermittent high latency that iostat does not capture
iostat shows averages over an interval. Sub-second latency spikes
get hidden in the average. Use iostat -x 1 (1s interval) during the
problematic window, or bpftrace to capture individual block
latencies.
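When scanning a long capture taken at 1s intervals, the worst sample per device is usually what you want. The filter below is a sketch (max_await is a hypothetical name). It looks up the latency columns by header name because the layout differs between sysstat versions.

```shell
# Sketch: highest await seen per device in iostat -x output read on stdin.
# Handles a single "await" column or separate r_await/w_await columns.
max_await() {
    awk '
        NF == 0 { ncols = 0; next }          # blank line ends a device block
        $1 ~ /^Device/ {                     # header: find latency columns
            ncols = 0
            for (i = 2; i <= NF; i++)
                if ($i == "await" || $i == "r_await" || $i == "w_await")
                    cols[++ncols] = i
            next
        }
        ncols > 0 && NF > 1 {                # device line inside a block
            for (k = 1; k <= ncols; k++) {
                v = $(cols[k]) + 0
                if (!($1 in max) || v > max[$1]) max[$1] = v
            }
        }
        END { for (d in max) printf "%s %.2f\n", d, max[d] }
    ' | sort
}

# Usage: capture one minute at 1s resolution, skipping the since-boot report.
#   iostat -dxy 1 60 | max_await
```

This still reports per-second averages; spikes shorter than the interval need a tracer such as bpftrace.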
If iotop shows kswapd0 or [kswapd] at the top, your system is
swap thrashing: it is out of RAM and the kernel is constantly moving
pages to disk. The solution is not to tune IO; it is to add RAM or
find the memory leak. Check free -h and adjust vm.swappiness.
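Swap activity can be quantified from the pswpin and pswpout counters in /proc/vmstat, which count pages swapped in and out since boot. A minimal sketch: swap_rate is a hypothetical helper that diffs two saved snapshots.

```shell
# Sketch: pages swapped in/out per second between two /proc/vmstat snapshots.
# Arguments: earlier snapshot file, later snapshot file, seconds between them.
swap_rate() {
    awk -v secs="$3" '
        FNR == NR {                                   # first file: remember values
            if ($1 == "pswpin" || $1 == "pswpout") first[$1] = $2
            next
        }
        $1 == "pswpin" || $1 == "pswpout" {           # second file: print the rate
            printf "%s_per_s=%.1f\n", $1, ($2 - first[$1]) / secs
        }
    ' "$1" "$2"
}

# Usage:
#   cat /proc/vmstat > /tmp/vm1; sleep 5; cat /proc/vmstat > /tmp/vm2
#   swap_rate /tmp/vm1 /tmp/vm2 5
```

Sustained nonzero rates in both directions suggest thrashing; an occasional burst does not.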
Next steps
With the bottleneck identified, the common paths are:
- Apply IO scheduler tuning: mq-deadline for transactional workloads, none for pure NVMe
- Enable historical collection with sar (part of sysstat) for trend analysis over weeks
- Investigate with bpftrace or perf to capture stack traces of the slowest IO syscalls
- Evaluate migration to NVMe storage if the workload requires sustained IOPS above 10k
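Before changing the scheduler, it helps to see which one is active. The kernel lists the available schedulers in /sys/block/<dev>/queue/scheduler and marks the current one with brackets. A minimal sketch: active_scheduler is a hypothetical helper, and the device names in the comments are examples.

```shell
# Sketch: extract the active scheduler (the bracketed entry) from the contents
# of /sys/block/<dev>/queue/scheduler, e.g. "none [mq-deadline] kyber bfq".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage (device names are examples):
#   active_scheduler < /sys/block/nvme0n1/queue/scheduler
# Switching until the next reboot:
#   echo mq-deadline | sudo tee /sys/block/vda/queue/scheduler
```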
If you are putting an IO-intensive application into production, a Hostini VPS with local NVMe storage already delivers consistent sub-1ms latency — which eliminates the largest source of variability in disk diagnostics.
Frequently asked questions
Does %util at 100% always mean the disk is the bottleneck?
No. On SSDs and NVMe drives, %util only reflects whether any request is in flight, not actual saturation. An NVMe can show 100% util while using only 5% of its real bandwidth. Trust await, IOPS and throughput before concluding saturation on modern media.
What is the difference between await and svctm in iostat?
await is the total time a request spends in the system (queue + service), in milliseconds. svctm tried to measure only the service time, but the calculation was incorrect on modern kernels and was removed in recent sysstat versions. Use await as the primary latency metric.
iotop shows zero IO but the system is slow — why?
iotop requires CONFIG_TASK_IO_ACCOUNTING enabled in the kernel. On containers and some custom kernels it shows up as zero. Confirm with cat /proc/self/io: if the file does not exist or is empty, accounting is disabled. pidstat -d reads the same file, so in that case fall back to bpftrace.
Why is await high but %util low?
This usually points to a remote disk with network latency (NFS, iSCSI, cloud storage) or hypervisor throttling in virtualized environments. The local disk is not saturated, but each request takes long because it travels over the network. Check the storage type and network metrics.
iostat -x shows high rrqm/s and wrqm/s — is that bad?
No, it is the opposite. rrqm/s and wrqm/s are requests merged by the kernel before reaching the disk — it means the IO scheduler is combining sequential operations into larger, more efficient batches. High values here usually indicate a well-optimized sequential workload.
Can I run iotop in production without impact?
Yes, iotop overhead is low (~1-2% CPU on busy servers). Use iotop -o -b -n 5 -d 2 to capture 5 snapshots of 2 seconds in batch mode and exit — ideal for collecting evidence without leaving a process running indefinitely.