Steal time is the percentage of time a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processor. As such, it only happens in virtualized environments like AWS, GCP, Azure, vSphere, and Xen.
To see steal time on Linux, run top on the command line and look for the st value on the CPU summary line (older versions of top label it %st). Seeing steal time on Windows depends on the hypervisor but usually requires installing the guest additions package for that particular hypervisor.
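If you would rather collect the number programmatically than read it off top, the same counter is exposed in /proc/stat. The sketch below is a minimal illustration, not a monitoring tool: the function names are made up for this example, and it assumes a Linux kernel recent enough to report the steal column as the eighth value on the aggregate cpu line.

```python
def parse_cpu_line(line):
    """Return (steal, total) tick counts from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Column order: user nice system idle iowait irq softirq steal guest guest_nice
    steal = values[7] if len(values) > 7 else 0
    # guest and guest_nice are already included in user and nice,
    # so only the first eight columns are summed for the total.
    total = sum(values[:8])
    return steal, total


def steal_percent(before, after):
    """Percentage of CPU time stolen between two (steal, total) samples."""
    steal_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * steal_delta / total_delta


def read_sample(path="/proc/stat"):
    """Read one (steal, total) sample; the first line of /proc/stat is the aggregate."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

Calling read_sample() twice, a second or so apart, and passing both results to steal_percent() should give a figure comparable to what top displays. The counters are cumulative since boot, so a single sample on its own says little.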
Note for AWS: Amazon has a concept of CPU credits for its burstable instance types (the T family). You earn CPU credits every hour and spend them as the VM requests CPU time. Once the VM’s credits are depleted, its CPU is throttled, which shows up as steal time, until more credits are earned. You can view CPU credits per VM on the AWS web console.
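To get a feel for how quickly credits run out, here is a rough back-of-the-envelope sketch. It relies on AWS's definition that one CPU credit equals one vCPU running at 100% for one minute. The function name and the figures in the usage note are hypothetical, and the model assumes the instance is throttled at a zero balance rather than billed for extra credits.

```python
def minutes_until_depleted(balance, earned_per_hour, vcpus, utilization):
    """Estimate minutes until the credit balance reaches zero.

    balance         -- current credit balance
    earned_per_hour -- credits the instance earns per hour
    vcpus           -- number of vCPUs
    utilization     -- average utilization per vCPU, as a fraction (0.0 to 1.0)

    Returns None when credits are earned at least as fast as they are spent.
    """
    spent_per_minute = vcpus * utilization      # one credit = one vCPU-minute at 100%
    earned_per_minute = earned_per_hour / 60.0
    net_drain = spent_per_minute - earned_per_minute
    if net_drain <= 0:
        return None
    return balance / net_drain
```

For example, an instance with 60 credits banked, earning 12 credits per hour, and running 2 vCPUs at 50% drains 0.8 credits per minute and would hit zero after about 75 minutes.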
The impact of stolen CPU always manifests in slowness but can have more profound effects on your infrastructure. Here are some examples:
There are two possible causes of steal time:
Under no circumstances should you tolerate high steal time on a server. It means you’re getting worse performance than what you’re paying for. Moving and upgrading servers is quick and painless and solves the problem at its root.
Manually terminate the virtual machine and launch a replacement.
If money is no object, then upgrading the VM is the easiest guaranteed solution.
Otherwise, finding the cause is best done through trial and error. Terminating the VM and relaunching it will move it to another physical server. If steal time persists through multiple moves, then it’s time to upgrade the VM to have more CPU.
An automated solution where high steal time kicks off a relaunch can be effective but can also mask scaling issues.
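One way to keep such automation from hiding a scaling problem is to cap the number of relaunches and escalate to an upgrade after that. The sketch below only makes the decision; it does not call any cloud API. The threshold, window size, and relaunch cap are illustrative assumptions, not recommended values, and all names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class StealPolicy:
    threshold_pct: float = 10.0   # assumed: steal above this counts as "high"
    sustained_samples: int = 5    # assumed: consecutive high samples required
    max_relaunches: int = 3       # assumed: moves to try before upgrading


def decide(samples, relaunches_so_far, policy=None):
    """Return 'wait', 'ok', 'relaunch', or 'upgrade' for a list of steal percentages."""
    policy = policy or StealPolicy()
    recent = samples[-policy.sustained_samples:]
    if len(recent) < policy.sustained_samples:
        return "wait"        # not enough data yet
    if any(s < policy.threshold_pct for s in recent):
        return "ok"          # steal is not consistently high
    if relaunches_so_far >= policy.max_relaunches:
        return "upgrade"     # moving hosts has not helped; the VM needs more CPU
    return "relaunch"        # try a different physical server
```

Requiring several consecutive high samples avoids relaunching on a brief spike, and the relaunch counter mirrors the manual advice above: if steal time survives multiple moves, upgrade the VM instead of moving it again.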