Limiting the CPU Usage of a Process
Johan Bergs ([email protected]), Peter Hellinckx, Tom Dhaene, Gunther Stuer, Jan Broeckhove
Department of Mathematics and Computer Science, Middelheimlaan 1, 2020 Antwerp, Belgium
Introduction

In this paper, we propose an algorithm that makes it possible to limit the amount of CPU a process may use. The primary use of CPU limiting is to allow background jobs to execute on desktop computers. This is typically the case in lightweight grid systems such as JGrid [JGrid Team] and H2O [Emory University]. By limiting the amount of CPU a background calculation may use, one can make sure that it does not disturb other processes. An example of a system that is already capable of doing this is the Condor System [Condor Team].
The Algorithm

The technique we describe here was developed in [Bergs, 2004]. It is based on the general description of the CPU limiting algorithm of the Folding@Home project, developed by the Pande group at Stanford University.

Basic Algorithm

For the remainder of this text, we assume that the actual background process runs in a thread, called the “worker thread”. We also assume that the background job runs at the lowest possible priority. This guarantees that normal user processes are never affected by the background process, since the operating system only schedules low-priority jobs when they have no effect on other active jobs of a higher priority class that require the CPU.

If one wants the worker thread to use a maximum of x% of the CPU, one can execute it for (x/100) * T ms, followed by a pause of (1 - x/100) * T ms. This cycle is repeated continuously, with T a fixed number of milliseconds.

Implemented as described above, this approach is only valid if the worker thread is the only thread that uses the CPU; it does not work if other processes are running concurrently with the background process. This is illustrated by figure 1, which shows the CPU usage of one of three identical background processes that were run simultaneously, each restricted to using only 30% of the CPU. No other user processes were running. Nevertheless, the algorithm fails to allocate the required CPU time to each process, because it does not take into account that other processes might interrupt the worker thread. However, one of the purposes of the algorithm is to allow the background process to use the predetermined maximum amount of CPU, if that maximum amount is available, without interfering with other processes. The worker thread should only receive less CPU than requested if other processes would otherwise be interrupted. So, if the worker thread isn’t the only thread using the CPU, an adaptive algorithm is needed.
Figure 1: Performance of the basic algorithm.
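The fixed work/sleep cycle of the basic algorithm can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `duty_cycle` and `run_basic`, the `do_work_chunk` callback, and the injectable `clock` and `sleep` parameters are all assumptions introduced here. The sketch measures the work phase in wall-clock time, so it has exactly the weakness described above: it cannot tell whether the worker was preempted during that phase.

```python
import time


def duty_cycle(x_percent, period_ms):
    """Split one period T into work and sleep durations (both in ms)."""
    if not 0 <= x_percent <= 100:
        raise ValueError("x_percent must lie between 0 and 100")
    work_ms = (x_percent / 100) * period_ms
    sleep_ms = (1 - x_percent / 100) * period_ms
    return work_ms, sleep_ms


def run_basic(do_work_chunk, x_percent, period_ms, cycles,
              clock=time.monotonic, sleep=time.sleep):
    """Run `cycles` fixed work/sleep periods.

    do_work_chunk is a hypothetical callback that performs one small,
    bounded piece of the background job. The work phase is timed with
    wall-clock time, so time lost to other processes is not noticed.
    """
    work_ms, sleep_ms = duty_cycle(x_percent, period_ms)
    for _ in range(cycles):
        deadline = clock() + work_ms / 1000
        while clock() < deadline:
            do_work_chunk()
        sleep(sleep_ms / 1000)
```

For example, `duty_cycle(30, 1000)` splits a one-second period into roughly 300 ms of work and 700 ms of sleep.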
Adaptive Algorithm

In the adaptive algorithm, the work/sleep ratio of the worker thread is changed dynamically in order to keep the CPU usage of the thread as close to the requested value as possible. To do this, one needs to periodically query the operating system for the effective CPU usage of the worker thread and adapt the work/sleep ratio as needed. This is best done by a “monitoring thread” that monitors the CPU at fixed time intervals. It then becomes possible to compare the actual amount of CPU the background process has used since the previous monitoring to the amount of CPU it was allowed to use. If these values do not match, the work/sleep ratio can be corrected accordingly.

While this is an improvement over the basic algorithm, it needs further refinement. The scheduling mechanism used by the operating system results in fluctuations of the average CPU usage. Hence, it is infeasible to change the work/sleep ratio every time the real CPU usage is not exactly x%. Moreover, doing so actually worsens the fluctuations observed in the CPU usage instead of stabilizing them. The solution to this problem is to allow a certain degree of tolerance by changing the work/sleep ratio only if the effective CPU usage falls outside of the tolerance range.
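The monitoring step can be sketched as two small functions: one that turns two samples of consumed CPU time into a usage percentage, and one that decides whether that usage is far enough from the target to justify a correction. The function names and the idea of passing the samples in as plain numbers are assumptions made for this illustration. How the samples are obtained depends on the platform; in Python, for instance, `time.process_time()` reports CPU time for the whole process rather than for a single worker thread.

```python
def cpu_usage_percent(cpu_start, cpu_end, wall_start, wall_end):
    """CPU time consumed during a monitoring interval, as a percentage
    of the wall-clock length of that interval (all inputs in seconds)."""
    elapsed = wall_end - wall_start
    if elapsed <= 0:
        raise ValueError("monitoring interval must have positive length")
    return 100.0 * (cpu_end - cpu_start) / elapsed


def needs_correction(measured, target, tolerance):
    """True only if the measured usage lies outside target +/- tolerance.

    A symmetric tolerance is assumed here for simplicity.
    """
    return abs(measured - target) > tolerance
```

With a target of 50% and a tolerance of 5%, a measured usage of 45% is still accepted, while 44% would trigger a correction.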
Tolerance

The tolerance range can be defined in different ways. It can be symmetrical, allowing a process to temporarily use a few percent more or less than the preset value. Another, more rigorous approach is to define the preset CPU percentage as an absolute boundary, allowing the process to use a few percent less CPU than requested, but no more. A third possible method is to define different upper and lower bounds of tolerance, e.g. allowing the process to temporarily use 2% more and 5% less CPU than requested. The first method results in a more stable CPU usage, requiring fewer corrections by the algorithm. The second method also works, but results in a more fluctuating CPU usage. The behavior of the third method depends on the percentage levels defined, but generally also results in a stable CPU usage.

The way in which the work/sleep ratio is changed also matters. For example, if the preset CPU usage is 50% and the monitoring thread notices that only 45% of the CPU was used during the last time interval, there are different ways to reduce this difference. The first is to correct the work/sleep ratio such that, during the next time interval, 5% more time is allocated to work and 5% less time to sleep. This method, however, again results in a fairly fluctuating CPU usage. Another, more suitable method is to incrementally lower or increase the amount of time spent working until the CPU usage again falls within the preset tolerance range. Doing so minimizes the fluctuations observed in the CPU usage. A third way is to combine the two methods: incrementally increase the work time of the worker thread should its usage fall below the lower bound, but rigorously lower the work time should it lie above the upper bound.

Figure 2 shows the algorithm at work using a symmetric tolerance of 5% and incremental increase/decrease of the work time. The process was restricted to use only 75% of the CPU. Figure 3 shows the CPU usage for the same process, but now using a non-symmetrical tolerance (the process wasn’t allowed to use more than 75% of the CPU) and the incremental increase/rigorous decrease method. It is clear that the first graph shows a more stable CPU usage. It does, however, also result in a CPU usage that mostly lies slightly higher than the preset 75%. In the second graph, the CPU usage fluctuates more, but because of the absolute maximum of 75%, the algorithm keeps the CPU usage much closer to the preset value. The symmetrical version of the algorithm can be made to stay closer to the preset value by choosing a lower tolerance level, e.g. one or two percent.
Figure 2: The adaptive algorithm – symmetrical case.
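The three tolerance definitions and the three correction strategies discussed in the Tolerance section can be expressed in one small function. This is a sketch under stated assumptions rather than the authors' code: the name `adjust_work_share`, the mode labels, and the default increment `step` of one percentage point are invented for illustration, and "rigorously lower" is interpreted here as removing the full measured excess in one step. The tolerance is given as separate `below` and `above` margins, so a symmetric range is `below == above`, an absolute upper boundary is `above = 0`, and an asymmetric range uses two different values.

```python
def adjust_work_share(work_share, measured, target, below, above,
                      step=1.0, mode="incremental"):
    """Return the new work share (percent of each period spent working).

    measured and target are CPU usage percentages; below and above are
    the tolerated deviations under and over the target. step is an
    assumed increment size, not a value taken from the paper.
    """
    low, high = target - below, target + above
    if low <= measured <= high:
        return work_share  # inside the tolerance range: leave unchanged

    if mode == "direct":
        # Apply the whole difference at once (e.g. 45% vs 50% -> +5).
        new_share = work_share + (target - measured)
    elif mode == "incremental":
        # Move one small step towards the target per monitoring interval.
        new_share = work_share + step if measured < low else work_share - step
    elif mode == "combined":
        # Increase in small steps, but cut back by the full excess.
        if measured < low:
            new_share = work_share + step
        else:
            new_share = work_share - (measured - target)
    else:
        raise ValueError("unknown mode: " + mode)

    return min(100.0, max(0.0, new_share))
```

Taking the 50% example from the text with a 2% symmetric tolerance and a measured usage of 45%, the direct mode raises the work share from 50 to 55, while the incremental mode raises it only to 51 and relies on later monitoring intervals to close the remaining gap.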
Important Parameters

The most important parameter of this algorithm is T, the length of the monitoring interval. If T is too short (