Linux CPU Scheduling


Today we will discuss about the CPU scheduling in Linux. What do we mean by CPU scheduling, it is sharing the CPU among different processes. Earlier most of the system were uniprocessor and for them CPU scheduling algorithm used were:-

  1. FCFS (First come first scheduled)
  2. SJFS (Shortest Job First)
  3. RR (Round Robin)
  4. Priority Based Scheduling

Nowadays systems are multiprocessor and we need to schedule the CPU in such a way that all the processor execute jobs in balanced way and for this we have new scheduling algorithms. Besides this most of the new scheduling algorithm are priority based.

Now, when we consider the priority based algorithm, then first question arise is: who will assign the priority? we cannot leave this to user; since it may possible that the user processes will have higher priority then the system related processes which is not correct since the interrupt and kernal thread should have higher priority and we also cannot leave the priority decision totally on the system since then it will be impossible for user to specify to the system about importance of tasks. In Linux there are two types of priorities:-

  • Static Priority- assigned by the user using nice() system call.
  • Dynamic Priority – priority assigned earlier is recalculated by the scheduler.

In, Linux processes are classified as:-

  • CPU bound processes (Real Time (RT) processes)
  • IO bound processes (Interactive processes)

Scheduler in Linux system is known as O(1) scheduler, since it takes constant time in all the operations e.g. selecting the process with highest priority, recalculating the priorities and adding the process to queue.It is multi-level priority based, fairer and preemptive(once the time quantum is over it will be switched with the process higher or equal in priority and time quantum assigned to processes is nearly ~100 ms) in nature. There are 140 levels of priority and at each level there is queue for processes.For deciding, the processes with highest priority it uses bitmap mechanism and scan this bitmap array from left and so which ever bit is found to be set first, process will be selected from that particular priority level.
In Linux, typically there are two priority arrays (as discussed earlier the array of 140 priority queue) are maintained, which are known as active and expired. For, processes for whom time quantum is over, is moved to expired array and once the processes are exhausted in active array, pointers are exchanged that is expired array become active and active become expired and all processes are assigned new time quantum (a.k.a epoch).

While recalculating the priority, scheduler gives bonus to processes which varies from 0 to 10, one thing
to be remembered is that higher the value of the priority less is its priority in scheduler term, hence to increase the priority, bonus added which actually means decrementing that many bonus from the current priority and opposite way while decreasing the priority in scheduler context. To make the system fairer, IO bound processes get more bonus then the CPU bound and this bonus calculation is depend on the amount processes have slept and priorities for RT processes remains same. Algorithm for recalculation of priority of process is:

(For detail, data structure and methods of Linux check kernel/sched.c and kernel/sched.h file and man sched_setscheduler.)

bonus = CURRENT_BONUS(p) – MAX_BONUS / 2;
prio = p->static_prio – bonus;
CURRENT_BONUS is defined as follows:

Essentially, CURRENT_BONUS maps a task’s sleep average onto the range 0-MAX_BONUS, which is 0-10. If a task has a high sleep_avg, the value returned by CUR-RENT_BONUS will be high, and vice-versa. Since MAX_BONUS is twice as large as a task’s priority is allowed to rise or fall (MAX_BONUS of 10 means that the priority adjustment can be from +5 to -5), it is divided by two and that value is subtracted from CURRENT_BONUS(p).

Here, p-> static_prio is the static priority defined by the user to a process using nice() system call and by default it is 0.

Till now we have discussed about the priority assignment and its recalculation and data structure and algo for processes priority, but for a scheduler to be fairer to all processor it must need to balance the jobs load among the processors. For this load balancing for SMP system, Linux scheduler invokes the rebalance_tick() function, which is called by scheduler_tick(). rebalance_tick() first updates the current CPU’s load variable, then goes up the CPU’s scheduler domain hierarchy attempting to rebalance.

We can also specify different types of scheduling policies and details for this can be found, if we do “man sched_setscheduler”.
In summary, Linux CPU scheduler is one of the best multi-processor and fairer scheduler available today.


Synchronization using Monitor



It is an advance mechanism for synchronization, it is more sophisticated and advance than “Semaphore”. It can be considered as an object or module where the variables and synchronization mechanism is abstracted from the user hence unlike semaphore where user is suppose to handle the mutual exclusion and counting based logic, hence less error-prone.

It has been invented by C. A. R. Hoare, a british computer scientist who is profoundly known for inventing Quick Sort.

At a time only one thread can execute monitor’s procedure and only one of the procedure of monitor can be executed. From implementation point of view we can use binary semaphore to implement this feature of monitor .i.e. to provide the mutual exclusion among the monitor’s procedures.

Now, since only one of thread can execute the monitor hence other thread should wait for it and for waiting we need to first have the queue and then method call to make them wait () and release them from the waiting queue and having this in mind a new concept of Condition Variables is introduced. The POSIX thread implementation has library for this pthread_cond_wait() and pthread_cond_signal().

Though Condition variables’s wait() and signal() and Semaphore’s P() and V() seems similar but there are some differences:-

  • Wait Queue is FIFO (hence it is considered as fair also), unlike the semaphore queue, hence process will be awake in order
  • Unlike V() of semaphore, if there is no process waiting then signal() call will be nop (no operation)
  • Condition variable can only be used for block and wake up, it cannot be used for counting
  • Executing wait(), release mutual exclusion on the monitor i.e. allow other thread to execute it

Some caveats while implementing monitor:-

  • We should call signal() as the last instruction otherwise two threads will be executing the monitor at the same time which is not what we are expecting, those two threads will be: the thread which called signal() and the one which is waked up.
  • If we are calling signal in the middle then use the concept of urgent queue which is again like wait queue. The thread which is calling signal() will be inserted in urgent queue and if there is no process waiting for that signal then threads waiting in the urgent queue will be waked up and it has given priority over threads on the entrance queue (There is supposed to be an entry queue for the thread willing to get in monitor)

In the Java language, each object may be used as a monitor. (However, methods that require mutual exclusion must be explicitly marked as synchronized.) Rather than having explicit condition variables, each monitor (i.e. object) is equipped with a single wait queue, in addition to its entrance queue. All waiting is done on this single wait queue and all notify and notify all operations apply to this queue.

This approach has also been adopted in other languages such as C#.


Hello world!


Hi Guys,

I am Aakash, currently a graduate student at Temple University, Philadelphia before joining Temple I worked as Senior Quality Eng. in Yahoo! R&D for 3 and half years. I am also an alumni of Vellore Institute of Technology, Vellore, India.
I believe in working hard and doing something new in my style. I am fond of computers and love solving problems, optimizing it and my area of interests are: Programming Technique. Algos, Operating System and Distributed Systems.
As far as my hobbies and interests are concerned I like coding, reading books, listening songs and watching Cricket and Football.

I will welcome readers to be my friend at FB ( and will appreciate their comments.

Thanks for ur precious time!