What are the best ways to do close to real-time tasks on a non real-time OS/kernel?

断了今生、忘了曾经 submitted on 2019-12-04 19:23:29


On a GNU/Linux machine, if you want to do "real-time" (sub-millisecond, time-critical) tasks, you almost invariably have to go through the lengthy, complex, and error-prone process of patching the kernel to expose adequate support [1] [2].

The biggest problem is, many systems where real-time tasking is most useful do not have the fundamental hardware requirements to even allow these patches to work, namely a high resolution timer peripheral. Or, if they do have one, it is specific to the hardware, so support for it has to be implemented in the patch on a case-by-case basis. This is true even if the CPU/instruction clock rate is more than fast enough to give the required time granularity and then some.

So, my question is: what are some of the best second-place ways/tricks to get as close as possible to the above real-time goal? I mean things that one can simply do in the application's source code, without intimate knowledge of the underlying hardware or too much "kernel hacking".

Elevating process priority, starting an extra thread for "critical" tasks, and (in C) using variants of nanosleep() are the best looking answers/tricks I have come up with so far. I hope to find more.
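To make the nanosleep() idea concrete, here is a minimal sketch of a periodic loop that sleeps to absolute deadlines with clock_nanosleep(2) and TIMER_ABSTIME, so that the time spent in the work itself does not accumulate as drift. The helper names run_periodic and timespec_add_ns are made up for illustration; on a stock kernel this only improves the average timing and gives no worst-case guarantee.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance an absolute timestamp by ns nanoseconds (ns must be below 1 s),
 * keeping tv_nsec normalised. Hypothetical helper. */
void timespec_add_ns(struct timespec *t, long ns)
{
    assert(t != NULL && ns >= 0 && ns < NSEC_PER_SEC);
    t->tv_nsec += ns;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec += 1;
    }
}

/* Hypothetical helper: wake up every period_ns, `cycles` times, calling
 * work() after each wake-up (work may be NULL).
 * Returns 0 on success or an errno value. */
int run_periodic(long period_ns, int cycles, void (*work)(void))
{
    struct timespec next;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return errno;

    for (int i = 0; i < cycles; i++) {
        int rc;

        /* The deadline is computed from the previous deadline, not from
         * "now", so a late wake-up does not shift later deadlines. */
        timespec_add_ns(&next, period_ns);

        /* clock_nanosleep() returns the error number directly.
         * With an absolute deadline it is safe to retry after EINTR. */
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0)
            return rc;

        if (work != NULL)
            work();
    }
    return 0;
}
```

For example, run_periodic(1000000L, 1000, my_work) would call a (hypothetical) my_work() roughly once per millisecond for one second.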


sched_setscheduler(2) and friends allow you to use two different soft real-time schedulers, SCHED_FIFO and SCHED_RR. Processes running under these schedulers are prioritised higher than regular processes. So as long as you only have a few of these processes, and control the priorities between them, you can actually get pretty decent real-time responses.
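As a rough sketch of what that call looks like in C (the helper name become_fifo is made up for illustration): switching to a real-time policy normally needs root, CAP_SYS_NICE, or a suitable RLIMIT_RTPRIO, so the function reports the errno value instead of assuming success.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <string.h>

/* Hypothetical helper: switch the caller to SCHED_FIFO at `priority`.
 * Returns 0 on success or the errno value, typically EPERM when the
 * process lacks the privilege to use a real-time policy. */
int become_fifo(int priority)
{
    struct sched_param sp;

    /* On Linux, real-time priorities are positive; 0 is only valid for
     * the non-real-time policies. */
    assert(priority > 0);

    memset(&sp, 0, sizeof sp);
    sp.sched_priority = priority;

    /* pid 0 means "the caller". */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return errno;
    return 0;
}
```

A caller would typically pass something between sched_get_priority_min(SCHED_FIFO) and sched_get_priority_max(SCHED_FIFO) and fall back to normal scheduling (with a warning) if EPERM comes back. In a multi-threaded program, pthread_setschedparam(3) is the usual way to do the same for one specific thread.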

As requested in a comment, here is the difference between SCHED_FIFO and SCHED_RR:

With the "real-time" schedulers, there are up to 99 different priorities on Linux (1 to 99; POSIX only requires 32 distinct levels, so one should use sched_get_priority_min(2) and sched_get_priority_max(2) to get the actual range). Both schedulers work by preempting processes and threads with lower priority; the difference is in how they handle tasks with the same priority.
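Querying the range at run time, instead of hard-coding numbers, could look like this small sketch (priority_levels is a made-up helper name):

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: store the lowest and highest static priority for
 * `policy` in *lo and *hi. Returns the number of distinct levels, or -1
 * if the policy is not recognised. */
int priority_levels(int policy, int *lo, int *hi)
{
    assert(lo != NULL && hi != NULL);

    int min = sched_get_priority_min(policy);
    int max = sched_get_priority_max(policy);

    /* Both calls return -1 on error (for example an unknown policy). */
    if (min == -1 || max == -1)
        return -1;

    *lo = min;
    *hi = max;
    return max - min + 1;
}
```

Priorities can then be expressed relative to *lo or *hi (for example "two below the maximum"), which keeps the code portable to systems with a different range.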

SCHED_FIFO is a first-in, first-out scheduler (hence the name). This means that the task that hits the run queue first is allowed to run until it is done, voluntarily gives up its place on the run queue, or is preempted by a higher priority task.

SCHED_RR is a round-robin scheduler. This means that tasks with the same priority are only allowed to run for a certain time quantum. If the task is still running when this time quantum runs out, the task is preempted, and the next task in the run queue (with the same priority) is allowed to run for up to its time quantum. As with SCHED_FIFO, higher priority tasks preempt lower priority ones; however, when a task which was preempted by a higher priority task is allowed to run again, it is only allowed to run for the time left in its quantum. See the Notes section in sched_rr_get_interval(2) for how to set the time quantum for a task.
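Reading the quantum back is straightforward; a minimal sketch follows (rr_quantum is a made-up helper name). As far as I know, on reasonably recent Linux kernels the system-wide SCHED_RR quantum itself is adjusted through /proc/sys/kernel/sched_rr_timeslice_ms rather than through an API call, but check the man page for your kernel version.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: fetch the round-robin quantum that applies to the
 * calling process. Returns 0 on success or the errno value. */
int rr_quantum(struct timespec *out)
{
    assert(out != NULL);

    /* pid 0 means "the caller". The value is only meaningful as an RR
     * quantum when the process actually runs under SCHED_RR. */
    if (sched_rr_get_interval(0, out) != 0)
        return errno;
    return 0;
}
```

Printing out->tv_sec and out->tv_nsec after switching to SCHED_RR shows how long a task may run before an equal-priority task gets its turn.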



Sub-millisecond is going to be hard to guarantee on a non-RT kernel. I know a lot of very good work has taken place over recent years (e.g. the big kernel lock has gone), but that's still not enough to guarantee it.

You could take a look at Scientific Linux from those friendly atom-botherers at CERN and Fermilab. That can have MRG installed (see my link), which gives you a pre-packaged setup of the PREEMPT_RT patch.

Or if you've got the money you could get Red Hat MRG. That's a fully supported Linux distro with the PREEMPT_RT patch built in, so that would do away with the error-prone patching of the kernel.

Thing is, Red Hat charge a lot for it ($3000 PER YEAR PER INSTALLATION). I think they've realised that one of the biggest customers for it is the high-speed trading investors, who still have $lots-and-lots and so won't notice $3000/box/year going out the door.

How I Got On with MRG

I've done a fair bit of work with MRG (using both of the above), and it is pretty good. It replaces the interrupt service routines in the stock kernel with threads to service the interrupt. That means that you can run your software at priorities higher than the IRQ threads! That's the sort of thing you have to do if you want to get close to guaranteeing sub-millisecond latency on your application.

There seems to be a gradual drift of MRG things into the mainline kernel, which is a good thing in my opinion. Maybe one day it will become the mainline thing.

Other Gotchas

Modern CPU thermal management can be a real pain in the neck. I've had systems which lock up for 0.3 seconds whilst a System Management Interrupt was being serviced (by the bleedin' BIOS, not the OS), just because the CPU's warmed up a little bit. See this. So you have to be wary of what your underlying hardware does. Generally you have to start thinking about ditching the managed cooling of modern PCs and going back to a big fan spinning fast all the time.


You can get quite far with Linux by removing the 'disturbance' that other processes cause to the realtime process. I played with the same thing in Windows, which is a much larger horror to get right, but it shows the direction. So a kind of check-list:

  • Most important (strange but true): the hardware. Don't go for a laptop, this will be optimized to do strange things during SMM interrupts. Nothing you can do.
  • The drivers: Linux (and Windows) has bad drivers and good drivers. Related to hardware. And there is only one way to find out: benchmarking.

Isolate from rest of system, disable all sharing:

  • Isolate one CPU (man cpuset). Create two CPU sets, one for normal processes, and one for your realtime process.
  • Reduce the realtime part of your code to the minimum. Communicate with the other parts of the system through a large buffer. Reduce IO to the bare minimum (since IO has bad guarantees).
  • Make the process have the highest (soft) realtime priority.
  • Disable HyperThreading (you don't want to share)
  • pre-allocate the memory you need, and mlock() the memory.
  • Isolate the devices you use. Start by allocating a dedicated IRQ to the device (move the other devices to another IRQ, or remove other devices/drivers).
  • Isolate the IO you use.
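Two of the items above (keeping the realtime code on its own CPU and pre-allocating plus locking memory) can be done from inside the application. The sketch below is only that, a sketch: pin_to_cpu and lock_and_prefault are made-up helper names, and pinning with sched_setaffinity(2) only keeps your own thread on the chosen CPU. Keeping everything else off that CPU still needs cpusets or similar system configuration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: restrict the caller to a single CPU.
 * Returns 0 on success or an errno value. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return EINVAL;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* pid 0 means "the caller". */
    if (sched_setaffinity(0, sizeof set, &set) != 0)
        return errno;
    return 0;
}

/* Hypothetical helper: lock all current and future pages into RAM, then
 * touch every byte of a pre-allocated buffer so that no page fault
 * happens later in the time-critical path.
 * Returns 0 on success or an errno value (often ENOMEM or EPERM when
 * RLIMIT_MEMLOCK is too small for an unprivileged process). */
int lock_and_prefault(unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return errno;

    memset(buf, 0, len);
    return 0;
}
```

Both calls belong in the start-up phase, before the time-critical loop begins; allocate every buffer the loop needs first, then lock and pre-fault.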

Reduce activity of rest of system:

  • only start processes you really really need.
  • remove hardware you don't need like disks and other hardware.
  • disable swapping.
  • don't use Linux kernel modules or load them up front. The init of modules is unpredictable.
  • preferably remove the user also :)

Make it stable and reproducible:

  • disable all energy savings. You want the same performance all of the time.
  • review all BIOS settings, and remove all 'eventing' and 'sharing' from them. So no fancy speedsteps, thermal management etc. Choose low latency, and don't choose things with 'burst' in the name, since that generally buys throughput at the cost of worse latency.
  • review Linux driver settings, and lower latencies (if applicable).
  • use a recent kernel; the mainline kernel looks a little more like a realtime kernel with each release.

And then benchmark, using stress testing and leaving the machine on for days while recording max. latencies.
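For the benchmarking step, a very small measurement loop in the spirit of the cyclictest tool from the rt-tests suite might look like the following. This is a sketch under assumptions, not a replacement for a real benchmark: the function names are made up, and it only records how late each timer wake-up was relative to its absolute deadline.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Return a - b in nanoseconds. Hypothetical helper. */
int64_t ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * 1000000000LL
         + (int64_t)(a->tv_nsec - b->tv_nsec);
}

/* Hypothetical helper: sleep to `cycles` absolute deadlines spaced
 * period_ns apart and return the worst observed lateness in ns,
 * or -1 on error. */
int64_t max_wakeup_latency_ns(long period_ns, int cycles)
{
    struct timespec next, now;
    int64_t worst = 0;

    assert(period_ns > 0 && period_ns < 1000000000L);

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < cycles; i++) {
        int rc;

        /* Next absolute deadline, with tv_nsec kept normalised. */
        next.tv_nsec += period_ns;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec += 1;
        }

        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0)
            return -1;

        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        /* Lateness = actual wake-up time minus requested deadline. */
        int64_t late = ts_diff_ns(&now, &next);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Run it for hours or days under heavy load (disk, network, other CPUs busy) and log the returned maximum; it is the maximum, not the average, that tells you whether the setup is good enough.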

So: good luck :)


"The biggest problem is, many systems where real-time tasking is most useful do not have the fundamental hardware requirements to even allow these patches to work, namely a high resolution timer peripheral."

I strongly disagree: The biggest problem is that you might be blocked or pre-empted for an arbitrary amount of time with no warning. It hardly matters if you can sleep with 1us accuracy if you might occasionally go to sleep for 500ms. Realtime computing is about guaranteed worst case times, not precision sleep intervals. If you want to program an I2C EEPROM you could benefit from a high-resolution timer that would let you meet the setup/hold times as closely as possible without wasting any time. An occasional random delay of 500ms wouldn't matter because the EEPROM would just sit there waiting. That's not a realtime application, though. If you are implementing a control loop with a 1us update to drive a servo, that 500ms delay will cause a huge position disruption while the system runs uncontrolled.

You can't do anything in your application to work around the fact that your disk driver may spend hundreds of milliseconds processing IO completions in an interrupt context. The patches that make the driver friendlier to RT apps are what make an RT kernel.