Chapter 8. eCos Interrupt Model

This chapter describes the eCos interrupt model in detail.

Interrupt handling is an important part of most real-time systems. Timely handling of interrupt sources is important. This can be severely impacted by certain activities that must be considered atomic (i.e. uninterruptible). Typically these activities are executed with interrupts disabled. In order to keep such activities to a minimum and allow for the smallest possible interrupt latencies, eCos uses a split interrupt handling scheme. In this scheme, interrupt handling is separated into two parts. The first part is known as the Interrupt Service Routine or ISR. The second part is the Deferred Service Routine or DSR. This separation explicitly allows for the DSRs to be run with interrupts enabled, thus allowing other potentially higher priority interrupts to occur and be processed while processing a lower priority interrupt.

In order for this model to work, the ISR should run quickly. If the service requirements for the interrupt are small, the interrupt can be completely handled by the ISR and no DSR is required. However, if servicing the interrupt is more complex, a DSR should be used. The DSR will be run at some later time, at the point when thread scheduling is allowed. Postponing the execution of DSRs until this time allows for simple synchronization methods to be used by the kernel.

Further, this controlled calling — when thread scheduling is allowed — means that DSRs can interact with the kernel, for example by signalling that an asynchronous operation has completed.

In order to allow DSRs to run with interrupts enabled, the ISR for a particular interrupt source (or the hardware) must arrange that that interrupt will not recur until the DSR has completed. In some cases, this is how the hardware works. Once an interrupt is delivered another interrupt will not occur until re-enabled. In the general case, however, it is up to the ISR to enforce this behavior. Typically the ISR will "mask" the interrupt source, thus preventing its recurrence. The DSR will then unmask the interrupt when it has been serviced thus allowing new occurrences of the interrupt to be delivered when they happen.

Alternatively, if an ISR is doing very little per interrupt, for example transferring one byte from memory to an IO device, it may only be necessary to interact with the rest of the system when a "transfer" is complete. In such a case an ISR could execute many times and only when it reaches the end of a buffer does it need to request execution of its DSR.

If the interrupt source is "bursty", it may be OK for several interrupts and calls to the ISR to occur before a requested DSR has been executed; the kernel maintains counts for posted DSRs, and in such a case the DSR will eventually be called with a parameter that tells it how many ISRs requested that the DSR be called. Care is needed to get the interrupt code right for such a situation, for one call to the DSR is required to do the work of several.

As mentioned above, the DSR will execute at some later time. Depending on the state of the system, it may be executed at a much later time. There are periods during certain kernel operations where thread scheduling is disabled, and hence DSRs are not allowed to operate. These periods have been purposefully made as limited as possible in the eCos kernel, but they still exist. In addition, user threads have the ability to suspend scheduling as well, thus affecting the possible DSR execution latency. If a DSR cannot be executed sufficiently quickly, the interrupt source may actually overrun. This would be considered a system failure.

One of the problems system designers face is how much stack space to allow each thread in the system. eCos does not dictate the size of thread stacks, it is left to the user when the thread is created. The size of the stack depends on the thread requirements as well as some fixed overhead required by the system. In this case, the overhead is enough stack space to hold a complete thread state (the actual amount depends on the CPU architecture). Guidelines for the minimum stack requirements are provided by the HAL using the symbol CYGNUM_HAL_STACK_SIZE_MINIMUM.

A potential problem with this scheme is with nested interrupts. Since interrupts are reenabled during the DSR portion of servicing an interrupt, there is the possibility of a new interrupt (hopefully from a separate source) arriving while this processing takes place. When this new interrupt is serviced some state information about the interrupted processing will be saved on the stack. The amount of this information again depends on the CPU architecture and in some cases it is substantial. This implies that any given stack would need enough space to potentially hold "N" interrupt frames. In a realtime system with many threads this is an untenable situation. To solve this problem, eCos allows for a separate interrupt stack to be used while processing interrupts. This stack needs to be large enough to support "N" nested interrupts, but each individual thread stack only needs the overhead of a single interrupt state. This is because the thread state is kept on the thread"s own stack, including information about any interrupt that caused the thread to be scheduled. This is a much better situation in the end, however, since only the interrupt stack need be large enough to handle the potential interrupt servicing needs.

eCos allows for the use of the interrupt stack to be totally configurable. The user can elect to not use a separate interrupt stack. This requires making all thread stacks large enough but does reduce the overhead of switching stacks while processing interrupts. On the other hand, if memory is tight, then choosing a separate interrupt stack would be warranted at the cost of a few machine cycles during the processing of each interrupt.

Not all target HALs support this feature from day one anyway; however common configuration features such as this may still be presented in the config tool, and present in include files, even if the actual target selected does not support the feature at this time.

The following problem with the interrupt system has been observed. On the mn10300 simulator, interrupts were occurring immediately after they were re-enabled in the DSR. This should really be considered a case of interrupt overrun since there is no possibility of useful [or any] processing between the time an interrupt has been serviced and an subsequent interrupt occurs, hence the system is totally saturated. The problem came about because the stack was overflowing. It was a user [thread] stack that overflowed because DSR processing was taking place on the thread stack. Analysis of this problem led to a rework of how interrupts are processed, in particular the use of a separate interrupt stack during interrupt processing (both ISR and DSR parts). The overflow can still happen, but now it is restricted to only the interrupt stack. The system designer can make accommodations for this by making a suitably large interrupt stack if it is known that the "overrun" is finite, e.g. in the case of a serial device, this could be the depth of some FIFO. In any case, overrun should be avoided, but having only a single stack that needs to suffer multiple interrupt frames allows for this failure to be detected simply.

Of course, it is only worthwhile having a separate interrupt stack if you are using an eCos configuration that has a scheduler and multiple threads. If there is no kernel, then the C library arranges to call main(), or your application may be entered from cyg_user_start(), on the startup stack. It runs on the only stack there is in the system. Depending on the design of the particular HAL for your target platform, it is natural to re-use the startup stack as the interrupt stack as soon as the scheduler is running. Since this is only sensible if there is a kernel, HALs typically only implement the separate interrupt stack if the kernel is present.