This is the mail archive of the ecos-bugs@sourceware.org mailing list for the eCos project.



[Bug 1001456] HAL misses Interrupt Clear-Pending Registers handling:wasted processing power


Please do not reply to this email. Use the web interface provided at:
http://bugs.ecos.sourceware.org/show_bug.cgi?id=1001456

--- Comment #30 from Nick Garnett <nickg@ecoscentric.com> 2012-09-27 15:59:38 BST ---

I'm not at all happy about adding an extra set of HAL and kernel functions to
all architectures just to solve an obscure problem on a single architecture.
Either a better solution needs to be found that can be applied only to the
Cortex-M architecture, or we simply have to live with the consequences.

The proposed change is, in any case, clearly a misuse of the NVIC hardware. The
feature being used is intended to allow individual interrupts to be set pending
by software for testing. The clear register appears to be present mainly as a
side effect of using a common interface for all these NVIC bit masks. I'm sure
ARM do not expect interrupts to be cleared in this way under normal
circumstances; this should be done as a consequence of entering the ISR.

The timing diagram in comment #2 suggests that the real problem will only occur
if the CPU is too slow for the rate of interrupts being delivered.

A better version of that timing diagram might be as follows:

HW |  E1      E2
---|----------------------------
ISR|    I1      I2
---|----------------------------
DSR|         D1=  ====D2==

The ====== marks show the time during which the DSR is running. I2 runs during the
execution of D1, posting a second DSR call, which will run immediately after
D1, and in theory will find nothing to do.

I can see two situations in which this can happen.

1. The CPU is simply too slow to finish running D1 before E2/I2 run, even if D1
was started immediately after I1 completed. If the events are coming at this
rate continually, then the CPU simply won't keep up. If they come in infrequent
bursts, then the odd extra ISR/DSR is of little consequence, and is part of the
cost of dealing with a temporary overload.

2. The start of D1 was delayed because eCos had the scheduler locked when I1
ran. This is a consequence of the ISR/DSR model. If I2 ran before D1 started,
then the DSR would only be called once, with a larger count value. If I2 runs
after D1 starts, it may post a separate DSR; but this is true for all
architectures, not just this one.
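The difference between the two orderings in case 2 can be illustrated with a
small model. This is only a sketch: dsr_model, model_isr() and model_run_dsr()
are hypothetical names invented for illustration, not the eCos kernel's data
structures. It assumes only what is described above, namely that ISR requests
made before the DSR starts are merged into a single call with a larger count.

```c
#include <assert.h>

/* Hypothetical model of DSR posting; not the actual eCos kernel code. */
typedef struct {
    unsigned pending_count; /* DSR requests posted since the DSR last started */
    unsigned dsr_calls;     /* number of times the DSR has been invoked */
    unsigned last_count;    /* count value passed to the most recent DSR call */
} dsr_model;

/* An ISR runs and asks for its DSR to be called. */
static void model_isr(dsr_model *m)
{
    m->pending_count++;
}

/* The scheduler is unlocked: run the DSR if one is posted.
 * The count is consumed at the moment the DSR starts. */
static int model_run_dsr(dsr_model *m)
{
    if (m->pending_count == 0)
        return 0;
    m->last_count = m->pending_count;
    m->pending_count = 0;
    m->dsr_calls++;
    return 1;
}

/* I2 runs before D1 has started: one DSR call with count 2. */
static dsr_model scenario_i2_before_d1(void)
{
    dsr_model m = {0, 0, 0};
    model_isr(&m);      /* I1 */
    model_isr(&m);      /* I2, while D1 is still held off */
    model_run_dsr(&m);  /* D1 covers both events */
    model_run_dsr(&m);  /* nothing further is posted */
    return m;
}

/* I2 runs after D1 has started: a second, separate DSR call. */
static dsr_model scenario_i2_after_d1_starts(void)
{
    dsr_model m = {0, 0, 0};
    model_isr(&m);      /* I1 */
    model_run_dsr(&m);  /* D1 starts and consumes the count */
    model_isr(&m);      /* I2 arrives during D1 */
    model_run_dsr(&m);  /* D2 runs, possibly with nothing to do */
    return m;
}
```

In this model the first scenario yields one DSR call with a count of 2, and
the second yields two DSR calls, each with a count of 1. Nothing in the model
depends on the architecture.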

Adding an interrupt cancel anywhere in D1 would only deal with any new events
that were posted before that point. E2 could occur just after the cancel, and
would still result in an extra ISR/DSR. The proposed solution can only reduce
the number of extra ISR/DSRs, never eliminate them entirely.
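The race can be sketched with a one-bit model of the pending state. The names
irq_line, hw_event(), clear_pending() and extra_isr_after_d1() are hypothetical
and exist only for this illustration; real NVIC behaviour has more detail than
a single flag.

```c
#include <assert.h>

/* Hypothetical one-bit model of an interrupt line's pending state. */
typedef struct {
    int pending;
} irq_line;

/* The device raises a new event, which latches the pending bit. */
static void hw_event(irq_line *l)
{
    l->pending = 1;
}

/* The proposed cancel: software clears the pending bit from within D1. */
static void clear_pending(irq_line *l)
{
    l->pending = 0;
}

/* Returns 1 if an extra ISR would be taken once D1 completes.
 * e2_before_cancel selects whether E2 arrives before or after the cancel. */
static int extra_isr_after_d1(int e2_before_cancel)
{
    irq_line line = {0};        /* I1 has already consumed E1 */

    if (e2_before_cancel) {
        hw_event(&line);        /* E2 arrives first ... */
        clear_pending(&line);   /* ... and the cancel removes it */
    } else {
        clear_pending(&line);   /* the cancel finds nothing to clear ... */
        hw_event(&line);        /* ... and E2 arrives just afterwards */
    }
    return line.pending;
}
```

With E2 before the cancel, the function returns 0; with E2 just after it, the
function returns 1, so the extra ISR/DSR still happens. Moving the cancel later
in D1 narrows the window but cannot close it.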

I also don't believe this is entirely an eCos problem. It is also present in
the Cortex-M nested interrupt model, and is the expected/intended behaviour.
Consider a system that is only using ISRs. Here's a timing diagram:

HW  |  E1   E2   E1
----|----------------------------
ISR1|    I1===      ===I1======
----|----------------------------
ISR2|         I2====

Here there are two devices, 1 and 2, with associated ISRs; ISR1 is lower
priority than ISR2. If ISR1 is running when device 2 raises an interrupt, then
it will be pre-empted and ISR2 will run. If ISR2 runs for long enough then it
may delay the completion of ISR1 until after a new device 1 interrupt is
posted. This will re-set the pending bit and immediately after ISR1 returns, it
will be re-entered. The same will happen in the absence of nested ISRs if ISR1
just takes too long to process the first event before the second occurs.

This is similar to the eCos situation. So long as these things occur
infrequently, then extra ISRs are simply a cost of handling bursts of
interrupts. If it happens frequently then that is an indication that the CPU is
too slow to keep up with the interrupt rate.


I wasn't sure what conclusion I would come to when I started writing this, but
I think I have convinced myself that this is actually a non-issue. The proposal
cannot eliminate these extra ISR/DSR calls completely; the problem is not eCos
specific; it is not Cortex-M specific either; the issue only seriously affects
systems that are on the edge of being too slow to cope with the interrupt rate.
The worst aspect of the proposal is that it spreads its tentacles into all
other architectures and device drivers.

However, comment #7 contains a seed of a better solution. Many device drivers
are somewhat lazy in using cyg_drv_interrupt_mask() and friends to control
interrupt delivery; and it is this that is the main cause of the problem. They
should really use peripheral registers to do this, where possible. Certainly
generic drivers like the 16x5x driver should. I switched the eCosCentric
version of this driver over to doing exactly this earlier this year and can
contribute a patch to do that for the public version. Other drivers should be
converted as and when convenient. Those devices that don't have local control
of interrupts will just have to continue with the current approach and accept
the consequences.
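
As a rough illustration of controlling interrupt delivery at the peripheral,
the sketch below disables and re-enables the receive interrupt through the
16x5x interrupt enable register (IER) instead of masking the vector in the
interrupt controller. It is not the eCosCentric patch mentioned above. The
fake_uart structure and the uart_isr()/uart_dsr() functions are hypothetical
stand-ins, and the register is modelled as a plain variable so the example is
self-contained. The only hardware fact assumed is that IER bit 0 enables the
"received data available" interrupt on a 16550.

```c
#include <assert.h>
#include <stdint.h>

#define IER_RX_AVAIL 0x01u  /* 16550 IER bit 0: received data available */
#define IER_TX_EMPTY 0x02u  /* 16550 IER bit 1: transmit holding register empty */

/* Hypothetical stand-in for the device's register block. */
typedef struct {
    volatile uint8_t ier;
} fake_uart;

/* ISR: stop the device raising further receive interrupts until the DSR
 * has run, rather than masking the vector in the interrupt controller. */
static void uart_isr(fake_uart *u)
{
    u->ier &= (uint8_t)~IER_RX_AVAIL;
}

/* DSR: after draining the receive data (omitted here), allow the device
 * to raise receive interrupts again. */
static void uart_dsr(fake_uart *u)
{
    u->ier |= IER_RX_AVAIL;
}

/* Helper for the illustration: run one ISR/DSR pair and report the IER
 * value seen between the two. */
static uint8_t ier_between_isr_and_dsr(fake_uart *u)
{
    uint8_t seen;
    uart_isr(u);
    seen = u->ier;
    uart_dsr(u);
    return seen;
}
```

Starting with both interrupts enabled (IER = 0x03), the value between ISR and
DSR is 0x02, and it returns to 0x03 once the DSR has finished. The transmit
enable bit is left untouched throughout.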

