This is the mail archive of the ecos-discuss@sourceware.org mailing list for the eCos project.



Re: Re: ISR not causing an DSR in some rare conditions


Hi,

There was a discussion on the developers list last week or the week before
about a subtle race condition that arose when an interrupt occurred at the
exact instant that a single thread called a blocking primitive and the
scheduler was busy transitioning to the idle thread. See
http://ecos.sourceware.org/ml/ecos-devel/2006-01/msg00000.html for more
details. Nick posted a patch, which I presume was applied to the CVS tree.
You could try applying his patch yourself or updating to the latest CVS tree
(and seeing if the tree does, in fact, include his patch).

It is unlikely that this is the problem. The bug I fixed was a failure to call DSRs during the initial context switch to a newly created thread. In any case, that only delayed the DSR until the next scheduler unlock; it did not lose the DSR entirely. The program that showed the problem was somewhat unusual in that it had nothing else to do until the DSR ran.
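
To illustrate the difference between a delayed DSR and a lost one, here is a minimal sketch in plain C. It is a toy model under stated assumptions, not the eCos kernel source: every name in it (model_sched_lock, model_interrupt, count_dsr and so on) is hypothetical. It assumes only what is described above, namely that a DSR posted while the scheduler is locked stays queued and runs when the lock count drops back to zero.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: all names here are hypothetical, not eCos kernel code. */
typedef void (*dsr_fn)(void *data);

#define MAX_PENDING 8

static unsigned sched_lock_count;          /* > 0 means the scheduler is locked */
static dsr_fn   pending_fn[MAX_PENDING];   /* queued DSR functions */
static void    *pending_data[MAX_PENDING]; /* matching DSR arguments */
static size_t   pending_count;

void model_sched_lock(void)
{
    sched_lock_count++;
}

void model_sched_unlock(void)
{
    assert(sched_lock_count > 0);
    if (sched_lock_count == 1) {
        /* Outermost unlock: drain the queued DSRs before releasing the lock. */
        for (size_t i = 0; i < pending_count; i++)
            pending_fn[i](pending_data[i]);
        pending_count = 0;
    }
    sched_lock_count--;
}

/* Simulated interrupt: queue the DSR under the scheduler lock, then unlock.
 * Returns 0 if the toy queue is full, 1 otherwise. */
int model_interrupt(dsr_fn fn, void *data)
{
    int queued = 0;
    model_sched_lock();
    if (pending_count < MAX_PENDING) {
        pending_fn[pending_count] = fn;
        pending_data[pending_count] = data;
        pending_count++;
        queued = 1;
    }
    model_sched_unlock();   /* runs the DSR only if nobody else holds the lock */
    return queued;
}

/* Example DSR: counts how many times it has run. */
void count_dsr(void *data)
{
    ++*(int *)data;
}
```

In this model, an interrupt that arrives while a thread holds the lock leaves the DSR queued, and the DSR runs on that thread's final model_sched_unlock() call. The DSR is late, but it is not dropped.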

As for the reported problem, I cannot think of anything that might be
causing a DSR to be lost entirely. The code dealing with all of this
has been thoroughly exercised over many years and has been the subject
of much scrutiny. I'm as certain as anyone can be that it is
correct. If there were a race condition anywhere in here then I would
expect it to have manifested itself elsewhere before now.


Actually, I can think of one reason why races may be introduced unexpectedly: the compiler may be reordering instructions incorrectly and moving things across barriers that it should not. This could happen in particular if it is not honouring the volatile nature of the asm inlines that enable and disable interrupts.
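
For anyone unfamiliar with the pattern, here is a minimal sketch of what such a barrier looks like in GCC extended asm. The macro and function names are hypothetical stand-ins, and the asm body is left empty so the code runs on any host. A real HAL macro would also contain the architecture-specific instruction that masks or restores interrupts, and I have not checked how the XScale HAL writes its macros. The volatile qualifier stops the compiler from deleting the asm statement, and the "memory" clobber tells it not to move or cache memory accesses across it.

```c
#include <assert.h>

/* Hypothetical stand-ins for interrupt disable/enable macros. The empty asm
 * body is a placeholder for the real interrupt-masking instruction. */
#define MODEL_DISABLE_INTERRUPTS() __asm__ __volatile__("" : : : "memory")
#define MODEL_ENABLE_INTERRUPTS()  __asm__ __volatile__("" : : : "memory")

static int shared_pending;   /* state shared with a (simulated) ISR */

/* Simulated ISR side: record one more pending event. */
void model_isr_post(void)
{
    shared_pending++;
}

/* Thread side: read and clear the shared state inside the critical section.
 * The barriers keep the load and store between the two macros. */
int model_take_pending(void)
{
    int taken;

    MODEL_DISABLE_INTERRUPTS();
    taken = shared_pending;
    shared_pending = 0;
    MODEL_ENABLE_INTERRUPTS();

    return taken;
}
```

If a compiler moved the read or the write of shared_pending outside the two macros, an interrupt could arrive between them and an update would be lost. One way to check for this is to compare the generated assembly for such a critical section between compiler versions.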

I don't know what version of the compiler you are using, but it might
be instructive to see if a different version exhibits different
behaviour. However, we have never seen any problems like this, so I am
really clutching at straws here.

I'm using a self-compiled gcc 3.4.3 for xscale. I'll try compiling a different version to check whether this helps. I'll also try to set up a test system that triggers this problem faster (rather than after 24 hours) so that I can investigate further.


While searching for the ISR-to-DSR delay problem, I also noticed that the scheduler lock count sometimes rises quite high (up to 10), even though I don't have nested interrupts enabled.

Bye...



