This is the mail archive of the
ecos-discuss@sources.redhat.com
mailing list for the eCos project.
RE: Network code unstable (Solved for real this time).
On Wed, 2002-03-06 at 18:38, Gary Thomas wrote:
> On Wed, 2002-03-06 at 10:11, Pieter Truter wrote:
> >
> > After a lot of testing and debugging I found out that the CS8900 is losing
> > interrupts under heavy network load. This is more prominent when running
> > from flash which is slower.
> >
> > Looking at if_cs8900a.c I think I found the cause of my problem. The time
> > between the interrupt and acknowledge() is too long. I then moved the
> > acknowledge() in cs8900a_deliver() to cs8900a_isr() just after the mask()
> > and now everything works great.
>
> So, this was a case of new interrupts from the device not causing the ISR
> to run, possibly because of edge triggering. I don't understand why the
> 'while()' loop in the interrupt handling routine doesn't cause this to be
> retriggered, but maybe it's just a chip problem.
>
> >
> > I am still concerned about masking the interrupt for so long but I
> > understand that this is probably done to be able to use the BSD stack with a
> > realtime OS.
>
> It's only the device interrupt which is masked. I don't see how you can
> avoid that - you've got to keep the device from [re]interrupting the driver
> while it handles the current one. Also note that the "deliver" function gets
> called from a network processing thread, not directly by the DSR code. This
> probably accounts for most of the delay.
>
> >
> > The big problem with losing an interrupt from the CS8900a chip is that you
> > have to clean up all the info in the chip, otherwise it will not generate any
> > other interrupts. And if you do not know that you missed an interrupt you
> > don't know when to clean up. ;-(
>
> Every ethernet device seems to have these quirks and, sadly, we have to deal
> with them all, each in their own way :-(
>
Please correct me if I am wrong, but I don't think Ethernet devices play
a special role in this kind of problem; it seems rather to be a common
pattern in interrupt-driven device drivers. Given the pseudocode below:

ISR/DSR:
    mask_dev_interrupt()
    wake_up_thread()

thread:
    status = clear_device_status()
    while (work_to_do(status)) {
        ...
        status = clear_device_status()
    }
    ack_dev_interrupt()      /* <== bad guy */
    unmask_dev_interrupt()

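the thread code will lose interrupts whenever a new event occurs on the
device after the thread has left the 'while' loop but before it has
executed ack_dev_interrupt(), because the acknowledge then discards the
interrupt raised by that event.

But if we move ack_dev_interrupt() either to the beginning of the thread
code or into the 'while' loop, before the device's status is read, the
problem goes away: an event that arrives after the acknowledge is either
seen by the next status read or leaves an interrupt pending that fires as
soon as the device interrupt is unmasked.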
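
To make the race window concrete, here is a small simulation in C. It is only
a sketch under assumptions: the struct and every function name are
hypothetical stand-ins for the pseudocode above (they are not eCos or CS8900
driver calls), and the model assumes an edge-triggered line that produces a
new edge only when the device status goes from empty to non-empty.

```c
#include <stdbool.h>

/* Hypothetical model of an edge-triggered device interrupt.
 * None of these names are eCos or CS8900 APIs. */
struct sim {
    unsigned status;  /* device event bits; reading them clears them */
    bool latched;     /* edge latched by the interrupt controller */
    bool masked;      /* device interrupt masked at the controller */
};

/* A new event on the device. Assumption: the IRQ line only produces an
 * edge when the status goes from empty to non-empty. */
static void device_event(struct sim *s, unsigned bit)
{
    bool was_idle = (s->status == 0);
    s->status |= bit;
    if (was_idle)
        s->latched = true;
}

/* Read-and-clear of the device status, as in the pseudocode. */
static unsigned clear_device_status(struct sim *s)
{
    unsigned st = s->status;
    s->status = 0;
    return st;
}

/* Acknowledge: drop whatever edge the controller has latched. */
static void ack_dev_interrupt(struct sim *s)
{
    s->latched = false;
}

/* Run one pass of the service thread and inject a second event in the
 * window between the end of the 'while' loop and the unmask.
 * Returns true if that late event still raises an interrupt afterwards. */
bool late_event_survives(bool ack_first)
{
    struct sim s = { 0, false, false };

    device_event(&s, 1u);      /* first event latches an interrupt */
    s.masked = true;           /* ISR/DSR: mask_dev_interrupt() */

    if (ack_first)
        ack_dev_interrupt(&s); /* proposed ordering: ack before reading */

    unsigned status = clear_device_status(&s);
    while (status != 0) {
        /* ... handle the events reported in 'status' ... */
        status = clear_device_status(&s);
    }

    device_event(&s, 2u);      /* new event lands in the race window */

    if (!ack_first)
        ack_dev_interrupt(&s); /* original ordering: the "bad guy" */
    s.masked = false;          /* unmask_dev_interrupt() */

    return s.latched && !s.masked;
}
```

In this model late_event_survives(false) returns false: the late event's
interrupt is acknowledged away, and because its status bit is never read the
device produces no further edges, which resembles the stuck state Pieter
described. late_event_survives(true) returns true, since the edge is still
latched when the interrupt is unmasked.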
Robin