This is the mail archive of the ecos-discuss@sources.redhat.com mailing list for the eCos project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [SMP] direct/indirect printing via asr can cause lockups


This is an observation related to the diag lock that I made a couple of months back.

Consider an N-processor SMP configuration with N higher priority threads waiting to
print via any of the functions that take CYG_HAL_DIAG_LOCK. If a lower priority
thread printing a message via one of these functions (cyg_test_output,
cyg_assert_msg) is taken off the processor in the middle of its printing, with the
diag lock held, this leads to system lockup: every processor is occupied by a
higher priority thread spinning on the lock, so the holder never runs again to
release it.

If someone defines CYG_HAL_DIAG_LOCK and CYG_HAL_DIAG_UNLOCK in a non-SMP
configuration, it's obvious that the same lockup could result there very easily.

Though no one is expected to do that in non-SMP, if the purpose of this lock is to
maintain the sanity of printing via the above mentioned functions (i.e. to avoid
mixing of messages printed by many threads on different processors).

> I suspect the simplest approach is to introduce a CPU id and counter
> alongside cyg_hal_smp_diag_lock and then implement a recursive lock
> just for this purpose. There is already code in the scheduler lock to
> do this, so this can be copied and adapted very easily.

This was the first thing I had thought of too, but I rejected it. With that you
would avoid the lockup caused by printing in ASRs, BUT as a side effect you would
also print the messages from one or more threads mixed up with the messages from
the initial thread (before its message is completely printed).

For example -

T1 wants to print "I am Thread T1\n"
T2 wants to print "I am thread t2\n"
T3 wants to print "hullo me, T3\n"

With the original diag lock implementation (which has the lockup problem in the
high-priority-threads-plus-ASR scenario) you would get the message from one thread
printed completely before the message from another (when a lockup doesn't result).

However, with the modified (recursive) diag lock, messages will get mixed up
during printing, like (one possible scenario):

"I am ThreadI am thread t2hullo me,  T1\nT3\n\n"

If you consider buffered printing to the screen/debug channel, where the buffer
gets flushed either when full or when it encounters a newline, then the printing
of a partial message from one thread could be delayed for a long time.

If someone is bothered by that delay, the way out for that person is to flush the
corresponding buffer just before switching in unlock_inner, thus increasing the
switching latency. Presumably the person would keep this adventure under
appropriate ifdefs.

What about a solution that gets rid of the need for the diag lock, by having as
many copies of hal_diag_buffer as there are processors? (Of course you need to
adjust the sizes, and some synchronisation aspects still need attention.) You also
need to take care to flush the contents of all of them when you stop the test
execution, on assertion failures or otherwise.

Mixup of messages will still be there, but at the message-segment level (as shown
in the example above), not at the character level (which would often happen if the
diag lock were removed and a single buffer used).

Keeping a small buffer (flushed on newline/buffer-full/context-switch) associated
with each thread is another idea for solving the printing problem, but not a good
one.

Finding a way out of the printing problem in the SMP scenario that works fine in
all situations is not possible without some compromise or other.

Even if you go with #processors buffers, flushing the buffer at switch time would
improve things, getting the messages printed by different threads out with more
continuity and fewer message-segment mixups in the output, but it can't guarantee
100% clean printing (messages following each other in order, each message as one
unit): what if a message is larger than what fits in hal_diag_buffer in one go?

The multiple-buffers approach even takes care of the mixup of messages at the
character level, when different threads on different processors are printing via
the diag_* family of functions.

Printing issues also remain when some application happens to use a mix of the
diag_* family of functions and the cyg_* printing functions. This issue too is
taken care of, to a reasonable extent, by multiple buffers.

> I don't think there is any need to introduce a new spinlock type, or to
> change the definition of spinlocks as a whole.
My limited knowledge from early days goes like this: a spinlock wait (effectively
polling) is usually done for a short duration, on a processor where switching away
would otherwise be unwieldy.

In the existing case, the spinlock could be waited on for a non-deterministically
long time (depending on how long a message the spinlock owner is printing, and on
how often it gets switched out during that). The spinlock is associated with a
thread, and a spinlock-waiting thread could switch processors during its wait. The
point is: the spinning around could go on for a really significant time.

IMO using spinlocks to implement the diag lock is not the right thing.

What about using a mutex for the purpose? With threads it would work fine, but
problems come with ASRs.

A thread has already taken this mutex, and then, before it releases it, some ASRs
are processed that want to take the same mutex. Things go for a toss, since ASRs
can't sleep on a mutex. A solution that handles this issue would make the mutex
approach work fine.

sandeep


-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss

