This is the mail archive of the ecos-discuss@sources.redhat.com mailing list for the eCos project.
[SMP]serious bug in synchronisation primitives
- From: sandeep <shimple0 at yahoo dot com>
- To: ecos-devel at source dot redhat dot com
- Cc: ecos-discuss at sources dot redhat dot com
- Date: Wed, 27 Oct 2004 18:16:08 +0530
- Subject: [ECOS] [SMP]serious bug in synchronisation primitives
While going through the execution logs of an assert build, I found messages
complaining about 'Locking mutex I already own' and 'Unlock mutex I do not own'.
On further analysis, I found the source of the problem lies in situations like
this one (explained with respect to Cyg_Mutex::lock) --
self/get_current_thread gets its value from current_thread[CYG_KERNEL_CPU_THIS()].
If the thread gets switched out in the middle of this indexing, the result is a mess.
Consider a thread executing on processor 0 when it makes the mutex lock call.
It has obtained the current CPU (the index into the array) when its timeslice
runs out and it gets switched out. The next time it gets a chance to run, it is
on processor 1, and it continues from where it left off. The id it gets for self
is not its own, but that of the thread now running on processor 0.
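The interleaving above can be reproduced deterministically in a small C++ model. This is a hypothetical sketch, not eCos code: `TwoCpuModel`, `this_cpu()` and `migrate_A_to_cpu1()` are invented stand-ins for the `current_thread` array, `CYG_KERNEL_CPU_THIS()` and a timeslice-driven migration, with the preemption injected exactly between computing the CPU index and reading the array.

```cpp
#include <array>
#include <string>

// Hypothetical two-CPU model; the names only mimic eCos, this is not kernel code.
struct TwoCpuModel {
    // current_thread[i] names the thread running on CPU i.
    std::array<std::string, 2> current_thread{{"A", "B"}};
    int cpu_of_A = 0;  // CPU on which thread A is currently running

    // Stand-in for CYG_KERNEL_CPU_THIS() as evaluated by thread A.
    int this_cpu() const { return cpu_of_A; }

    // Thread A's timeslice expires: B takes CPU 0 and A later resumes on CPU 1.
    void migrate_A_to_cpu1() {
        current_thread = {{"B", "A"}};
        cpu_of_A = 1;
    }

    // Stand-in for self(): index first, then read; A may be switched out in between.
    std::string self_of_A(bool preempted_between_steps) {
        int cpu = this_cpu();          // step 1: index captured while A is on CPU 0
        if (preempted_between_steps) {
            migrate_A_to_cpu1();       // A switched out, resumes on CPU 1
        }
        return current_thread[cpu];    // step 2: if A migrated, cpu is now stale
    }
};
```

Without the preemption, thread A reads its own name; with it, A reads the entry of CPU 0 and gets "B", the thread that replaced it there.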
I hope the repercussions of this are clear without a detailed explanation. On
noticing this, I scanned mutex.cxx and the other sources in kernel/current/src/sync,
and a quick scan turned up a couple more synchronisation primitives affected by this bug.
The crux of the problem (from what little I can see of it at the moment) is that
accesses to these arrays via CYG_KERNEL_CPU_THIS() should be done with
scheduler_lock taken.
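A minimal sketch of why taking the scheduler lock first helps, again as a hypothetical C++ model rather than eCos code. It assumes, for illustration only, that a switch requested while the lock count (`sched_lock` here) is non-zero is deferred until the count drops back to zero; `preemption_point()` and `migration_pending` are invented names.

```cpp
#include <array>
#include <string>

// Hypothetical model (not eCos code): a preemption requested while the
// scheduler lock is held is deferred until the lock count drops to zero.
struct LockedModel {
    std::array<std::string, 2> current_thread{{"A", "B"}};
    int cpu_of_A = 0;                // CPU on which thread A is running
    int sched_lock = 0;              // stand-in for the scheduler lock count
    bool migration_pending = false;  // a timeslice expiry waiting to be serviced

    int this_cpu() const { return cpu_of_A; }

    // Point at which thread A could be switched out; honoured only when unlocked.
    void preemption_point() {
        if (sched_lock == 0 && migration_pending) {
            current_thread = {{"B", "A"}};  // B now on CPU 0, A on CPU 1
            cpu_of_A = 1;
            migration_pending = false;
        }
    }

    void lock() { ++sched_lock; }
    void unlock() { --sched_lock; preemption_point(); }  // deferred switch lands here

    // Unprotected read: the switch can land between the index and the array access.
    std::string self_unlocked() {
        int cpu = this_cpu();
        preemption_point();
        return current_thread[cpu];
    }

    // Read under the lock: index and array entry belong to the same CPU.
    std::string self_locked() {
        lock();
        int cpu = this_cpu();
        preemption_point();                 // no effect, the lock is held
        std::string me = current_thread[cpu];
        unlock();                           // migration may happen now
        return me;                          // a thread's identity survives migration
    }
};
```

Only the CPU index is tied to a particular processor; once the thread's identity has been read consistently, it stays valid even if the thread migrates right after the lock is released.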
Doing it under sched_lock might be costly in some places (this still needs to be
checked); maybe disabling interrupts could be used in those situations for the
moment.
Various asserts, tests and normal code need to be checked for direct or indirect
accesses to current_thread, need_reschedule and thread_switches (the variables I
can see directly in sched.hxx) outside scheduler_lock.
I hope that, with help from the list, a thorough scan can be run to find
instances of the problem. (The earlier thorough scan for direct/indirect uses of
get_sched_lock is still pending :( )
mutex.cxx
---------
cyg_bool Cyg_Mutex::lock(void)
{
CYG_REPORT_FUNCTYPE("returning %d");
cyg_bool result = true;
Cyg_Thread *self = Cyg_Thread::self();   // read before Cyg_Scheduler::lock() is taken
// Prevent preemption
Cyg_Scheduler::lock();
...
}
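To connect a misidentified self back to the log messages quoted at the start, here is a hypothetical toy mutex in C++ (not Cyg_Mutex) that, like the snippet above, trusts whatever `self` the caller obtained. It only tracks an owner name and records assertion-style messages in a vector; `ToyMutex` and its members are invented for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical toy mutex (not Cyg_Mutex): it tracks only an owner name and
// logs the same kind of complaints an assert build prints.
struct ToyMutex {
    std::string owner;              // empty string means unlocked
    std::vector<std::string> log;   // collected assertion-style messages

    // `self` is whatever identity the caller believes it has.
    bool lock(const std::string& self) {
        if (owner == self) {
            log.push_back("Locking mutex I already own");
            return false;
        }
        if (!owner.empty()) {
            return false;  // a real mutex would block here; the toy just reports failure
        }
        owner = self;
        return true;
    }

    void unlock(const std::string& self) {
        if (owner != self) {
            log.push_back("Unlock mutex I do not own");
            return;
        }
        owner.clear();
    }
};
```

If thread A locks while misidentified as "B", the mutex records "B" as owner; A's later unlock (with the correct self) then logs "Unlock mutex I do not own", and the real thread B locking the same mutex logs "Locking mutex I already own", which are the two complaints seen in the assert build.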
The same situation also appears in --
cyg_bool Cyg_Condition_Variable::wait_inner( Cyg_Mutex *mx )
cyg_bool Cyg_Condition_Variable::wait_inner( Cyg_Mutex *mx, cyg_tick_count timeout )
cnt_sem2.cxx
------------
cyg_bool Cyg_Counting_Semaphore2::wait()
cyg_bool Cyg_Counting_Semaphore2::wait( cyg_tick_count abs_timeout )
cnt_sem.cxx
-----------
cyg_bool Cyg_Counting_Semaphore::wait()
cyg_bool Cyg_Counting_Semaphore::wait( cyg_tick_count timeout )
bin_sem.cxx
-----------
cyg_bool Cyg_Binary_Semaphore::wait()
cyg_bool Cyg_Binary_Semaphore::wait( cyg_tick_count timeout )
--
sandeep