This is the mail archive of the ecos-discuss@sources.redhat.com mailing list for the eCos project.

[SMP]serious bug in synchronisation primitives

From: sandeep <shimple0 at yahoo dot com>
To: ecos-devel at source dot redhat dot com
Cc: ecos-discuss at sources dot redhat dot com
Date: Wed, 27 Oct 2004 18:16:08 +0530
Subject: [ECOS] [SMP]serious bug in synchronisation primitives

While going through the execution logs of an assert build, I found messages complaining about 'Locking mutex I already own' and 'Unlock mutex I do not own'. On further analysis I found that the source of the problems lies in situations like the following (explained with respect to Cyg_Mutex::lock):

self/get_current_thread gets its value from current_thread[CYG_KERNEL_CPU_THIS()].
If the thread gets switched out in the middle of this indexing, the result is a mess.

Consider a thread that is executing on processor 0 when it makes the mutex lock call. It has read the current CPU number (the index into the array) when its timeslice expires and it is switched out. The next time it gets a chance to run, it is on processor 1, and it continues from where it left off. The thread pointer it gets for self is then not its own, but that of the thread running on processor 0.
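
To make the interleaving concrete, here is a deterministic, single-threaded model of it. This is a hypothetical sketch, not eCos code: `ModelThread`, `ModelCpus` and `racy_self()` are illustrative names, the two-entry array stands in for `current_thread[]`, the `cpu_now` argument stands in for the value `CYG_KERNEL_CPU_THIS()` returned, and the `preempt` callback models a context switch landing between reading the CPU number and indexing the array.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for a kernel thread object.
struct ModelThread {
    std::string name;
};

// Hypothetical stand-in for the per-CPU current_thread[] array (2 CPUs).
struct ModelCpus {
    std::array<ModelThread*, 2> current_thread{};
};

// Models current_thread[CYG_KERNEL_CPU_THIS()] as two separate steps.
// Nothing here prevents a context switch between them.
ModelThread* racy_self(ModelCpus& cpus, int cpu_now,
                       const std::function<void()>& preempt) {
    assert(cpu_now >= 0 && cpu_now < 2);
    int idx = cpu_now;                // step 1: read this CPU's number
    preempt();                        // timeslice expires; thread may migrate
    return cpus.current_thread[idx];  // step 2: index with a possibly stale number
}
```

Called with a no-op `preempt`, `racy_self()` returns the caller's own thread. Called with a callback that puts thread B on CPU 0 and resumes thread A on CPU 1, it returns B, which is the kind of mix-up that could lead to assertions such as 'Locking mutex I already own'.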

I hope the repercussions of this are clear without a detailed explanation. On noticing this, I scanned mutex.cxx and the other sources in kernel/current/src/sync, and even a quick scan found a couple more synchronisation primitives affected by this bug.

The crux of the problem (as far as I can see into it at the moment) is that accesses to arrays indexed via CYG_KERNEL_CPU_THIS() should be done with the scheduler lock held.
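
Under the same modelling assumptions, the ordering suggested here (scheduler lock first, then the per-CPU read) can be sketched as follows. `ModelScheduler` and `locked_self()` are hypothetical names; the model assumes, as the text does for `Cyg_Scheduler::lock()`, that while the lock is held a timeslice expiry cannot switch the thread out, so the switch is queued and carried out on the final unlock.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct ModelThread {
    std::string name;
};

// Hypothetical scheduler model: while the lock is held, context switches
// requested by timeslice expiry are deferred until the final unlock.
class ModelScheduler {
public:
    std::array<ModelThread*, 2> current_thread{};

    void lock() { ++lock_count_; }

    void unlock() {
        assert(lock_count_ > 0);
        if (--lock_count_ == 0) {
            auto deferred = std::move(pending_);
            pending_.clear();
            for (auto& do_switch : deferred) do_switch();  // switching is allowed again
        }
    }

    // Called when a timeslice expires: switch now, or defer while locked.
    void timeslice_expired(std::function<void()> do_switch) {
        if (lock_count_ > 0)
            pending_.push_back(std::move(do_switch));
        else
            do_switch();
    }

    int lock_count() const { return lock_count_; }

private:
    int lock_count_ = 0;
    std::vector<std::function<void()>> pending_;
};

// Safe ordering: take the lock first, then read the per-CPU slot.
ModelThread* locked_self(ModelScheduler& sched, int cpu_now,
                         std::function<void()> do_switch) {
    sched.lock();                                   // prevent preemption first
    int idx = cpu_now;                              // cannot go stale while locked
    sched.timeslice_expired(std::move(do_switch));  // deferred: lock is held
    ModelThread* self = sched.current_thread[idx];
    // ... mutex bookkeeping using `self` would go here ...
    sched.unlock();                                 // deferred switch runs here
    return self;
}
```

With the same migration as before, `locked_self()` returns A, and B only takes over CPU 0 once `unlock()` runs. The cost is that everything between `lock()` and `unlock()` runs with preemption held off.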

Doing this under the scheduler lock might be costly in some places (?? need to check ??); for the moment, disabling interrupts could perhaps be used in those situations instead.
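
For the interrupts-disabled alternative, a scoped save/disable/restore guard keeps the window short and nests safely. This is again a hypothetical model: `ModelCpu`, `InterruptsOffGuard` and `self_irqs_off()` are illustrative names and the interrupt-enable flag is a plain `bool`. On a real eCos target such a guard would presumably wrap the HAL's `HAL_DISABLE_INTERRUPTS()` / `HAL_RESTORE_INTERRUPTS()` macros, but that mapping is an assumption. Disabling interrupts only stops the local CPU from preempting the caller; it does not by itself exclude the other CPUs.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <string>
#include <utility>

struct ModelThread {
    std::string name;
};

// Hypothetical model of one CPU's interrupt state. A timer tick that
// arrives while interrupts are off is latched and delivered on re-enable.
struct ModelCpu {
    bool interrupts_enabled = true;
    std::function<void()> pending_tick;

    void tick(std::function<void()> do_switch) {
        if (interrupts_enabled)
            do_switch();
        else
            pending_tick = std::move(do_switch);
    }
};

// Scoped guard: remembers the previous state, disables interrupts, and on
// exit restores that previous state (not unconditionally "on"), so nested
// guards compose. A latched tick is delivered only once interrupts are on.
class InterruptsOffGuard {
public:
    explicit InterruptsOffGuard(ModelCpu& cpu)
        : cpu_(cpu), old_state_(cpu.interrupts_enabled) {
        cpu_.interrupts_enabled = false;
    }
    ~InterruptsOffGuard() {
        cpu_.interrupts_enabled = old_state_;
        if (cpu_.interrupts_enabled && cpu_.pending_tick) {
            auto do_switch = std::move(cpu_.pending_tick);
            cpu_.pending_tick = nullptr;
            do_switch();
        }
    }
    InterruptsOffGuard(const InterruptsOffGuard&) = delete;
    InterruptsOffGuard& operator=(const InterruptsOffGuard&) = delete;

private:
    ModelCpu& cpu_;
    bool old_state_;
};

// Reads current_thread[cpu_index] with interrupts off, so a tick arriving
// mid-read cannot switch the caller out before the index is used.
ModelThread* self_irqs_off(ModelCpu& cpu, int cpu_index,
                           const std::array<ModelThread*, 2>& current_thread,
                           std::function<void()> do_switch) {
    InterruptsOffGuard guard(cpu);
    assert(!cpu.interrupts_enabled);
    int idx = cpu_index;             // read the CPU number
    cpu.tick(std::move(do_switch));  // latched: interrupts are off
    return current_thread[idx];      // guard restores state after this read
}
```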

Various asserts, tests and normal code need to be checked for direct or indirect accesses to current_thread, need_reschedule and thread_switches (the variables I can see directly in sched.hxx) made outside the scheduler lock.
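
One way to support that audit is to route every access to these per-CPU variables through an accessor that checks the scheduler lock. The sketch below is hypothetical: `CheckedSchedState` and its methods are illustrative, and `get_sched_lock()` only mirrors the name used in this mail. It counts violations so that it can be tested, whereas a debug build might instead fail a `CYG_ASSERT` at that point.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

struct ModelThread {
    std::string name;
};

// Hypothetical scheduler state holding the per-CPU variables named above.
class CheckedSchedState {
public:
    void lock() { ++sched_lock_; }
    void unlock() { assert(sched_lock_ > 0); --sched_lock_; }
    int get_sched_lock() const { return sched_lock_; }

    // Every per-CPU access goes through these accessors, so accesses made
    // without the scheduler lock are counted instead of passing silently.
    ModelThread*& current_thread(int cpu) { check(); return current_thread_[cpu]; }
    bool& need_reschedule(int cpu) { check(); return need_reschedule_[cpu]; }
    unsigned& thread_switches(int cpu) { check(); return thread_switches_[cpu]; }

    std::size_t violations() const { return violations_; }

private:
    void check() { if (sched_lock_ == 0) ++violations_; }

    int sched_lock_ = 0;
    std::size_t violations_ = 0;
    std::array<ModelThread*, 2> current_thread_{};
    std::array<bool, 2> need_reschedule_{};
    std::array<unsigned, 2> thread_switches_{};
};
```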

I hope that, with help from the list, a thorough scan can be run to find instances of the problem (the earlier thorough scan for direct/indirect use of get_sched_lock is still pending :( ).
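
As a crude starting point for such a scan, a line-based heuristic can flag uses of `Cyg_Thread::self()`, `get_current_thread()` or `CYG_KERNEL_CPU_THIS()` that appear before `Cyg_Scheduler::lock()` in a function body. `find_unlocked_accesses()` is a hypothetical helper, not an existing eCos tool: it matches substrings rather than parsing C++, assumes the opening brace of a function body sits in column 0, reports matches inside comments too, and cannot see indirect accesses made through other functions, so its output needs manual review.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns 1-based line numbers of suspicious per-CPU reads made before
// Cyg_Scheduler::lock() in the current function body (heuristic only).
std::vector<std::size_t> find_unlocked_accesses(const std::string& source) {
    static const char* const kPatterns[] = {
        "Cyg_Thread::self()", "get_current_thread()", "CYG_KERNEL_CPU_THIS()",
    };
    std::vector<std::size_t> flagged;
    std::istringstream in(source);
    std::string line;
    std::size_t line_no = 0;
    int lock_depth = 0;

    while (std::getline(in, line)) {
        ++line_no;
        if (!line.empty() && line[0] == '{') lock_depth = 0;  // new function body
        if (line.find("Cyg_Scheduler::lock()") != std::string::npos) ++lock_depth;
        if (line.find("Cyg_Scheduler::unlock()") != std::string::npos && lock_depth > 0)
            --lock_depth;
        assert(lock_depth >= 0);
        if (lock_depth > 0) continue;  // reads under the lock are fine
        for (const char* pattern : kPatterns) {
            if (line.find(pattern) != std::string::npos) {
                flagged.push_back(line_no);
                break;
            }
        }
    }
    return flagged;
}
```

Fed the `Cyg_Mutex::lock()` excerpt quoted in this mail, it reports only the `Cyg_Thread *self = Cyg_Thread::self();` line.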


mutex.cxx
---------
cyg_bool Cyg_Mutex::lock(void)
{
    CYG_REPORT_FUNCTYPE("returning %d");

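    cyg_bool result = true;
    // Problem on SMP: self() indexes current_thread[CYG_KERNEL_CPU_THIS()]
    // before the scheduler lock below is taken, so a migration between
    // reading the CPU number and indexing can yield another thread.
    Cyg_Thread *self = Cyg_Thread::self();

    // Prevent preemption
    Cyg_Scheduler::lock();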
...
}

The same situation also appears in:
cyg_bool Cyg_Condition_Variable::wait_inner( Cyg_Mutex *mx )
cyg_bool Cyg_Condition_Variable::wait_inner( Cyg_Mutex *mx, cyg_tick_count timeout )

cnt_sem2.cxx
------------
cyg_bool Cyg_Counting_Semaphore2::wait()
cyg_bool Cyg_Counting_Semaphore2::wait( cyg_tick_count abs_timeout )

cnt_sem.cxx
-----------
cyg_bool Cyg_Counting_Semaphore::wait()
cyg_bool Cyg_Counting_Semaphore::wait( cyg_tick_count timeout )

bin_sem.cxx
-----------
cyg_bool Cyg_Binary_Semaphore::wait()
cyg_bool Cyg_Binary_Semaphore::wait( cyg_tick_count timeout )

--
sandeep
--------------------------------------------------------------------------
I have discovered the art of deceiving diplomats. I tell them the truth
and they never believe me.
		-- Camillo Di Cavour
--------------------------------------------------------------------------


Follow-Ups:
- Re: [SMP]serious bug in synchronisation primitives
  - From: sandeep
