This is the mail archive of the ecos-discuss@sources.redhat.com mailing list for the eCos project.



Re: [SMP]serious bug in synchronisation primitives


The DESTRUCT and BREAK wake reasons are explicitly intended for cases
where the mutex will not be locked. This is why they set the result to

Is there a list of tests (if any) in eCos CVS that do not intend to
lock the mutex, directly or indirectly?

false. If you need to put an assert at the end of the lock routine,
you need to test something like (result && self == owner).

Sure. That will save time by not puzzling over tests where
ending up with result as false is the intended behaviour.
For the other cases, (self == owner) should work just fine and help catch unintended ones.
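
One reading of the check discussed above is that ownership only has to hold when the lock call reports success, so a false result after a BREAK or DESTRUCT wake does not trip the assert. A minimal sketch of that reading follows. The Thread type and the function name are hypothetical stand-ins for illustration, not the kernel's actual classes or its actual assert.

```cpp
#include <cassert>

// Hypothetical stand-in for a kernel thread object; not the eCos Cyg_Thread class.
struct Thread { int id; };

// Postcondition for the end of a lock routine, read as an implication:
// if the lock call reported success, the caller must be the owner.
// A false result (for example after a BREAK or DESTRUCT wake) is accepted as-is.
inline bool lock_postcondition_holds(bool result, const Thread* self, const Thread* owner) {
    return !result || owner == self;
}
```

With this form, a plain (self == owner) check is the special case where result is known to be true.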


It is unclear to me at present why that thread should be seeing a
BREAK at that point, however.  It is possible that there is an SMP
race condition somewhere in the POSIX code that handles thread
cancellation. However, a quick glance through it hasn't brought
anything to light.
With luck and reverse tracing of who sets wake_reason, this was pretty
quick. I verified the direction of events by examining the state on
all 4 processors when (self == owner) failed at the end of mutex lock.

Here is the finding with the compat-posix-tm_basic test:

test3 -> pthread_testcancel -> pthread_exit
-> pthread_setcancelstate -> {mutex lock, later mutex unlock}
** assert during mutex unlock **

run_thread_tests (
 .....
 create test threads with test3 as entry
 cyg_thread_delay(1);
 wait_for_tick();
 pthread_cancel these test threads
 .....
 ) --> pthread_cancel
--> Cyg_Thread::release (
  ......
  sleep_reason <-- {WAIT/TIMEOUT/DELAY}
  => wake_reason = BREAK;
  .....
 )

Suppose the thread being pthread_cancel-led is sleeping inside the mutex-lock
while loop because the mutex (a pthread_mutex) is owned by someone else. When
it wakes up and continues, it leaves the loop with result as false, so it does
not set the owner to self and returns to the caller (pthread_setcancelstate).
Later it goes on to unlock the mutex and is caught by the asserts there:

- "Unlock mutex that is not locked" -- if nobody owns the mutex at the moment
- "Unlock mutex I do not own" -- if someone took the mutex in the meantime, or
  the earlier owner hasn't yet released the mutex.
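
The sequence above can be sketched with a single-threaded toy model. Everything here is an assumption made for illustration: the Thread and Mutex types, lock_with_wake and unlock_assert are hypothetical and only mimic the shape of the behaviour described, not the eCos kernel code. The sleep is replaced by directly supplying the wake reason the thread finds on resuming.

```cpp
#include <cassert>
#include <string>

// Toy model only: hypothetical types, not the eCos kernel classes.
enum class WakeReason { NONE, WAIT, DONE, BREAK, DESTRUCT };

struct Thread {
    int id;
    WakeReason wake_reason;
    explicit Thread(int i) : id(i), wake_reason(WakeReason::NONE) {}
};

struct Mutex {
    bool locked = false;
    Thread* owner = nullptr;
};

// Models the lock loop for a thread that may have to sleep.
// 'reason' is the wake reason the sleeping thread finds when it resumes.
bool lock_with_wake(Mutex& m, Thread& self, WakeReason reason) {
    bool result = true;
    while (m.locked && result) {
        // The real thread would sleep here; the model jumps straight to the wake-up.
        self.wake_reason = reason;
        switch (self.wake_reason) {
        case WakeReason::BREAK:
        case WakeReason::DESTRUCT:
            result = false;       // woken without getting the mutex
            break;
        default:
            m.locked = false;     // model the previous owner releasing the mutex
            m.owner = nullptr;
            break;
        }
    }
    if (result) {                 // only a successful wait takes ownership
        m.locked = true;
        m.owner = &self;
    }
    return result;
}

// Reports which unlock assert would fire, or an empty string if none would.
std::string unlock_assert(const Mutex& m, const Thread& self) {
    if (!m.locked) return "Unlock mutex that is not locked";
    if (m.owner != &self) return "Unlock mutex I do not own";
    return "";
}
```

In this model, a thread that was woken with BREAK while another thread holds the mutex gets false from lock_with_wake, and a later unlock_assert for it reports one of the two messages depending on whether the original owner has released the mutex by then.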

When the races mentioned above occur in non-assert code, worse things
happen: unlocking sets (owner,locked) to (NULL,false) whether or not the
caller actually owns the mutex, and execution continues, so synchronisation obviously breaks down.
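
To make the non-assert case concrete, here is a small toy model of an unlock that resets the state unconditionally, plus a caller-side guard that skips the unlock when the earlier lock call returned false. All names (Thread, Mutex, unlock_unchecked, try_lock, unlock_if_locked) are hypothetical, and the guard is only one possible way for the woken thread to "do the right thing", not a statement of how the POSIX layer should be fixed.

```cpp
#include <cassert>

// Toy model only: hypothetical types, not the eCos kernel classes.
struct Thread { int id; };

struct Mutex {
    bool locked = false;
    Thread* owner = nullptr;
};

// Unlock as described for a build without asserts: the state is reset
// no matter which thread calls it.
void unlock_unchecked(Mutex& m) {
    m.locked = false;
    m.owner = nullptr;
}

// Trylock-style helper, used only to drive the scenario.
bool try_lock(Mutex& m, Thread& self) {
    if (m.locked) return false;
    m.locked = true;
    m.owner = &self;
    return true;
}

// Caller-side guard (an assumption): unlock only if the earlier lock
// call reported success.
void unlock_if_locked(Mutex& m, bool lock_result) {
    if (lock_result) unlock_unchecked(m);
}
```

If thread A holds the mutex and thread B, whose lock call returned false, calls unlock_unchecked, a third thread can then take the mutex while A still believes it holds it. With unlock_if_locked and B's false result, the mutex stays with A.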


----------------------------------------------------------------------------
Another scenario comes via the following route:

timer_test -> pthread_testcancel -> pthread_exit
-> pthread_setcancelstate -> mutex unlock -> cyg_assert_fail
assert message : "Unlock mutex that is not locked"

run_timer_tests
--> pthread_cancel (on test threads with timer_test as entry) --> ...

-----------------------------------------------------------------------------

Maybe this affects the operation of other synchronisation primitives as well? The race conditions are at times quite elusive.

The comment above release (thread.cxx) says --
// Force thread to wake up from a sleep with a wake_reason of
// BREAK. It is the responsibility of the woken thread to detect
// the release() and do the right thing.

I guess the above points at situations intended by the application/test writer.
A test thread can go to sleep and wake up as a result of its executing certain kernel API functions, and so on.


I was wondering (I haven't looked into it much) whether some of the races found could affect the non-SMP case as well?

I hope SMP bugs would be a concern for a lot of people. As gathered from the list,
an SMP port for the SPARC (LEON) processor is planned, someone at IIT Kharagpur seems to be porting eCos to a multicore architecture from TI, there is a port for the multicore Cradle architecture, and an ix86 SMP port exists in public CVS. It is possible that SMP ports for certain non-CVS multicore/multiprocessor architectures are in progress, already exist but are not mentioned, or are being planned.


I was wondering whether someone is porting SMP eCos to the multicore architectures from IBM/Sony? It could give a phenomenal boost to eCos, financial boost aside. I am just a programmer; my speculations could be wrong.

sandeep


-- Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss

