This is the mail archive of the
ecos-patches@sources.redhat.com
mailing list for the eCos project.
Bug in eCos kernel timers
- From: Thomas BINDER <Thomas dot Binder at frequentis dot com>
- To: ecos-discuss at sources dot redhat dot com, ecos-patches at sources dot redhat dot com
- Date: Wed, 25 Jun 2003 12:04:02 +0200
- Subject: Bug in eCos kernel timers
- Organization: Frequentis
Hi!
We found a bug in Cyg_Counter::rem_alarm that, in conjunction with CYGIMP_KERNEL_COUNTERS_MULTI_LIST, occasionally corrupted the kernel timer lists.
The problem occurs in the (very unlikely) event that rem_alarm is interrupted between the selection of the list pointer and the actual remove operation. For example, a thread disables a timer that has an interval, and in between the tick function re-inserts the very same timer into a different list (as its head). Other error conditions, even without an interval, are also possible. In general, the problem can occur whenever a timer is used from several threads.
The bug caused our boards to hang occasionally after 3 days of operation. The application continuously starts and stops 300 timers. To increase the probability of hitting the bug, we inserted a large number of NOPs between the selection of the pointer and the remove operation.
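The race can be illustrated with a minimal standalone sketch. This is not eCos code: Alarm, Counter, select_list, tick_rearm and the two test functions are hypothetical stand-ins, the lists are plain std::list objects, and the preemption is simulated by calling the tick logic by hand at the critical point. With std::list the stale remove is only a harmless no-op that leaves the alarm linked; the model shows the stale-pointer effect, not the actual list corruption we saw in the kernel.

```cpp
#include <cstddef>
#include <list>

// Hypothetical stand-in for an alarm with an optional re-arm interval.
struct Alarm {
    unsigned long trigger;   // absolute tick at which the alarm fires
    unsigned long interval;  // re-arm period, 0 for a one-shot alarm
};

const std::size_t kListCount = 8;  // assumed number of lists

// Hypothetical stand-in for a multi-list counter.
struct Counter {
    std::list<Alarm*> lists[kListCount];

    // The list is chosen from the alarm's current trigger value.
    std::list<Alarm*>* select_list(const Alarm& a) {
        return &lists[a.trigger % kListCount];
    }

    void add(Alarm* a) { select_list(*a)->push_back(a); }

    // What a tick does to an expired interval alarm: unlink it,
    // advance the trigger, and link it into the list for the new value.
    void tick_rearm(Alarm* a) {
        select_list(*a)->remove(a);
        a->trigger += a->interval;
        select_list(*a)->push_back(a);
    }

    // Number of lists entries that still point at the alarm.
    std::size_t linked_count(const Alarm* a) const {
        std::size_t n = 0;
        for (const auto& l : lists)
            for (const Alarm* p : l)
                if (p == a) ++n;
        return n;
    }
};

// Old ordering: the list is selected before the lock, so a tick can
// move the alarm to another list before the remove runs.
std::size_t stale_links_unlocked() {
    Counter c;
    Alarm a{3, 2};
    c.add(&a);
    std::list<Alarm*>* lp = c.select_list(a);  // selects list 3
    c.tick_rearm(&a);                          // simulated preemption: alarm moves to list 5
    lp->remove(&a);                            // operates on the stale list 3
    return c.linked_count(&a);                 // alarm is still linked in list 5
}

// Patched ordering: selection and removal form one locked region,
// so a tick can only run before or after it, never in between.
std::size_t stale_links_locked() {
    Counter c;
    Alarm a{3, 2};
    c.add(&a);
    c.tick_rearm(&a);                          // tick runs before the lock is taken
    std::list<Alarm*>* lp = c.select_list(a);  // selects list 5 (inside the lock)
    lp->remove(&a);                            // removes from the correct list
    return c.linked_count(&a);
}
```

In this model stale_links_unlocked() leaves one dangling link to the supposedly removed alarm, while stale_links_locked() leaves none, which is the effect of moving Cyg_Scheduler::lock() ahead of the index calculation in the patch below.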
best regards,
Tom
--
--- packages/kernel/current/ChangeLog.old Wed Jun 25 11:32:26 2003
+++ packages/kernel/current/ChangeLog Wed Jun 25 11:30:24 2003
@@ -1,5 +1,11 @@
+2003-06-25 Thomas Binder <Thomas.Binder@frequentis.com>
+
+ * src/common/clock.cxx (Cyg_Counter::rem_alarm): Bugfix: call
+ Cyg_Scheduler::lock() before calculation of index into alarm_list
+ array to avoid race condition with multi list counters.
+
2003-06-06 David Brennan <eCos@brennanhome.com>
2003-06-23 Nick Garnett <nickg@balti.calivar.com>
* cdl/kernel.cdl: Added tests/bin_sem3 to list of kernel tests.
Index: packages/kernel/current/src/common/clock.cxx
===================================================================
RCS file: /project/cvsroot/vcs_3020_series/vds6000/software/os/ecos/ecos/packages/kernel/current/src/common/clock.cxx,v
retrieving revision 1.3
diff -a -U5 -r1.3 clock.cxx
--- packages/kernel/current/src/common/clock.cxx 2003/03/26 15:51:42 1.3
+++ packages/kernel/current/src/common/clock.cxx 2003/06/25 09:21:41
@@ -395,10 +395,12 @@
CYG_ASSERTCLASS( this, "Bad counter object" );
CYG_ASSERTCLASS( alarm, "Bad alarm passed" );
Cyg_Alarm_List *alarm_list_ptr; // pointer to list
+ Cyg_Scheduler::lock();
+
#if defined(CYGIMP_KERNEL_COUNTERS_SINGLE_LIST)
alarm_list_ptr = &alarm_list;
#elif defined(CYGIMP_KERNEL_COUNTERS_MULTI_LIST)
@@ -411,12 +413,10 @@
#error "No CYGIMP_KERNEL_COUNTERS_x_LIST config"
#endif
// Now that we have the list pointer, we can use common code for
// both list organizations.
-
- Cyg_Scheduler::lock();
CYG_INSTRUMENT_ALARM( REM, this, alarm );
alarm_list_ptr->remove( alarm );