This is the mail archive of the ecos-devel@sourceware.org mailing list for the eCos project.



Re: Subtle bug not calling DSRs.


Sergei Organov <osv@javad.com> writes:

> Hello,
> 
> I believe I've found a bug where eCos doesn't call pended DSR(s) until
> the _next_ time the scheduler lock is released. The scenario I see is
> like this:
> 
> 1. Start a single thread that runs as soon as the scheduler is enabled.
> 2. The thread suspends itself one way or another (wait on a primitive
>    or just sleep -- it doesn't matter).
> 3. Step (2) results in a context switch from this thread to the idle
>    thread from within Cyg_Scheduler::unlock_inner(0).
> 4. During the context switch an interrupt happens that requests the
>    corresponding DSR to be run. However, as the scheduler is locked
>    during the context switch, the DSR is not invoked on exit from the
>    interrupt.
> 5. The context switch continues and transfers control to the idle
>    thread's entry address, Cyg_HardwareThread::thread_entry(). This is
>    because the idle thread had no chance to run before this point.
> 6. Cyg_HardwareThread::thread_entry() has no code to check for pended
>    DSRs, so the DSR posted in (4) is not run.

Good catch.

This is a somewhat obscure bug since the number of applications that
start the scheduler and then do nothing at all is very small. However,
failing to handle DSRs across the context switch to start a thread is
clearly wrong.

> 
> Even if this common part can't easily be factored out, I think it is
> much better to at least keep the two slightly different copies close
> to each other in one place.

Yep, that makes sense. Cyg_HardwareThread::thread_entry() has become
more elaborate since it was originally written. A little refactoring
now won't go amiss.

> 
> Anyway, somebody more experienced in the eCos internal architecture
> (Nick?) may wish to take a look at this and could probably come up with
> a better patch.

I think the following patch fixes the bug and reorganizes things to be
a little tidier. If this proves to be the case I'll generate a
ChangeLog entry and check it in in a couple of days.


Index: kernel/current/include/sched.hxx
===================================================================
RCS file: /cvs/ecos/ecos/packages/kernel/current/include/sched.hxx,v
retrieving revision 1.12
diff -u -5 -r1.12 sched.hxx
--- kernel/current/include/sched.hxx	23 May 2002 23:06:48 -0000	1.12
+++ kernel/current/include/sched.hxx	3 Jan 2006 10:23:36 -0000
@@ -174,10 +174,13 @@
     // decrement the lock but also look for a reschedule opportunity
     static void             unlock_reschedule();
 
     // release the preemption lock without rescheduling
     static void             unlock_simple();
+
+    // perform thread startup housekeeping
+    void thread_entry( Cyg_Thread *thread );
     
     // Start execution of the scheduler
     static void start() CYGBLD_ATTRIB_NORET;
 
     // Start execution of the scheduler on the current CPU
Index: kernel/current/src/common/thread.cxx
===================================================================
RCS file: /cvs/ecos/ecos/packages/kernel/current/src/common/thread.cxx,v
retrieving revision 1.20
diff -u -5 -r1.20 thread.cxx
--- kernel/current/src/common/thread.cxx	30 Jan 2003 07:02:53 -0000	1.20
+++ kernel/current/src/common/thread.cxx	3 Jan 2006 10:23:37 -0000
@@ -84,28 +84,14 @@
 void
 Cyg_HardwareThread::thread_entry( Cyg_Thread *thread )
 {
     CYG_REPORT_FUNCTION();
 
-    Cyg_Scheduler::scheduler.clear_need_reschedule(); // finished rescheduling
-    Cyg_Scheduler::scheduler.set_current_thread(thread); // restore current thread pointer
-
-    CYG_INSTRUMENT_THREAD(ENTER,thread,0);
-    
-#ifdef CYGSEM_KERNEL_SCHED_TIMESLICE
-    // Reset the timeslice counter so that this thread gets a full
-    // quantum. 
-    Cyg_Scheduler::reset_timeslice_count();
-#endif
+    // Call the scheduler to do any housekeeping
+    Cyg_Scheduler::scheduler.thread_entry( thread );
     
-    // Zero the lock
-    HAL_REORDER_BARRIER ();            // Prevent the compiler from moving
-    Cyg_Scheduler::zero_sched_lock();     // the assignment into the code above.
-    HAL_REORDER_BARRIER();
-
     // Call entry point in a loop.
-
     for(;;)
     {
         thread->entry_point(thread->entry_data);
         thread->exit();
     }
@@ -1223,15 +1209,15 @@
 
 #  endif // CYGNUM_KERNEL_THREADS_IDLE_STACK_SIZE < CYGNUM_HAL_STACK_SIZE_MINIMUM
 # endif // CYGNUM_KERNEL_THREADS_IDLE_STACK_SIZE
 #endif // CYGNUM_HAL_STACK_SIZE_MINIMUM
 
-static char idle_thread_stack[CYGNUM_KERNEL_CPU_MAX][CYGNUM_KERNEL_THREADS_IDLE_STACK_SIZE];
-
 // Loop counter for debugging/housekeeping
 cyg_uint32 idle_thread_loops[CYGNUM_KERNEL_CPU_MAX];
 
+static char idle_thread_stack[CYGNUM_KERNEL_CPU_MAX][CYGNUM_KERNEL_THREADS_IDLE_STACK_SIZE];
+
 // -------------------------------------------------------------------------
 // Idle thread code.
 
 void
 idle_thread_main( CYG_ADDRESS data )
Index: kernel/current/src/sched/sched.cxx
===================================================================
RCS file: /cvs/ecos/ecos/packages/kernel/current/src/sched/sched.cxx,v
retrieving revision 1.16
diff -u -5 -r1.16 sched.cxx
--- kernel/current/src/sched/sched.cxx	9 Aug 2002 17:13:28 -0000	1.16
+++ kernel/current/src/sched/sched.cxx	3 Jan 2006 10:23:38 -0000
@@ -307,10 +307,32 @@
 
     CYG_FAIL( "Should not be executed" );
 }
 
 // -------------------------------------------------------------------------
+// Thread startup. This is called from Cyg_HardwareThread::thread_entry()
+// performs some housekeeping for a newly started thread.
+
+void Cyg_Scheduler::thread_entry( Cyg_Thread *thread )
+{
+    clear_need_reschedule();            // finished rescheduling
+    set_current_thread(thread);         // restore current thread pointer
+
+    CYG_INSTRUMENT_THREAD(ENTER,thread,0);
+    
+#ifdef CYGSEM_KERNEL_SCHED_TIMESLICE
+    // Reset the timeslice counter so that this thread gets a full
+    // quantum. 
+    reset_timeslice_count();
+#endif
+    
+    // Finally unlock the scheduler. As well as clearing the scheduler
+    // lock this allows any pending DSRs to execute.
+    unlock();    
+}
+
+// -------------------------------------------------------------------------
 // Start the scheduler. This is called after the initial threads have been
 // created to start scheduling. It gets any other CPUs running, and then
 // enters the scheduler.
 
 void Cyg_Scheduler::start()




-- 
Nick Garnett                                     eCos Kernel Architect
http://www.ecoscentric.com                The eCos and RedBoot experts


