eCos contains support for limited Symmetric Multi-Processing (SMP). This is only available on selected architectures and platforms.
To allow a reasonable implementation of SMP, and to reduce the disruption to the existing source base, a number of assumptions have been made about the features of the target hardware.
Modest multiprocessing. The typical number of CPUs supported is two to four, with an upper limit around eight. While there are no inherent limits in the code, hardware and algorithmic limitations will probably become significant beyond this point.
SMP synchronization support. The hardware must supply a mechanism to allow software on two CPUs to synchronize. This is normally provided as part of the instruction set in the form of test-and-set, compare-and-swap or load-link/store-conditional instructions. An alternative approach is the provision of hardware semaphore registers which can be used to serialize implementations of these operations. Whatever hardware facilities are available, they are used in eCos to implement spinlocks.
Coherent caches. It is assumed that no extra effort will be required to access shared memory from any processor. This means that either there are no caches, they are shared by all processors, or are maintained in a coherent state by the hardware. It would be too disruptive to the eCos sources if every memory access had to be bracketed by cache load/flush operations. Any hardware that requires this is not supported.
Uniform addressing. It is assumed that all memory that is shared between CPUs is addressed at the same location from all CPUs. Like non-coherent caches, dealing with CPU-specific address translation is considered too disruptive to the eCos source base. This does not, however, preclude systems with non-uniform access costs for different CPUs.
Uniform device addressing. As with access to memory, it is assumed that all devices are equally accessible to all CPUs. Since device access is often made from thread contexts, it is not possible to restrict access to device control registers to certain CPUs, since there is currently no support for binding or migrating threads to CPUs.
Interrupt routing. The target hardware must have an interrupt controller that can route interrupts to specific CPUs. It is acceptable for all interrupts to be delivered to just one CPU, or for some interrupts to be bound to specific CPUs, or for some interrupts to be local to each CPU. At present dynamic routing, where a different CPU may be chosen each time an interrupt is delivered, is not supported. eCos cannot support hardware where all interrupts are delivered to all CPUs simultaneously with the expectation that software will resolve any conflicts.
Inter-CPU interrupts. A mechanism to allow one CPU to interrupt another is needed. This is necessary so that events on one CPU can cause rescheduling on other CPUs.
CPU Identifiers. Code running on a CPU must be able to determine which CPU it is running on. The CPU id is usually provided either in a CPU status register, or in a register associated with the inter-CPU interrupt delivery subsystem. eCos expects CPU ids to be small positive integers, although alternative representations, such as bitmaps, can be converted relatively easily. Complex mechanisms for getting the CPU id cannot be supported. Getting the CPU id must be a cheap operation, since it is done often, and in performance-critical places such as interrupt handlers and the scheduler.
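The synchronization assumption above can be made concrete with a small sketch: a compare-and-swap primitive is enough to build test-and-set, and test-and-set is enough to build a spinlock. This is a host-side illustration using the GCC/Clang __atomic builtins as stand-ins for the hardware instructions; on a real target the HAL would use the architecture's own CAS, test-and-set or LL/SC sequences.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hardware compare-and-swap, using the
 * GCC/Clang __atomic builtin. On a real target this would be a CAS,
 * test-and-set or LL/SC instruction sequence. */
static bool cas(volatile int *addr, int expected, int desired)
{
    return __atomic_compare_exchange_n(addr, &expected, desired,
                                       false /* strong */,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* Test-and-set built on compare-and-swap: sets the location and returns
 * its previous state, which is the primitive that spinlocks need. */
static bool test_and_set(volatile int *lock)
{
    /* The CAS fails exactly when the lock was already set. */
    return !cas(lock, 0, 1);
}

/* A spinlock is then a busy-wait loop around test_and_set(). */
static void spin_lock(volatile int *lock)
{
    while (test_and_set(lock))
        ;  /* spin until we observe the lock clear and claim it */
}

static void spin_unlock(volatile int *lock)
{
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}
```

Whatever the underlying instruction, the essential property is the same: the read of the old value and the write of the new one happen atomically with respect to the other CPUs.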
SMP support in any platform depends on the HAL supplying the appropriate operations. All HAL SMP support is defined in the cyg/hal/hal_smp.h header. Variant and platform specific definitions will be in cyg/hal/var_smp.h and cyg/hal/plf_smp.h respectively. These files are included automatically by this header, so need not be included explicitly.
SMP support falls into a number of functional groups.
This group consists of descriptive and control macros for managing the CPUs in an SMP system.
HAL_SMP_CPU_TYPE: A type that can contain a CPU id. A CPU id is usually a small integer that is used to index arrays of variables that are managed on a per-CPU basis.
HAL_SMP_CPU_MAX: The maximum number of CPUs that can be supported. This is used to provide the size of any arrays that have an element per CPU.
HAL_SMP_CPU_COUNT(): Returns the number of CPUs currently operational. This may differ from HAL_SMP_CPU_MAX depending on the runtime environment.
HAL_SMP_CPU_THIS(): Returns the CPU id of the current CPU.
HAL_SMP_CPU_NONE: A value that does not match any real CPU id. This is used where a variable of the CPU id type must be set to a null value.
HAL_SMP_CPU_START(cpu): Starts the given CPU executing at a defined HAL entry point. After performing any HAL-level initialization, the CPU calls up into the kernel at cyg_kernel_cpu_startup().
HAL_SMP_CPU_RESCHEDULE_INTERRUPT(cpu, wait): Sends the CPU a reschedule interrupt and, if wait is non-zero, waits for an acknowledgment. The interrupted CPU should call cyg_scheduler_set_need_reschedule() in its DSR to cause the reschedule to occur.
HAL_SMP_CPU_TIMESLICE_INTERRUPT(cpu, wait): Sends the CPU a timeslice interrupt and, if wait is non-zero, waits for an acknowledgment. The interrupted CPU should call cyg_scheduler_timeslice_cpu() to cause the timeslice event to be processed.
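A common use of the descriptive macros is managing per-CPU data: an array sized by HAL_SMP_CPU_MAX and indexed by the current CPU id. The sketch below uses simplified host-side stand-in definitions of the macros (the constant values are illustrative assumptions); on a real target the definitions come from cyg/hal/hal_smp.h and reflect the hardware.

```c
#include <assert.h>

/* Host-side stand-ins for the eCos CPU control macros, for illustration
 * only. On a real target these come from cyg/hal/hal_smp.h. */
#define HAL_SMP_CPU_MAX     4
#define HAL_SMP_CPU_COUNT() 2        /* pretend two CPUs are operational */
#define HAL_SMP_CPU_THIS()  0        /* pretend this code runs on CPU 0  */
#define HAL_SMP_CPU_NONE    (-1)
typedef int HAL_SMP_CPU_TYPE;

/* Per-CPU counters: one array element per possible CPU, indexed by the
 * id of the CPU on which the code is running. */
static unsigned long interrupt_count[HAL_SMP_CPU_MAX];

static void count_interrupt(void)
{
    HAL_SMP_CPU_TYPE cpu = HAL_SMP_CPU_THIS();
    interrupt_count[cpu]++;
}
```

Because HAL_SMP_CPU_THIS() is used in patterns like this from interrupt handlers and the scheduler, it must be cheap, as noted above.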
Test-and-set is the foundation of the SMP synchronization mechanisms.
HAL_TAS_TYPE: The type for all test-and-set variables. The test-and-set macros only support operations on a single bit (usually the least significant bit) of this location. This allows for maximum flexibility in the implementation.
HAL_TAS_SET(tas, oldb): Performs a test-and-set operation on the location tas. oldb will contain true if the location was already set, and false if it was clear.
HAL_TAS_CLEAR(tas, oldb): Performs a test-and-clear operation on the location tas. oldb will contain true if the location was already set, and false if it was clear.
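The semantics of the two macros can be modeled on a host with the GCC/Clang __atomic builtins. This is an illustrative sketch only, not a real HAL implementation, which would use the target's own instructions and choose HAL_TAS_TYPE to suit the hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative host-side model of the test-and-set macros. A real HAL
 * implements these with the target's test-and-set, compare-and-swap or
 * LL/SC instructions. Only the least significant bit is operated on. */
typedef volatile int HAL_TAS_TYPE;

/* oldb receives the previous state of the tested bit. */
#define HAL_TAS_SET(tas, oldb) \
    ((oldb) = (__atomic_fetch_or(&(tas), 1, __ATOMIC_ACQUIRE) & 1) != 0)

#define HAL_TAS_CLEAR(tas, oldb) \
    ((oldb) = (__atomic_fetch_and(&(tas), ~1, __ATOMIC_RELEASE) & 1) != 0)
```

Note that both macros return the prior state through oldb; this is what lets callers distinguish "I set it" from "it was already set".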
Spinlocks provide inter-CPU locking. Normally they will be implemented on top of the test-and-set mechanism above, but may also be implemented by other means if, for example, the hardware has more direct support for spinlocks.
HAL_SPINLOCK_TYPE: The type for all spinlock variables.
HAL_SPINLOCK_INIT_CLEAR: A value that may be assigned to a spinlock variable to initialize it to clear.
HAL_SPINLOCK_INIT_SET: A value that may be assigned to a spinlock variable to initialize it to set.
HAL_SPINLOCK_SPIN(lock): The caller spins in a busy loop waiting for the lock to become clear. It then sets it and continues. This is all handled atomically, so that there are no race conditions between CPUs.
HAL_SPINLOCK_CLEAR(lock): The caller clears the lock. One of any waiting spinners will then be able to proceed.
HAL_SPINLOCK_TRY(lock, val): Attempts to set the lock. The value put in val will be true if the lock was claimed successfully, and false if it was not.
HAL_SPINLOCK_TEST(lock, val): Tests the current value of the lock. The value put in val will be true if the lock is claimed, and false if it is clear.
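The following is a hedged, single-host sketch of how these operations might be built on atomic exchange, much as a real HAL might build them on the test-and-set macros. The real definitions live in cyg/hal/hal_smp.h and may use more direct hardware support.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the spinlock macros using GCC/Clang __atomic
 * builtins; not a real eCos HAL implementation. */
typedef volatile int HAL_SPINLOCK_TYPE;

#define HAL_SPINLOCK_INIT_CLEAR 0
#define HAL_SPINLOCK_INIT_SET   1

/* Busy-wait until the previous value observed by the exchange is clear;
 * the exchange itself atomically sets the lock. */
#define HAL_SPINLOCK_SPIN(lock)                                    \
    while (__atomic_exchange_n(&(lock), 1, __ATOMIC_ACQUIRE) != 0) \
        ; /* spin */

#define HAL_SPINLOCK_CLEAR(lock) \
    __atomic_store_n(&(lock), 0, __ATOMIC_RELEASE)

/* One-shot attempt: val is true iff the lock was clear and is now ours. */
#define HAL_SPINLOCK_TRY(lock, val) \
    ((val) = (__atomic_exchange_n(&(lock), 1, __ATOMIC_ACQUIRE) == 0))

/* Non-destructive read of the current lock state. */
#define HAL_SPINLOCK_TEST(lock, val) \
    ((val) = (__atomic_load_n(&(lock), __ATOMIC_RELAXED) != 0))
```

A design point worth noting: HAL_SPINLOCK_TRY never blocks, so it is safe in contexts where spinning would risk deadlock, while HAL_SPINLOCK_SPIN assumes the holder will release the lock promptly.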
The scheduler lock is the main protection for all kernel data structures. By default the kernel implements the scheduler lock itself using a spinlock. However, if spinlocks cannot be supported by the hardware, or there is a more efficient implementation available, the HAL may provide macros to implement the scheduler lock.
HAL_SMP_SCHEDLOCK_DATA_TYPE: A data type, possibly a structure, that contains any data items needed by the scheduler lock implementation. A variable of this type will be instantiated as a static member of the Cyg_Scheduler_SchedLock class and passed to all the following macros.
HAL_SMP_SCHEDLOCK_INIT(lock, data): Initialize the scheduler lock. The lock argument is the scheduler lock counter and the data argument is a variable of HAL_SMP_SCHEDLOCK_DATA_TYPE type.
HAL_SMP_SCHEDLOCK_INC(lock, data): Increment the scheduler lock. The first increment of the lock from zero to one for any CPU may cause it to wait until the lock is zeroed by another CPU. Subsequent increments should be less expensive, since this CPU already holds the lock.
HAL_SMP_SCHEDLOCK_ZERO(lock, data): Zero the scheduler lock. This operation will also clear the lock so that other CPUs may claim it.
HAL_SMP_SCHEDLOCK_SET(lock, data, new): Set the lock to a different value, in new. This is only called when the lock is already known to be owned by the current CPU. It is never called to zero the lock, or to increment it from zero.
The routing of interrupts to different CPUs is supported by two new interfaces in hal_intr.h.
Once an interrupt has been routed to a new CPU, the existing vector masking and configuration operations should take account of the CPU routing. For example, if the operation is not invoked on the destination CPU itself, then the HAL may need to arrange to transfer the operation to the destination CPU for correct application.
HAL_INTERRUPT_SET_CPU(vector, cpu): Route the interrupt for the given vector to the given cpu.
HAL_INTERRUPT_GET_CPU(vector, cpu): Set cpu to the id of the CPU to which this vector is routed.