Architecture HAL Porting

A new architecture HAL is the most complex HAL to write, and it is the least easily described. Hence this section is presently nothing more than a placeholder for the future.

HAL Architecture Porting Process

The easiest way to make a new architecture HAL is to copy an existing architecture HAL, preferably one for a closely matching architecture, and change all the files to match the new architecture. The MIPS architecture HAL should be used if possible, as it has the appropriate layout and coding conventions. Other HALs may deviate from that norm in various ways.

Note: eCos is written for GCC. It requires C and C++ compiler support, as well as a few compiler features introduced during eCos development, so compilers older than eCos may not provide these features. There is no C++ support for any 8-bit or 16-bit CPUs. Before you can undertake an eCos port, you need the required compiler support.

The following gives a rough outline of the steps needed to create a new architecture HAL. The exact order and set of steps needed will vary greatly from architecture to architecture, so a lot of flexibility is required. And of course, if the architecture HAL is to be tested, it is necessary to do variant and platform ports for the initial target simultaneously.

  1. Make a new directory for the new architecture under the hal directory in the source repository. Make an arch directory under this and populate this with the standard set of package directories.

  2. Copy the CDL file from an example HAL, changing its name to match the new HAL. Edit the file, changing option names as appropriate. Delete any options that are specific to the original HAL, and add any new options that are necessary for the new architecture. This is likely to be a continuing process during the development of the HAL. See the Section called CDL Requirements for more details.

  3. Copy the hal_arch.h file from an example HAL. Within this file you need to change or define the following:

    • Define the HAL_SavedRegisters structure. This may need to reflect the save order of any group register save/restore instructions, the interrupt and exception save and restore formats, and the procedure calling conventions. It may also need to cater for optional FPUs and other functional units. It can be quite difficult to develop a layout that copes with all requirements.

    • Define the bit manipulation routines, HAL_LSBIT_INDEX() and HAL_MSBIT_INDEX(). If the architecture contains instructions to perform these, or related, operations, then these should be defined as inline assembler fragments. Otherwise make them calls to functions.

    • Define HAL_THREAD_INIT_CONTEXT(). This initializes a restorable CPU context onto a stack pointer so that a later call to HAL_THREAD_LOAD_CONTEXT() or HAL_THREAD_SWITCH_CONTEXT() will execute it correctly. This macro needs to take account of the same optional features of the architecture as the definition of HAL_SavedRegisters.

    • Define HAL_THREAD_LOAD_CONTEXT() and HAL_THREAD_SWITCH_CONTEXT(). These should just be calls to functions in context.S.

    • Define HAL_REORDER_BARRIER(). This prevents code being moved by the compiler and is necessary in some order-sensitive code. This macro is actually defined identically in all architectures, so it can just be copied.

    • Define breakpoint support. The macro HAL_BREAKPOINT(label) needs to be an inline assembly fragment that invokes a breakpoint. The breakpoint instruction should be labeled with the label argument. HAL_BREAKINST and HAL_BREAKINST_SIZE define the breakpoint instruction for debugging purposes.

    • Define GDB support. GDB views the registers of the target as a linear array, with each register having a well defined offset. This array may differ from the ordering defined in HAL_SavedRegisters. The macros HAL_GET_GDB_REGISTERS() and HAL_SET_GDB_REGISTERS() translate between the GDB array and the HAL_SavedRegisters structure. The macro HAL_THREAD_GET_SAVED_REGISTERS() translates a stack pointer saved by the context switch macros into a pointer to a HAL_SavedRegisters structure. Usually this is a one-to-one translation, but this macro allows it to differ if necessary.

    • Define long jump support. The type hal_jmp_buf and the functions hal_setjmp() and hal_longjmp() provide the underlying implementation of the C library setjmp() and longjmp().

    • Define idle thread action. Generally the macro HAL_IDLE_THREAD_ACTION() is defined to call a function in hal_misc.c.

    • Define stack sizes. The macros CYGNUM_HAL_STACK_SIZE_MINIMUM and CYGNUM_HAL_STACK_SIZE_TYPICAL should be defined to the minimum size for any thread stack and a reasonable default for most threads respectively. It is usually best to construct these out of component sizes for the CPU save state and procedure call stack usage. These definitions should not use anything other than numerical values since they can be used from assembly code in some HALs.

    • Define memory access macros. These macros provide translation between cached and uncached and physical memory spaces. They usually consist of masking out bits of the supplied address and ORing in alternative address bits.

    • Define global pointer save/restore macros. These only need real definitions if the calling conventions of the architecture require a global pointer (as does the MIPS architecture); otherwise they may be empty. If it is necessary to define these, then take a look at the MIPS implementation for an example.
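The relationship between HAL_SavedRegisters, HAL_THREAD_INIT_CONTEXT() and the GDB register macros described in the step above can be illustrated with a small host-side sketch. Everything about the CPU here is an assumption made for illustration: the register set (eight general registers plus sp, sr and pc), the TOY_* names, the toy_reg_t type, the initial status value and the GDB register order are all hypothetical and not taken from any existing HAL. A real port would match the save order used by its own context.S.

```c
#include <stdint.h>

/* Hypothetical CPU: 8 general registers plus sp, sr and pc. */
#define TOY_NUM_GPRS      8
#define TOY_GDB_NUM_REGS  (TOY_NUM_GPRS + 3)
#define TOY_INITIAL_SR    0x1u   /* assumed: bit 0 enables interrupts */

typedef uintptr_t toy_reg_t;     /* register-sized integer (hypothetical) */

/* Saved state, in the order an imaginary context.S would save it. */
typedef struct {
    toy_reg_t r[TOY_NUM_GPRS];
    toy_reg_t sp;
    toy_reg_t sr;
    toy_reg_t pc;
} HAL_SavedRegisters;

/* Build an initial context just below the stack top held in _sparg_ and
 * leave _sparg_ pointing at it. _id_ tags unused registers for debugging. */
#define HAL_THREAD_INIT_CONTEXT(_sparg_, _thread_, _entry_, _id_)          \
    do {                                                                   \
        uintptr_t _top_ = ((uintptr_t)(_sparg_)) & ~(uintptr_t)7;          \
        HAL_SavedRegisters *_regs_ =                                       \
            (HAL_SavedRegisters *)(_top_ - sizeof(HAL_SavedRegisters));    \
        int _i_;                                                           \
        for (_i_ = 0; _i_ < TOY_NUM_GPRS; _i_++)                           \
            _regs_->r[_i_] = (toy_reg_t)(_id_) | (toy_reg_t)_i_;           \
        _regs_->r[0] = (toy_reg_t)(_thread_);  /* first argument */        \
        _regs_->sp   = (toy_reg_t)_top_;                                   \
        _regs_->sr   = TOY_INITIAL_SR;                                     \
        _regs_->pc   = (toy_reg_t)(_entry_);                               \
        (_sparg_)    = (uintptr_t)_regs_;                                  \
    } while (0)

/* One-to-one translation from a saved stack pointer to the saved state. */
#define HAL_THREAD_GET_SAVED_REGISTERS(_sp_, _regs_) \
    ((_regs_) = (HAL_SavedRegisters *)(_sp_))

/* Assumed GDB order: r0..r7, sp, pc, sr (pc and sr swapped vs. the struct). */
#define HAL_GET_GDB_REGISTERS(_aregval_, _regs_)                           \
    do {                                                                   \
        toy_reg_t *_g_ = (toy_reg_t *)(_aregval_);                         \
        int _i_;                                                           \
        for (_i_ = 0; _i_ < TOY_NUM_GPRS; _i_++)                           \
            _g_[_i_] = (_regs_)->r[_i_];                                   \
        _g_[TOY_NUM_GPRS]     = (_regs_)->sp;                              \
        _g_[TOY_NUM_GPRS + 1] = (_regs_)->pc;                              \
        _g_[TOY_NUM_GPRS + 2] = (_regs_)->sr;                              \
    } while (0)

#define HAL_SET_GDB_REGISTERS(_regs_, _aregval_)                           \
    do {                                                                   \
        const toy_reg_t *_g_ = (const toy_reg_t *)(_aregval_);             \
        int _i_;                                                           \
        for (_i_ = 0; _i_ < TOY_NUM_GPRS; _i_++)                           \
            (_regs_)->r[_i_] = _g_[_i_];                                   \
        (_regs_)->sp = _g_[TOY_NUM_GPRS];                                  \
        (_regs_)->pc = _g_[TOY_NUM_GPRS + 1];                              \
        (_regs_)->sr = _g_[TOY_NUM_GPRS + 2];                              \
    } while (0)
```

The point of the sketch is that the structure layout follows the save order, while the GDB macros absorb any difference between that layout and the order GDB expects. Optional units such as an FPU would add conditional fields to the structure and matching initialization to HAL_THREAD_INIT_CONTEXT().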

  4. Copy hal_intr.h from an example HAL. Within this file you should change or define the following:

    • Define the exception vectors. These should be detailed in the architecture specification. Essentially for each exception entry point defined by the architecture there should be an entry in the VSR table. The offsets of these VSR table entries should be defined here by CYGNUM_HAL_VECTOR_* definitions. The size of the VSR table also needs to be defined here.

    • Map any hardware exceptions to standard names. There is a group of exception vector names of the form CYGNUM_HAL_EXCEPTION_* that define a wide variety of possible exceptions that many architectures raise. Generic code detects whether the architecture can raise a given exception by testing whether a given CYGNUM_HAL_EXCEPTION_* definition is present. If it is present then its value is the vector that raises that exception. This does not need to be a one-to-one correspondence, and several CYGNUM_HAL_EXCEPTION_* definitions may have the same value.

      Interrupt vectors are usually defined in the variant or platform HALs. The interrupt number space may either be continuous with the VSR number space, where they share a vector table (as in the i386) or may be a separate space where a separate decode stage is used (as in MIPS or PowerPC).

    • Declare any static data used by the HAL to handle interrupts and exceptions. This is usually three vectors for interrupts: hal_interrupt_handlers[], hal_interrupt_data[] and hal_interrupt_objects[], which are sized according to the interrupt vector definitions. In addition a definition for the VSR table, hal_vsr_table[] should be made. These vectors are normally defined in either vectors.S or hal_misc.c.

    • Define interrupt enable/disable macros. These are normally inline assembly fragments that execute the relevant instructions, or manipulate the CPU register that contains the CPU interrupt enable bit.

    • A feature that many HALs support is the ability to execute DSRs on the interrupt stack. This is not an essential feature, and is better left unimplemented in the initial porting effort. If this is required, then the macro HAL_INTERRUPT_STACK_CALL_PENDING_DSRS() should be defined to call a function in vectors.S.

    • Define the interrupt and VSR attachment macros. If the same arrays as for other HALs have been used for VSR and interrupt vectors, then these macros can be copied across unchanged.
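As a rough illustration of the hal_intr.h items above, the following sketch shows exception names mapped onto vectors, the three interrupt tables, and attachment macros that operate on them. It follows the general pattern of comparing a table slot against hal_default_isr(), but it is a simplified model rather than a copy of any particular HAL. The vector numbers, the table size, hal_interrupt_tables_init() and toy_timer_isr() are assumptions made for the example, and hal_default_isr() is replaced by a local stand-in so that the fragment is self-contained.

```c
#include <stdint.h>

typedef uintptr_t CYG_ADDRESS;
typedef uintptr_t CYG_ADDRWORD;

/* Hypothetical exception vectors for an imaginary architecture. */
#define CYGNUM_HAL_VECTOR_BUS_ERROR      2
#define CYGNUM_HAL_VECTOR_ILLEGAL_INSN   3
#define CYGNUM_HAL_VSR_COUNT             8

/* Standard names mapped onto them; two names may share one vector. */
#define CYGNUM_HAL_EXCEPTION_DATA_ACCESS           CYGNUM_HAL_VECTOR_BUS_ERROR
#define CYGNUM_HAL_EXCEPTION_DATA_UNALIGNED_ACCESS CYGNUM_HAL_VECTOR_BUS_ERROR
#define CYGNUM_HAL_EXCEPTION_ILLEGAL_INSTRUCTION   CYGNUM_HAL_VECTOR_ILLEGAL_INSN

/* Assumed interrupt vector range; normally set by the variant or platform. */
#define CYGNUM_HAL_ISR_MIN    0
#define CYGNUM_HAL_ISR_MAX    7
#define CYGNUM_HAL_ISR_COUNT  (CYGNUM_HAL_ISR_MAX - CYGNUM_HAL_ISR_MIN + 1)

/* Stand-in for the common HAL default ISR. */
static uint32_t hal_default_isr(CYG_ADDRWORD vector, CYG_ADDRWORD data)
{
    (void)vector; (void)data;
    return 0;
}

/* Example ISR used only to exercise the macros (hypothetical). */
static uint32_t toy_timer_isr(CYG_ADDRWORD vector, CYG_ADDRWORD data)
{
    (void)vector; (void)data;
    return 1;
}

CYG_ADDRESS  hal_interrupt_handlers[CYGNUM_HAL_ISR_COUNT];
CYG_ADDRWORD hal_interrupt_data[CYGNUM_HAL_ISR_COUNT];
CYG_ADDRESS  hal_interrupt_objects[CYGNUM_HAL_ISR_COUNT];

/* Hypothetical helper: point every slot at the default ISR. */
static void hal_interrupt_tables_init(void)
{
    int i;
    for (i = 0; i < CYGNUM_HAL_ISR_COUNT; i++) {
        hal_interrupt_handlers[i] = (CYG_ADDRESS)hal_default_isr;
        hal_interrupt_data[i]     = 0;
        hal_interrupt_objects[i]  = 0;
    }
}

#define HAL_INTERRUPT_IN_USE(_vector_, _state_)                              \
    ((_state_) = (hal_interrupt_handlers[_vector_] !=                        \
                  (CYG_ADDRESS)hal_default_isr))

/* Attach only if the slot is free, i.e. still holds the default ISR. */
#define HAL_INTERRUPT_ATTACH(_vector_, _isr_, _data_, _object_)              \
    do {                                                                     \
        if (hal_interrupt_handlers[_vector_] == (CYG_ADDRESS)hal_default_isr) { \
            hal_interrupt_handlers[_vector_] = (CYG_ADDRESS)(_isr_);         \
            hal_interrupt_data[_vector_]     = (CYG_ADDRWORD)(_data_);       \
            hal_interrupt_objects[_vector_]  = (CYG_ADDRESS)(_object_);      \
        }                                                                    \
    } while (0)

/* Detach only if _isr_ is the handler currently installed. */
#define HAL_INTERRUPT_DETACH(_vector_, _isr_)                                \
    do {                                                                     \
        if (hal_interrupt_handlers[_vector_] == (CYG_ADDRESS)(_isr_)) {      \
            hal_interrupt_handlers[_vector_] = (CYG_ADDRESS)hal_default_isr; \
            hal_interrupt_data[_vector_]     = 0;                            \
            hal_interrupt_objects[_vector_]  = 0;                            \
        }                                                                    \
    } while (0)
```

Because the macros depend only on the three arrays and on hal_default_isr(), a new architecture that keeps the same array names can usually reuse macros of this shape without change.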

  5. A number of other header files also need to be filled in:

    • basetype.h. This file defines the basic types used by eCos, together with the endianness and some other characteristics. This file only really needs to contain definitions if the architecture differs significantly from the defaults defined in cyg_type.h.

    • hal_io.h. This file contains macros for accessing device IO registers. If the architecture uses memory mapped IO, then these can be copied unchanged from an existing HAL such as MIPS. If the architecture uses special IO instructions, then these macros must be defined as inline assembler fragments. See the I386 HAL for an example. PCI bus access macros are usually defined in the variant or platform HALs.

    • hal_cache.h. This file contains cache access macros. If the architecture defines cache instructions, or control registers, then the access macros should be defined here. Otherwise they must be defined in the variant or platform HAL. Usually the cache dimensions (total size, line size, ways etc.) are defined in the variant HAL.

    • arch.inc and <architecture>.inc. These files are assembler headers used by vectors.S and context.S. <architecture>.inc is a general purpose header that should contain things like register aliases, ABI definitions and macros useful to general assembly code. If there are no such definitions, then this file need not be provided. arch.inc contains macros for performing various eCos related operations such as initializing the CPU, caches, FPU etc. The definitions here may often be configured or overridden by definitions in the variant or platform HALs. See the MIPS HAL for an example of this.
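For an architecture with memory mapped IO, the hal_io.h access macros mentioned above reduce to volatile pointer accesses. The following is a minimal sketch of that shape. The type names are defined locally here so that the fragment stands alone; in a real HAL they would come from the eCos base type headers, and an existing HAL should be consulted for the complete set of macros.

```c
#include <stdint.h>

/* Local stand-ins for the eCos base types. */
typedef uint8_t  CYG_BYTE;
typedef uint16_t CYG_WORD16;
typedef uint32_t CYG_WORD32;

/* Memory mapped register access: each macro is a single volatile access
 * of the stated width, so the compiler cannot merge or drop it. */
#define HAL_READ_UINT8(_register_, _value_) \
    ((_value_) = *((volatile CYG_BYTE *)(_register_)))
#define HAL_WRITE_UINT8(_register_, _value_) \
    (*((volatile CYG_BYTE *)(_register_)) = (_value_))

#define HAL_READ_UINT16(_register_, _value_) \
    ((_value_) = *((volatile CYG_WORD16 *)(_register_)))
#define HAL_WRITE_UINT16(_register_, _value_) \
    (*((volatile CYG_WORD16 *)(_register_)) = (_value_))

#define HAL_READ_UINT32(_register_, _value_) \
    ((_value_) = *((volatile CYG_WORD32 *)(_register_)))
#define HAL_WRITE_UINT32(_register_, _value_) \
    (*((volatile CYG_WORD32 *)(_register_)) = (_value_))
```

On an architecture with dedicated IO instructions the same macro names would instead wrap inline assembler, so that driver code using them does not need to change.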

  6. Write vectors.S. This is the most important file in the HAL. It contains the CPU initialization code, exception and interrupt handlers. While other HALs should be consulted for structures and techniques, there is very little here that can be copied over without major edits.

    The main pieces of code that need to be defined here are:

    • Reset vector. This usually needs to be positioned at the start of the ROM or FLASH, so should be in a linker section of its own. It can then be placed correctly by the linker script. Normally this code is little more than a jump to the label _start.

    • Exception vectors. These are the trampoline routines connected to the hardware exception entry points that vector through the VSR table. In many architectures these are adjacent to the reset vector, and should occupy the same linker section. If the architecture allows the vectors to be moved then it may be necessary for these trampolines to be position independent so they can be relocated at runtime.

      The trampolines should do the minimum necessary to transfer control from the hardware vector to the VSR pointed to by the matching table entry. Exactly how this is done depends on the architecture. Usually the trampoline needs to get some working registers by either saving them to CPU special registers (e.g. PowerPC SPRs), using reserved general registers (MIPS K0 and K1), using only memory based operations (IA32), or just jumping directly (ARM). The VSR table index to be used is either implicit in the entry point taken (PowerPC, IA32, ARM), or must be determined from a CPU register (MIPS).

    • Write kernel startup code. This is the location the reset vector jumps to, and can be in the main text section of the executable, rather than a special section. The code here should first initialize the CPU and other hardware subsystems. The best approach is to use a set of macro calls that are defined either in arch.inc or overridden in the variant or platform HALs. Other jobs that this code should do are: initialize stack pointer; copy the data section from ROM to RAM if necessary; zero the BSS; call variant and platform initializers; call cyg_hal_invoke_constructors(); call initialize_stub() if necessary. Finally it should call cyg_start(). See the Section called HAL Startup in Chapter 5 for details.

    • Write the default exception VSR. This VSR is installed in the VSR table for all synchronous exception vectors. See the Section called Default Synchronous Exception Handling in Chapter 5 for details of what this VSR does.

    • Write the default interrupt VSR. This is installed in all VSR table entries that correspond to external interrupts. See the Section called Default Interrupt Handling in Chapter 5 for details of what this VSR does.

    • Write hal_interrupt_stack_call_pending_dsrs(). If the macro HAL_INTERRUPT_STACK_CALL_PENDING_DSRS() has been defined in hal_intr.h, then the function it calls should appear here. The purpose of this function is to call DSRs on the interrupt stack rather than the current thread's stack. This is not an essential feature, and may be left until later. However it interacts with the stack switching that goes on in the interrupt VSR, so it may make sense to write these pieces of code at the same time to ensure consistency.

      When this function is implemented it should do the following:

      • Take a copy of the current SP and then switch to the interrupt stack.

      • Save the old SP, together with the CPU status register (or whatever register contains the interrupt enable status) and any other registers that may be corrupted by a function call (such as any link register) to locations in the interrupt stack.

      • Enable interrupts.

      • Call cyg_interrupt_call_pending_DSRs(). This is a kernel function that actually calls any pending DSRs.

      • Retrieve saved registers from the interrupt stack and switch back to the current thread stack.

      • Merge the interrupt enable state recorded in the saved CPU status register with the current value of the status register to restore the previous enable state. If the status register does not contain any other persistent state then this can be a simple restore of the register. However if the register contains other state bits that might have been changed by a DSR, then care must be taken not to disturb these.
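The final merge step in the list above is a plain bit operation, although in a real HAL it would be written in assembler inside vectors.S. The C sketch below shows the arithmetic only. It assumes a hypothetical status register in which bit 0 is the interrupt enable bit; TOY_SR_IE and toy_sr_merge_ie() are names invented for this example.

```c
#include <stdint.h>

/* Assumed layout: bit 0 of the status register is the interrupt enable. */
#define TOY_SR_IE 0x00000001u

/* Return the value to write back to the status register: every bit comes
 * from the current register value except the interrupt enable bit, which
 * is taken from the copy saved before the DSRs were called. */
static inline uint32_t toy_sr_merge_ie(uint32_t current_sr, uint32_t saved_sr)
{
    return (current_sr & ~TOY_SR_IE) | (saved_sr & TOY_SR_IE);
}
```

For example, if a DSR changed other status bits so that the register now reads 0xF1, and interrupts were disabled on entry (saved value 0x00), the merged value is 0xF0: the enable bit is restored while the bits the DSR changed are preserved.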

    • Define any data items needed. Typically vectors.S may contain definitions for the VSR table, the interrupt tables and the interrupt stack. Sometimes these are only default definitions that may be overridden by the variant or platform HALs.
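The memory-related jobs of the kernel startup code listed above (copying the data section, zeroing the BSS and running the constructors) can be modelled on a host in C, even though the real code lives in vectors.S and hal_misc.c and works on linker-defined symbols. In the sketch below all names beginning with toy_ are hypothetical, the section boundaries are passed as arguments instead of coming from the linker script, and the constructor table is walked from its last entry to its first, which is one reading of the ordering requirement and should be checked against an existing HAL.

```c
#include <stddef.h>
#include <string.h>

typedef void (*toy_ctor_fn)(void);

/* Copy initialized data from its ROM image to RAM. For RAM startup the
 * two addresses coincide and nothing needs to be done. */
static void toy_copy_data(void *ram, const void *rom, size_t size)
{
    if (ram != rom)
        memcpy(ram, rom, size);
}

/* Clear the BSS so that uninitialized statics start out as zero. */
static void toy_zero_bss(void *bss_start, size_t size)
{
    memset(bss_start, 0, size);
}

/* Call every constructor in the table, last entry first. */
static void toy_invoke_constructors(toy_ctor_fn *list, size_t count)
{
    size_t i;
    for (i = count; i > 0; i--)
        list[i - 1]();
}

/* Two sample constructors that record the order in which they run. */
static int toy_ctor_log[4];
static int toy_ctor_calls;

static void toy_ctor_a(void) { toy_ctor_log[toy_ctor_calls++] = 'a'; }
static void toy_ctor_b(void) { toy_ctor_log[toy_ctor_calls++] = 'b'; }
```

With a table holding toy_ctor_a followed by toy_ctor_b, toy_invoke_constructors() runs toy_ctor_b first. In a real startup sequence these steps come after the stack pointer has been set up and before cyg_start() is called.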

  7. Write context.S. This file contains the context switch code. See the Section called Thread Context Switching in Chapter 4 for details of how these functions operate. This file may also contain the implementation of hal_setjmp() and hal_longjmp().

  8. Write hal_misc.c. This file contains any C data and functions needed by the HAL. These might include:

    • hal_interrupt_*[]. In some HALs, if these arrays are not defined in vectors.S then they must be defined here.

    • cyg_hal_exception_handler(). This function is called from the exception VSR. It usually does extra decoding of the exception and invokes any special handlers for things like FPU traps, bus errors or memory exceptions. If there is nothing special to be done for an exception, then it either calls into the GDB stubs, by calling __handle_exception(), or invokes the kernel by calling cyg_hal_deliver_exception().

    • hal_arch_default_isr(). The hal_interrupt_handlers[] array is usually initialized with pointers to hal_default_isr(), which is defined in the common HAL. This function handles things like Ctrl-C processing, but if that is not relevant, then it will call hal_arch_default_isr(). Normally this function should just return zero.

    • cyg_hal_invoke_constructors(). This calls the constructors for all static objects before the program starts. eCos relies on these being called in the correct order for it to function correctly. The exact way in which constructors are handled may differ between architectures, although most use a simple table of function pointers between labels __CTOR_LIST__ and __CTOR_END__ which must be called in order from the top down. Generally, this function can be copied directly from an existing architecture HAL.

    • Bit indexing functions. If the macros HAL_LSBIT_INDEX() and HAL_MSBIT_INDEX() are defined as function calls, then the functions should appear here. The main reason for doing this is that the architecture does not have support for bit indexing and these functions must provide the functionality by conventional means. While the trivial implementation is a simple for loop, it is expensive and non-deterministic. Better, constant time, implementations can be found in several HALs (MIPS for example).

    • hal_delay_us(). If the macro HAL_DELAY_US() is defined in hal_intr.h then it should be defined to call this function. While most of the time this function is called with very small values, occasionally (particularly in some ethernet drivers) it is called with values of several seconds. Hence the function should take care to avoid overflow in any calculations.

    • hal_idle_thread_action(). This function is called from the idle thread via the HAL_IDLE_THREAD_ACTION() macro, if so defined. While normally this function does nothing, during development this is often a good place to report various important system parameters on LCDs, LEDs or other displays. This function can also monitor system state and report any anomalies. If the architecture supports a halt instruction then this is a good place to put an inline assembly fragment to execute it. It is also a good place to handle any power saving activity.
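For the bit indexing functions mentioned above, a loop-free implementation that always takes the same small number of steps is straightforward to write. The version below is one possible sketch and is not copied from any existing HAL: hal_lsbit_index() isolates the lowest set bit and then locates it with five mask tests, and hal_msbit_index() performs a binary search using shifts. Both assume a 32-bit mask with at least one bit set.

```c
#include <stdint.h>
#include <assert.h>

/* Index (0..31) of the least significant set bit of mask. */
uint32_t hal_lsbit_index(uint32_t mask)
{
    uint32_t n, index = 0;

    assert(mask != 0);
    n = mask & (~mask + 1u);             /* isolate the lowest set bit */
    if (n & 0xFFFF0000u) index += 16;    /* bit is in the upper half   */
    if (n & 0xFF00FF00u) index += 8;     /* ...upper byte of its half  */
    if (n & 0xF0F0F0F0u) index += 4;     /* ...upper nibble            */
    if (n & 0xCCCCCCCCu) index += 2;     /* ...upper pair              */
    if (n & 0xAAAAAAAAu) index += 1;     /* ...odd bit position        */
    return index;
}

/* Index (0..31) of the most significant set bit of mask. */
uint32_t hal_msbit_index(uint32_t mask)
{
    uint32_t index = 0;

    assert(mask != 0);
    if (mask & 0xFFFF0000u) { mask >>= 16; index += 16; }
    if (mask & 0x0000FF00u) { mask >>= 8;  index += 8;  }
    if (mask & 0x000000F0u) { mask >>= 4;  index += 4;  }
    if (mask & 0x0000000Cu) { mask >>= 2;  index += 2;  }
    if (mask & 0x00000002u) { index += 1; }
    return index;
}

/* The hal_arch.h macros then become simple calls. */
#define HAL_LSBIT_INDEX(index, mask) ((index) = hal_lsbit_index(mask))
#define HAL_MSBIT_INDEX(index, mask) ((index) = hal_msbit_index(mask))
```

For a mask of 0x28 (bits 3 and 5 set) the two functions return 3 and 5 respectively. The behaviour for a zero mask is left undefined here and is guarded only by an assertion.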

  9. Create the <architecture>.ld file. While this file may need to be moved to the variant HAL in the future, it should initially be defined here, and only moved if necessary.

    This file defines a set of macros that are used by the platform .ldi files to generate linker scripts. Most GCC toolchains are very similar so the correct approach is to copy the file from an existing architecture and edit it. The main things that will need editing are the OUTPUT_FORMAT() directive and maybe the creation or allocation of extra sections to various macros. Running the target linker with just the --verbose argument will cause it to output its default linker script. This can be compared with the .ld file and appropriate edits made.

  10. If GDB stubs are to be supported in RedBoot or eCos, then support must be included for these. The most important of these are include/<architecture>-stub.h and src/<architecture>-stub.c. In all existing architecture HALs these files, and any support files they need, have been derived from files supplied in libgloss, as part of the GDB toolchain package. If this is a totally new architecture, this may not have been done, and they must be created from scratch.

    include/<architecture>-stub.h contains definitions that are used by the GDB stubs to describe the size, type, number and names of CPU registers. This information is usually found in the GDB support files for the architecture. It also contains prototypes for the functions exported by src/<architecture>-stub.c; however, since this is common to all architectures, it can be copied from some other HAL.

    src/<architecture>-stub.c implements the functions exported by the header. Most of this is fairly straightforward: the implementation in existing HALs should show exactly what needs to be done. The only complex part is the support for single-stepping. This is used a lot by GDB, so it cannot be avoided. If the architecture has support for a trace or single-step trap then that can be used for this purpose. If it does not then this must be simulated by planting a breakpoint in the next instruction. This can be quite involved since it requires some analysis of the current instruction plus the state of the CPU to determine where execution is going to go next.
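To make the simulated single-step technique more concrete, the sketch below models it for an imaginary CPU. The instruction set is entirely made up for illustration: instructions are 32-bit words, the top 8 bits are the opcode, opcode 0x01 is an unconditional branch, opcode 0x02 branches when a zero flag is set, and the low 24 bits hold a signed word offset relative to the following instruction. None of the toy_ names or encodings correspond to a real architecture or to the stub code in any existing HAL. A real stub would have to decode every control transfer instruction the CPU provides.

```c
#include <stdint.h>

#define TOY_OP_BRANCH  0x01u         /* unconditional branch          */
#define TOY_OP_BRZ     0x02u         /* branch if the zero flag is set */
#define TOY_BREAKINST  0xFF000000u   /* hypothetical breakpoint insn  */

typedef struct {
    uint32_t pc;          /* word index of the current instruction */
    int      zero_flag;   /* condition flag tested by TOY_OP_BRZ   */
} toy_cpu;

typedef struct {
    uint32_t addr;        /* where the breakpoint was planted */
    uint32_t saved_insn;  /* instruction it replaced          */
    int      active;
} toy_step_bp;

/* Work out where execution goes after the current instruction by
 * decoding it and consulting the CPU state. */
static uint32_t toy_next_pc(const uint32_t *mem, const toy_cpu *cpu)
{
    uint32_t insn = mem[cpu->pc];
    uint32_t op = insn >> 24;
    uint32_t fallthrough = cpu->pc + 1;
    int32_t offset = (int32_t)(insn & 0x00FFFFFFu);

    if (offset & 0x00800000)          /* sign-extend the 24-bit field */
        offset -= 0x01000000;

    if (op == TOY_OP_BRANCH)
        return fallthrough + (uint32_t)offset;
    if (op == TOY_OP_BRZ && cpu->zero_flag)
        return fallthrough + (uint32_t)offset;
    return fallthrough;
}

/* Plant a temporary breakpoint on the next instruction to execute. */
static void toy_single_step_plant(uint32_t *mem, const toy_cpu *cpu,
                                  toy_step_bp *bp)
{
    bp->addr = toy_next_pc(mem, cpu);
    bp->saved_insn = mem[bp->addr];
    mem[bp->addr] = TOY_BREAKINST;
    bp->active = 1;
}

/* Restore the original instruction once the breakpoint has been hit. */
static void toy_single_step_remove(uint32_t *mem, toy_step_bp *bp)
{
    if (bp->active) {
        mem[bp->addr] = bp->saved_insn;
        bp->active = 0;
    }
}
```

The stub would call the plant function just before resuming the target and the remove function on entry to the breakpoint handler. On real hardware, writing the breakpoint instruction would normally also require the instruction and data caches to be synchronized.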

CDL Requirements

The CDL needed for any particular architecture HAL depends to a large extent on the needs of that architecture. This includes issues such as support for different variants, use of FPUs, MMUs and caches. The exact split between the architecture, variant and platform HALs for various features is also somewhat fluid.

To give a rough idea about how the CDL for an architecture is structured, we will take as an example the I386 CDL.

This first section introduces the CDL package and places it under the main HAL package. Include files from this package will be put in the include/cyg/hal directory, and definitions from this file will be placed in include/pkgconf/hal_i386.h. The compile line specifies the files in the src directory that are to be compiled as part of this package.

cdl_package CYGPKG_HAL_I386 {
    display       "i386 architecture"
    parent        CYGPKG_HAL
    hardware
    include_dir   cyg/hal
    define_header hal_i386.h
    description   "
        The i386 architecture HAL package provides generic
        support for this processor architecture. It is also
        necessary to select a specific target platform HAL
        package."

    compile       hal_misc.c context.S i386_stub.c hal_syscall.c

Next we need to generate some files using non-standard make rules. The first is vectors.S, which is not put into the library, but linked explicitly with all applications. The second is the generation of the target.ld file from i386.ld and the startup-selected .ldi file. Both of these are essentially boilerplate code that can be copied and edited.


    make {
        <PREFIX>/lib/vectors.o : <PACKAGE>/src/vectors.S
        $(CC) -Wp,-MD,vectors.tmp $(INCLUDE_PATH) $(CFLAGS) -c -o $@ $<
        @echo $@ ": \\" > $(notdir $@).deps
        @tail +2 vectors.tmp >> $(notdir $@).deps
        @echo >> $(notdir $@).deps
        @rm vectors.tmp
    }

    make {
        <PREFIX>/lib/target.ld: <PACKAGE>/src/i386.ld
        $(CC) -E -P -Wp,-MD,target.tmp -DEXTRAS=1 -xc $(INCLUDE_PATH) $(CFLAGS) -o $@ $<
        @echo $@ ": \\" > $(notdir $@).deps
        @tail +2 target.tmp >> $(notdir $@).deps
        @echo >> $(notdir $@).deps
        @rm target.tmp
    }

The i386 is currently the only architecture that supports SMP. The following CDL simply enables the HAL SMP support if required. Generally this will get enabled as a result of a requires statement in the kernel. The requires statement here turns off lazy FPU switching in the FPU support code, since it is inconsistent with SMP operation.


    cdl_component CYGPKG_HAL_SMP_SUPPORT {
        display       "SMP support"
        default_value 0
        requires { CYGHWR_HAL_I386_FPU_SWITCH_LAZY == 0 }

        cdl_option CYGPKG_HAL_SMP_CPU_MAX {
            display       "Max number of CPUs supported"
            flavor        data
            default_value 2
        }
    }

The i386 HAL has optional FPU support, which is enabled by default. It can be disabled to improve system performance. There are two FPU support options: either to save and restore the FPU state on every context switch, or to only switch the FPU state when necessary.

        
    cdl_component CYGHWR_HAL_I386_FPU {
        display       "Enable I386 FPU support"
        default_value 1
        description   "This component enables support for the
                      I386 floating point unit."

        cdl_option CYGHWR_HAL_I386_FPU_SWITCH_LAZY {
            display       "Use lazy FPU state switching"
            flavor        bool
            default_value 1

            description "
                This option enables lazy FPU state switching.
                The default behaviour for eCos is to save and
                restore FPU state on every thread switch, interrupt
                and exception. While simple and deterministic, this
                approach can be expensive if the FPU is not used by
                all threads. The alternative, enabled by this option,
                is to use hardware features that allow the FPU state
                of a thread to be left in the FPU after it has been
                descheduled, and to allow the state to be switched to
                a new thread only if it actually uses the FPU. Where
                only one or two threads use the FPU this can avoid a
                lot of unnecessary state switching."
        }
    }

The i386 HAL also has support for different classes of CPU. In particular, Pentium class CPUs have extra functional units, and some variants of GDB expect more registers to be reported. These options enable these features. Generally these are enabled by requires statements in variant or platform packages, or in .ecm files.


    cdl_component CYGHWR_HAL_I386_PENTIUM {
        display       "Enable Pentium class CPU features"
        default_value 0
        description   "This component enables support for various
                      features of Pentium class CPUs."

        cdl_option CYGHWR_HAL_I386_PENTIUM_SSE {
            display       "Save/Restore SSE registers on context switch"
            flavor        bool
            default_value 0

            description "
                This option enables SSE state switching. The default
                behaviour for eCos is to ignore the SSE registers.
                Enabling this option adds SSE state information to
                every thread context."
        }

        cdl_option CYGHWR_HAL_I386_PENTIUM_GDB_REGS {
            display       "Support extra Pentium registers in GDB stub"
            flavor        bool
            default_value 0

            description "
                This option enables support for extra Pentium registers
                in the GDB stub. These are registers such as CR0-CR4, and
                all MSRs. Not all GDBs support these registers, so the
                default behaviour for eCos is to not include them in the
                GDB stub support code."
        }
    }

In the i386 HALs, the linker script is provided by the architecture HAL. In other HALs, for example MIPS, it is provided in the variant HAL. The following option provides the name of the linker script to other elements in the configuration system.

    cdl_option CYGBLD_LINKER_SCRIPT {
        display     "Linker script"
        flavor      data
        no_define
        calculated  { "src/i386.ld" }
    }

Finally, this interface indicates whether the platform supplies an implementation of the hal_i386_mem_real_region_top() function. If it does, then the platform's CDL will contain a line of the form: implements CYGINT_HAL_I386_MEM_REAL_REGION_TOP. This allows packages such as RedBoot to detect the presence of this function so that they may call it.


    cdl_interface CYGINT_HAL_I386_MEM_REAL_REGION_TOP {
        display  "Implementations of hal_i386_mem_real_region_top()"
    }
    
}
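The options defined in the CDL above reach the HAL sources as preprocessor definitions in the generated configuration header. As a rough illustration of how the two FPU options might shape the saved register structure, consider the following C sketch. It is not the actual i386 hal_arch.h: the toy_ type names, the register list and the size of the FPU save area are placeholders, and the two #define lines stand in for the generated include/pkgconf/hal_i386.h under the assumption that both FPU options are enabled.

```c
#include <stdint.h>

/* Stand-in for the generated configuration header. An enabled boolean
 * option appears as a #define; a disabled one has no definition at all. */
#define CYGHWR_HAL_I386_FPU 1
#define CYGHWR_HAL_I386_FPU_SWITCH_LAZY 1

/* Hypothetical FPU save area; the size is a placeholder. */
typedef struct {
    uint8_t state[108];
} toy_fpu_context;

typedef struct {
#ifdef CYGHWR_HAL_I386_FPU
# ifdef CYGHWR_HAL_I386_FPU_SWITCH_LAZY
    toy_fpu_context *fpucontext;  /* lazy: pointer to state kept elsewhere */
# else
    toy_fpu_context  fpucontext;  /* eager: state stored in every context  */
# endif
#endif
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;
    uint32_t pc;
} toy_saved_registers;

/* Bytes of FPU-related storage carried by each saved context. */
static inline unsigned toy_fpu_bytes_per_context(void)
{
#ifdef CYGHWR_HAL_I386_FPU
    return (unsigned)sizeof(((toy_saved_registers *)0)->fpucontext);
#else
    return 0;
#endif
}
```

With lazy switching enabled each context carries only a pointer, which is the space and time saving the option description refers to. Removing the second #define makes every context carry the full save area, and removing the first removes the field altogether.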