One aspect of microcontroller (MCU) development is access to hardware devices. This is most often done through memory-mapped special function registers: at fixed physical addresses, the program can write specific values to achieve a desired hardware effect. This practice has a long history and can be found on larger computers as well; the difference is that there an operating system does this and shields the user from it, using virtual address spaces and dedicated device drivers.

These registers are described in documentation (the datasheet) covering the hardware and the accompanying registers, together with the effect they have on the hardware. The traditional way to program them uses the C language, and the chip manufacturer supplies header files with some usable description of the registers. Common formats are:

  • A whole bunch of preprocessor defines giving symbolic names to specific registers.

  • Preprocessor defines with bitmasks describing individual bits in the registers.

  • Later on, structs which are laid out to exactly map the registers of a particular peripheral device.

  • Small functional macros to perform specific functions such as disabling interrupts.

Most manufacturers employ a mix of these techniques.

For example, the following is a snippet from one such header file (the file is around 9000 lines long):

/**
  * @brief General Purpose I/O
  */
typedef struct
{
  __IO uint32_t MODER;    /*!< GPIO port mode register,               Address offset: 0x00      */
  __IO uint32_t OTYPER;   /*!< GPIO port output type register,        Address offset: 0x04      */
  __IO uint32_t OSPEEDR;  /*!< GPIO port output speed register,       Address offset: 0x08      */
  __IO uint32_t PUPDR;    /*!< GPIO port pull-up/pull-down register,  Address offset: 0x0C      */
  __IO uint32_t IDR;      /*!< GPIO port input data register,         Address offset: 0x10      */
  __IO uint32_t ODR;      /*!< GPIO port output data register,        Address offset: 0x14      */
  __IO uint16_t BSRRL;    /*!< GPIO port bit set/reset low register,  Address offset: 0x18      */
  __IO uint16_t BSRRH;    /*!< GPIO port bit set/reset high register, Address offset: 0x1A      */
  __IO uint32_t LCKR;     /*!< GPIO port configuration lock register, Address offset: 0x1C      */
  __IO uint32_t AFR[2];   /*!< GPIO alternate function registers,     Address offset: 0x20-0x24 */
} GPIO_TypeDef;

Another section deals with the individual bits:

/******************  Bits definition for GPIO_MODER register  *****************/
#define GPIO_MODER_MODER0                    ((uint32_t)0x00000003)
#define GPIO_MODER_MODER0_0                  ((uint32_t)0x00000001)
#define GPIO_MODER_MODER0_1                  ((uint32_t)0x00000002)

#define GPIO_MODER_MODER1                    ((uint32_t)0x0000000C)
#define GPIO_MODER_MODER1_0                  ((uint32_t)0x00000004)
#define GPIO_MODER_MODER1_1                  ((uint32_t)0x00000008)

#define GPIO_MODER_MODER2                    ((uint32_t)0x00000030)
#define GPIO_MODER_MODER2_0                  ((uint32_t)0x00000010)
#define GPIO_MODER_MODER2_1                  ((uint32_t)0x00000020)
// And continue on, and on,...

There are good reasons to use such a low level approach. Often, memory and cycles are at a premium. Using C and preprocessor defines, the overhead can be kept to a minimum for small programs. Since most MCUs are made in series from small to larger, the interface needs to be usable for the smallest chips.

Some issues with the defines.

These files are clearly machine generated, so the information is most probably correct. But while the information is good, the files are not very friendly for the developer.

The heavy reliance on the preprocessor has some drawbacks. Most notably, the approach does not scale well. As complexity increases it gets harder and harder to keep the code manageable. Pure textual replacement makes it hard to build abstractions. The preprocessor names are placed directly into the global namespace, which does not scale when we want to support more hardware in the same codebase.

Also, using the preprocessor, we cannot get any assistance from the compiler in keeping the code correct. All operations are done using integral values and raw pointers. We cannot really create types with a restricted set of values where we put names on the values.

Often logically distinct fields (short ranges of bits) are packed into a single register. With a byte (or technically, char) being the smallest addressable unit, there is a mismatch between the addressing model of C/C++ and the addressing of logical units in the registers. This forces us to use bit manipulation and masking on the whole register even though we only want to modify a few bits.

The need for volatile.

The registers are memory mapped, and the compiler has no way to distinguish a register from an ordinary memory location. Hence it expects any value written to remain there, and assumes that it is the only one accessing the location. It uses this to optimize the program by shifting reads/writes around, omitting writes, etc.

Qualifying memory accesses with volatile indicates that the accesses have visible external side effects. This constrains the compiler to perform these reads/writes as ordered by the program.

This is a low level detail that we really want to handle at a low level and avoid having to consider higher up in the program. So if a register is described by the type uint32_t, a pointer to it normally has the type uint32_t*. By instead using the pointer type volatile uint32_t* we guarantee that accesses through that pointer are performed as intended.

Do note that this forces every read and write. If we do consecutive `&=` and `|=` operations, they will not be merged; rather, a read/modify/write is done for each operation. So normal expected optimizations must be done manually. This increases the need for field operations not only on the register itself but also on a local variable of the right type.

Special access patterns for registers

Hardware registers can have more than the ordinary read/write access patterns. Some examples are:

clear on read

When the register is read, it is automatically cleared.

write one to clear

Each position written with a 1 is cleared.

trigger on write

The actual value written is irrelevant. The act of writing triggers the operation.

empty on read

The act of reading will flag the register as available for new data.

It is desirable to have specific function names describing these patterns. Doing e.g. a normal pointer dereference and assignment implies a normal read/write. We want more descriptive names for the more unusual accesses.

C++ advantages.

One of the main additions in C++ over C is the notion of user defined types. A type is some amount of memory storage which can hold a set of distinct values. Coupled to this is a well defined set of operations that are allowed on that type. Having a user defined type means that the designer of that type sets up the values and operations.

This would be a great improvement if we can pull it off. For each group of bits with a common usage, we want a user defined type with the proper set of values and tailored operations for that group.

For example, assume we have a device with some GPIO peripheral modules where each module controls 16 GPIO pins. It uses 32 bit registers. One register consists of 16 groups of 2-bit pairs where the 2 bits encode 3 values. It would be great if we could create e.g. an enum with 3 named values and had a systematic way to use these. Even better would be if the compiler could detect when we try to write the illegal 4th value. Also, we want to read/write this without having to deal explicitly with bit masking.

C++ offers templates. In some sense these fulfill the same need that the preprocessor is often used for in C. But they are part of the core language, so the compiler has full awareness during the evaluation. Being a compile time feature, they generate code but do not have any run time overhead once the code is generated. They are the key to building abstractions that do not introduce runtime overhead.

Finally, namespaces allow us to create a hierarchy of names and keep a tidy organisation.

Some notes about classes.

The traditional tool for creating types is the class. It allows for private data and restricting access to a defined set of member functions. However, this will not work well here. We instantiate objects from these classes and they require a unique address. But we want to be able to address individual bits. Having objects in memory with run-time pointers and bit offsets would introduce an overhead that is simply too expensive. So this is a no-go.

Basic idea

In modern C++ there is an entire field called 'template meta programming' dedicated to doing calculations at compile time. This is a bit on the expert friendly side, but it does serve to show the power of templates and the type system. In this field, types are created during compilation much in the same way that objects are created at runtime. The end results are calculated types and constants to be used by the final program. We will not use this to any great extent. We mostly use the types to build a hierarchy where there was once a flat namespace.

So we can easily create a lot of types. The idea is to create a unique type for each individually addressable bit field, and then use these types to select the appropriate operations that can be performed on that field. By avoiding direct memory accesses and only performing accesses implemented for a particular field, we get a much better correspondence between the offered API and the actual hardware function.

One can think of this as building up an entire parallel structure of the register map, but using types instead of traditional structs. Once we issue reads/writes, we give the type names in the template arguments, which lets the compiler know what operations can be done.

Goals for the implemented HAL layer

The goals for the HAL layers are:

  • Each logical bit field should be individually addressable in some way. This can range from an individual bit up to a full register. To keep complexity down, we do not support large bit fields spanning several registers.

  • It should be possible to keep the naming conventions from the datasheets.

  • We want a meaningful representation of the bit field. Often this can be an enum of values giving a name to each bit pattern.

  • We want useful names for the operations on the bit field which make it clear what they do (clear on read etc.).

  • Handle the `volatile` issue.

  • Allow for unit testing by changing some type during tests so that accesses are directed to normal memory.

  • It should feel natural to use the interface.

A nice bonus would be if we can aggregate a number of bit operations before actually writing them to the register. As a fallback we probably need traditional full register access together with manual bitmask manipulation.

Some non-goals for the layer.

The basic service the layer introduces is to split up register access into distinct fields. We go from a number of same-size registers and divide them into many more fields. However, the next step of recombining fields into functionality is not part of this layer. That should be the responsibility of the next layer. So this is really low level stuff.

Typical operations.

Example of operations we could use:

// Include names in current scope. Specifically the MCU family
// we are working on.
using namespace bsp;
using namespace bsp::stm32f4xx;

// Set the TE (transmit enable) bit to active in the control
// register CR on module usart1.
write<usart1, Usart::CR1::TE>(true);

// Read out the status bit 'transmit ready' to know if we can send a new byte.
volatile bool tr = read<usart1, Usart::SR::TR>();

// Set up clock rate calculation for the usart using cached
// writes to local variable val.
uint32_t val = 0;

// These 2 write to variable val in their respective field positions.
// They are the only fields so we get away with starting with a fixed zero.
cachedWrite<Usart::BRR::Mantissa>(val, 25);
cachedWrite<Usart::BRR::Fraction>(val, 7);

// And finally write it to register.
WriteReg<usart1, Usart::BRR>(val);

When you do a write, we get the expected optimal operations: either bit-set operations, or a read, set bits, then write back to the register. For the cached write, the compiler calculates the value at compile time as a constant and writes it directly to the register. See the following assembly:

 800288a:       4a0a            ldr     r2, [pc, #40]   ; (80028b4 <_Z5setupv+0xb8>)
 8002894:       68d3            ldr     r3, [r2, #12]
 8002896:       f043 0308       orr.w   r3, r3, #8  ; Write Usart::CR1::TE
 800289a:       60d3            str     r3, [r2, #12]

 800289c:       6813            ldr     r3, [r2, #0]    ; read Usart::SR::TR.
 800289e:       f3c3 1340       ubfx    r3, r3, #5, #1 ; Extra intructions due to extra
 80028a2:       f88d 3007       strb.w  r3, [sp, #7]   ;  volatile in the test example.

 80028a6:       f240 1397       movw    r3, #407        ; 0x197 ; Calculated value from
 80028aa:       6093            str     r3, [r2, #8]    ; mantissa and fraction values.

 80028ac:       b002            add     sp, #8
 80028ae:       4770            bx      lr              ; return from function.
 80028b0:       40023800        ; Constants used by ARM Thumb code to designate
 80028b4:       40011000        ; our raw addresses. 0x40011000 is for usart1.

A typical MCU

A typical mid range MCU consists of a computing core, some RAM and flash memory, and a number of peripherals. The peripherals are hardware modules with a specific set of registers. The registers for an individual module are usually packed together in memory and can be modelled by a struct.

The number of module types is typically in the order of 10-40. Often a number of modules are duplicated, so there are several instances of them. E.g. a GPIO module can control 16 GPIO pins and a chip can have 8 instances of the GPIO module. Each individual GPIO instance has its own base address. So it makes sense to model the module types and then use base pointers to distinguish between instances, much like classes and pointers to objects work. One additional twist is that we can have a type for each base pointer so that every operation can be calculated at compile time.

The issue with the ARM Thumb-2 instruction set and hardware addresses.

The initial design used a static address for each individual register, calculated at compile time. This gave a terrible assembly instruction flow. It turns out that the Thumb instruction set is really optimized for the C/C++ style of programming. It expects you to keep a base pointer to a class or struct and then access members relative to this pointer.

So with fixed addresses, each individual access needed to fetch the memory location before the access, thereby bloating the program. But after rearranging the registers as members of a device struct, we got one load of the base address (Usart* above) and then accesses relative to this pointer kept in a register (CPU register r2 above). These accesses used one instruction each, as expected.

This also tells us that the manufacturer's original organization using structs takes this into account.

Type name layers.

So how do we want to organize the types and names in our program?

  • At the bottom we encapsulate all names in the namespace bsp. This keeps it separated from normal program logic.

  • At the next level we probably want a namespace for each MCU family. Many MCUs within a family share modules, so it makes sense to keep them together. In my case I work with the STM32F429 chip. The manufacturer keeps most of the documentation common for the entire stm32f4xx series, so I keep this level.

  • In parallel to the MCU families, we have common software constructs. For example, bitfield template classes, read/write template functions. The user visible functions (such as read/write) should be placed directly into the bsp namespace.

  • Inside the MCU family namespace we create a number of structs to model the modules. So for example we have struct Usart where we collect the data for the Usart module.

  • Inside the module struct, we create a struct for each register, named after the register.

  • Finally, inside the register struct we have a number of using declarations or typedefs connecting the name of a field to a class template describing the bit field.

In addition to this, some helper types are needed to connect the types.

Example structure.

So what could this look like?

namespace bsp {
namespace stm32f4xx {

struct Usart {
  struct SR {
    using Storage = uint32_t;
    static Storage& getReg(Usart& usart) { return usart.layout.sr; }

    using TE = BitField<SR, 0, 1, bool>;
    using RE = BitField<SR, 1, 1, bool>;
  };
  // ... more registers ...
};

struct Gpio {
  struct MODER {
    using Storage = uint32_t;
    static Storage& getReg(Gpio& gpio) { return gpio.layout.moder; }

    // The 3 legal values of each 2 bit group.
    enum Mode { Input, Output, Alternate };

    using MODER_0 = BitField<MODER, 0, 2, Mode>;
    using MODER_1 = BitField<MODER, 2, 2, Mode>;
    // Allow access to fields with run time indexing.
    using MODER_I = bsp::BitFieldCollection<MODER, Mode, 2>;
  };
  // ... more registers ...
};

static const std::size_t usart1 = 0x40011000;
static const std::size_t gpio1 = 0x40013800;

template<std::size_t addr, typename Field>
void write(typename Field::UserType val);

template<std::size_t addr, typename Field>
typename Field::UserType read();

} // namespace stm32f4xx
} // namespace bsp

The trick here lies in the using declarations that connect the field names to custom templates. So for TE we define a bit field that is part of the SR register, located at bit offset 0 and with width 1. We expect to convert the value to a bool. For multi-bit fields we could use an enum instead, or even a custom class, possibly defined inside the register struct. This could be useful to perform special calculations on the given values.

The connection between field names and register members.

Note that all our constructs so far have been static members, enums and using declarations. None of these add runtime size to the structure. To actually reach the register we have the getReg() function that, given a module reference, gets a reference to the member register. In the current implementation I have an extra `struct { … } layout` added at the end with all the registers (not shown here). All access functions use getReg() to perform the actual reading and writing of registers.

Another prospect is to use the original struct from the manufacturer. It is very possible that you want to keep that around just to get access to a selection of their drivers. In that case, the struct is already tested and most likely correct. We just add the bitfield descriptions, get all the nice addressing of the bit fields, and still have access to the larger drivers from the manufacturer. (Do note that these tend to generate verbose object code due to the C nature of doing most work at runtime.) It will unfortunately bring along the huge set of bit mask macros and pollute the global namespace with them.

Regular pointer like base pointers.

It is still useful to operate on a generic module via a pointer rather than only addressing a specific instance. I.e. the following is useful:

// Set the TE (transmit enable) bit to active in the control register CR on module usart1.
write<Usart::CR1::TE>(usart1, true);  // Note usart1 given in normal argument list.

// Read out the status bit 'transmit ready' to know if we can send a new byte.
bool tr = read<Usart::SR::TR>(usart1);

This allows writing general drivers that will work for all instances of the module.

New opportunities with this.

Once this framework is in place, you have suddenly moved your MCU driver code into a namespace. So thinking a bit larger, assume you have several products using different hardware; it is now possible to let them share a code base. There is no issue with having Cortex-M3 based code mixed with Cortex-M4 based code. To do this in the C world you really need to be careful with your includes, making sure not to mix the header files from several processors. Sharing the codebase allows you to keep the application logic in one place, with a few lines controlled by the build system to select what kind of hardware you are running on.

Then there are other developer friendly aspects. A properly set up IDE will have type suggestions when typing. Once you hit `Usart::` the IDE can recognize this and suggest a list of registers, and then register fields. That should speed up the development process. Also, having named enum values really improves code readability. You still need to know the datasheet, but you have descriptive names rather than numbers.

Auto generation of the code.

Do note that the register descriptions are very regular in their structure. So if the chip manufacturer has an external description of the registers in e.g. an XML file, this code is very friendly to machine generation. For e.g. ST there are open source XML files for at least the Cortex-M3 chips. Otherwise, for a few modules it is very doable to hand code just the registers that are used. Making a list of these and translating from the datasheet in a concentrated session is fairly straightforward and mechanical.