
Fast ARMv6M memcpy - Part 3 - Replace SDK memcpy

By Visenri
February 12, 2024

This is the third part of the "Fast ARMv6M memcpy" series; the previous and next parts are linked at the end of this post.

The full code is available on GitHub (the test project, or only the fast ARMv6M memcpy).

Replacing ROM memcpy

To use this faster implementation of "memcpy" without having to change each call point, a way to replace the ROM implementation is needed.

SDK memcpy wrapper

The RP2040 SDK uses a wrapper function to replace the C function "memcpy".

This is done using the "--wrap" option of the GNU linker (ld), which is passed through gcc at link time:

--wrap=symbol

Use a wrapper function for symbol. Any undefined reference to symbol will be resolved to "__wrap_symbol". Any undefined reference to "__real_symbol" will be resolved to symbol.
...
If you link other code with this file using --wrap malloc, then all calls to "malloc" will call the function "__wrap_malloc" instead. The call to "__real_malloc" in "__wrap_malloc" will call the real "malloc" function.

In the SDK multiple functions are replaced this way in: rp2_common/pico_mem_ops/CMakeLists.txt

...
pico_wrap_function(pico_mem_ops_pico memcpy)
...

Using the "pico_wrap_function" function defined in: CMakeLists.txt

function(pico_wrap_function TARGET FUNCNAME)
    target_link_options(${TARGET} INTERFACE "LINKER:--wrap=${FUNCNAME}")
endfunction()

So every "memcpy" call actually resolves to "__wrap_memcpy", which is defined in: mem_ops_aeabi.S

wrapper_func memcpy
    ldr r3, =aeabi_mem_funcs
    ldr r3, [r3, #MEMCPY]
    bx r3

It just branches to an address read from an array of function pointers initialized at startup ("aeabi_mem_funcs").
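As a rough C equivalent of that three-instruction wrapper, here is a minimal host-side sketch. All names in it ("mem_funcs_demo", "wrap_memcpy_demo", "rom_memcpy_stub") are hypothetical stand-ins, not SDK symbols, and the stub simply forwards to the C library so the example can run on a PC:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the SDK symbols; the names are illustrative only. */
typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Mirrors ".equ MEMCPY, 4": a byte offset into a table of 32-bit words. */
enum { MEMCPY_BYTE_OFFSET = 4, TARGET_WORD_SIZE = 4 };

/* Stands in for the ROM routine; on the host it just forwards to libc. */
static void *rom_memcpy_stub(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Slot order follows the SDK table: memset, memcpy, memset4, memcpy44.
   Only the memcpy slot is populated in this sketch. */
static memcpy_fn mem_funcs_demo[4] = { NULL, rom_memcpy_stub, NULL, NULL };

/* Same idea as the assembly wrapper: load the pointer, then jump to it. */
static void *wrap_memcpy_demo(void *dst, const void *src, size_t n)
{
    memcpy_fn target = mem_funcs_demo[MEMCPY_BYTE_OFFSET / TARGET_WORD_SIZE];
    assert(target != NULL);
    return target(dst, src, n);
}
```

Because the call always goes through the table, overwriting that one slot is enough to redirect every "memcpy" call in the program.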

The "wrapper_func" macro is defined in: asm_helper.S

.macro wrapper_func x
regular_func WRAPPER_FUNC_NAME(\x)
.endm

Depending on SDK version, WRAPPER_FUNC_NAME is defined in: platform.h or asm_helper.S

#define WRAPPER_FUNC_NAME(x) __wrap_##x

The array initialization is done in: mem_ops_aeabi.S

regular_func __aeabi_mem_init
    ldr r0, =aeabi_mem_funcs
    movs r1, #MEM_FUNC_COUNT
    ldr r3, =rom_funcs_lookup
    bx r3

It tail-calls the function "rom_funcs_lookup", which replaces each ROM code stored in the array with the address of the corresponding ROM function.

The net result is that there is an array called "aeabi_mem_funcs" that has been filled at startup with ROM function addresses.
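To make the startup step more concrete, this is a small host-side model of an in-place lookup. It is only a sketch: the table contents, the addresses and every "demo_" name are made up for illustration, and it assumes the SDK lookup converts each entry of the array from a code to an address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TABLE_CODE(c1, c2) ((c1) | ((c2) << 8))

/* Made-up model of the ROM function table: code -> address. */
typedef struct {
    uint32_t  code;
    uintptr_t address; /* fake addresses, for illustration only */
} demo_rom_entry;

static const demo_rom_entry demo_rom_table[] = {
    { DEMO_TABLE_CODE('M', 'S'), 0x1000u },
    { DEMO_TABLE_CODE('M', 'C'), 0x2000u },
    { DEMO_TABLE_CODE('S', '4'), 0x3000u },
    { DEMO_TABLE_CODE('C', '4'), 0x4000u },
};

/* Single lookup: returns 0 when the code is unknown. */
static uintptr_t demo_rom_func_lookup(uint32_t code)
{
    for (size_t i = 0; i < sizeof demo_rom_table / sizeof demo_rom_table[0]; i++) {
        if (demo_rom_table[i].code == code)
            return demo_rom_table[i].address;
    }
    return 0;
}

/* In-place conversion: each slot starts as a code and ends as an address.
   Returns 1 if every code was found, 0 otherwise. */
static int demo_rom_funcs_lookup(uintptr_t *table, size_t count)
{
    int all_found = 1;
    assert(table != NULL);
    for (size_t i = 0; i < count; i++) {
        table[i] = demo_rom_func_lookup((uint32_t)table[i]);
        if (table[i] == 0)
            all_found = 0;
    }
    return all_found;
}

/* Before init the array holds codes, in the same order as the SDK table. */
static uintptr_t demo_mem_funcs[4] = {
    DEMO_TABLE_CODE('M', 'S'), DEMO_TABLE_CODE('M', 'C'),
    DEMO_TABLE_CODE('S', '4'), DEMO_TABLE_CODE('C', '4'),
};
```

After calling "demo_rom_funcs_lookup(demo_mem_funcs, 4)" the second slot holds the (fake) address of "memcpy" instead of its code, which is the state the real array is in once startup has finished.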

Procedure to change the address of SDK function pointer

So, to "replace" the ROM function, the function pointer for ROM "memcpy" in "aeabi_mem_funcs" must be changed.
Luckily, the "aeabi_mem_funcs" symbol is global and can be used from C.

But the SDK defines the offset of the "memcpy" function pointer in the same file with an ".equ" directive:

.equ MEMCPY, 4

So it is not accessible from C code.
An alternative is to compare each pointer stored in the array against the address of the ROM "memcpy" function.

To get that address, there is a function in the SDK to get the address of a ROM function using its "code":

void* rom_func_lookup (uint32_t code);

The "code" is a 2-byte value built from two characters; the codes used for the array are set in: mem_ops_aeabi.S

The exact implementation details are different depending on the SDK version.

Old SDK:

aeabi_mem_funcs:
    .word rom_table_code('M','S')
    .word rom_table_code('M','C')
    .word rom_table_code('S','4')
    .word rom_table_code('C','4')

Macro "rom_table_code" from: asm_helper.S

#define rom_table_code(c1, c2) ((c1) | ((c2) << 8))

New SDK:

aeabi_mem_funcs:
    .word ROM_FUNC_MEMSET
    .word ROM_FUNC_MEMCPY
    .word ROM_FUNC_MEMSET4
    .word ROM_FUNC_MEMCPY44

Using macros from: bootrom.h

#define ROM_FUNC_MEMSET                 ROM_TABLE_CODE('M', 'S')
#define ROM_FUNC_MEMSET4                ROM_TABLE_CODE('S', '4')
#define ROM_FUNC_MEMCPY                 ROM_TABLE_CODE('M', 'C')
#define ROM_FUNC_MEMCPY44               ROM_TABLE_CODE('C', '4')
...

#define ROM_TABLE_CODE(c1, c2) ((c1) | ((c2) << 8))

So, to be compatible with new and old SDK versions, I decided to copy the macro from the new SDK and use it in my code.
This can be considered safe: the codes are part of the ROM, so they will not change in a future SDK.
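As a quick worked example of what the macro produces (assuming an ASCII character set; "demo_decode_code" is a hypothetical helper added only for this illustration): 'M' is 0x4D and 'C' is 0x43, so the "memcpy" code is 0x4D | (0x43 << 8) = 0x434D.

```c
#include <assert.h>
#include <stdint.h>

#define ROM_TABLE_CODE(c1, c2) ((c1) | ((c2) << 8))
#define ROM_FUNC_MEMCPY ROM_TABLE_CODE('M', 'C')

/* Hypothetical helper: splits a 2-byte code back into its two characters. */
static void demo_decode_code(uint32_t code, char out[3])
{
    assert(code <= 0xFFFFu);         /* codes only use the low 16 bits */
    out[0] = (char)(code & 0xFFu);   /* first character is the low byte */
    out[1] = (char)(code >> 8);      /* second character is the high byte */
    out[2] = '\0';
}
```

The first character ends up in the low byte, so the code reads as "MC" when stored in little-endian memory.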

Implementation

In a header named "memops_opt.h":

void * memcpy_armv6m(void *dst, const void *src, size_t length);
void memcpy_wrapper_replace(void * (*function)(void *, const void *, size_t));
void memcpy_wrapper_set_to_rom(void);

The new optimized asm function "memcpy_armv6m" is declared so it can be called from C code.
The function "memcpy_wrapper_replace" is used to replace the pointer in the array.
The function "memcpy_wrapper_set_to_rom" can be used to restore the original ROM function.

In the implementation, "memops_opt.c" defines the macros for the function "code":

#ifndef ROM_TABLE_CODE
    #define ROM_TABLE_CODE(c1, c2) ((c1) | ((c2) << 8))
#endif
#ifndef ROM_FUNC_MEMCPY
    #define ROM_FUNC_MEMCPY ROM_TABLE_CODE('M', 'C')
#endif

And declares the external array:

#define AEABI_MEM_FUNCS_COUNT 4
// Array of function pointers where memcpy pointer is stored by SDK:
extern void * aeabi_mem_funcs[AEABI_MEM_FUNCS_COUNT];

Finally, "memcpy_wrapper_replace" gets the ROM function pointer and searches the array for a matching entry to replace:

void memcpy_wrapper_replace(void * (*function)(void *, const void *, size_t))
{
    static int8_t posInFunctionArray = -1;

    if (function == NULL) // By default replace it by the optimized version.
        function = &memcpy_armv6m;

    if (posInFunctionArray < 0) // Array position is unknown, execute search.
    {
        void * fn = rom_func_lookup(ROM_FUNC_MEMCPY); // Get pointer to ROM memcpy
        for (int8_t i = 0; i < AEABI_MEM_FUNCS_COUNT; i++)
        {
            if (aeabi_mem_funcs[i] == fn)
            {
                aeabi_mem_funcs[i] = (void*) function;
                posInFunctionArray = i;
                break;
            }
        }
    }
    else // Array position is known, just replace the function pointer in the array.
        aeabi_mem_funcs[posInFunctionArray] = (void*) function;
}

On the first call, if a match is found, it replaces the pointer with the provided function and stores the array position in a static variable for subsequent calls. If no match is found, the array is left untouched.
If called with "NULL", it replaces the pointer with the default optimized function: "memcpy_armv6m".

For completeness, this is the implementation of the function to restore the original ROM function:

void memcpy_wrapper_set_to_rom(void)
{
    memcpy_wrapper_replace((void * (*)(void *, const void *, size_t))rom_func_lookup(ROM_FUNC_MEMCPY));
}
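The replace and restore logic can also be tried on a PC without the SDK. The following is a host-side mock, not the code from this project: every "mock_" name is hypothetical, the "fast" function just counts calls and forwards to the C library, and, as a small variation, the replace function returns the slot index (or -1) so a failed search can be detected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

#define MOCK_FUNCS_COUNT 4
#define MOCK_MEMCPY_SLOT 1 /* position used by the mock wrapper below */

static int mock_fast_calls; /* how many times the "fast" version ran */

static void *mock_rom_memcpy(void *d, const void *s, size_t n)
{
    return memcpy(d, s, n);
}

static void *mock_fast_memcpy(void *d, const void *s, size_t n)
{
    mock_fast_calls++;
    return memcpy(d, s, n);
}

/* Mock of the pointer array after startup: only the memcpy slot is filled. */
static memcpy_fn mock_mem_funcs[MOCK_FUNCS_COUNT] =
    { NULL, mock_rom_memcpy, NULL, NULL };

/* Search once for the ROM pointer, cache the slot, then overwrite it.
   Returns the slot index, or -1 if the ROM pointer was never found. */
static int mock_replace(memcpy_fn function)
{
    static int slot = -1;

    if (slot < 0) {
        for (int i = 0; i < MOCK_FUNCS_COUNT; i++) {
            if (mock_mem_funcs[i] == mock_rom_memcpy) {
                slot = i;
                break;
            }
        }
    }
    if (slot < 0)
        return -1;

    mock_mem_funcs[slot] = function ? function : mock_fast_memcpy;
    return slot;
}

/* Stands in for the wrapper that every "memcpy" call goes through. */
static void *mock_wrapped_memcpy(void *d, const void *s, size_t n)
{
    assert(mock_mem_funcs[MOCK_MEMCPY_SLOT] != NULL);
    return mock_mem_funcs[MOCK_MEMCPY_SLOT](d, s, n);
}
```

Calling "mock_replace(NULL)" routes "mock_wrapped_memcpy" to the counting version, and "mock_replace(mock_rom_memcpy)" restores the original. The cached slot is what makes the restore work, since by then the ROM pointer is no longer in the array to be searched for.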
Previous post:
Fast ARMv6M memcpy - Part 2 - Options & Results
Next post:
Fast ARMv6M memcpy - Part 4 - Automated case generation