Fast ARMv6M memcpy - Part 3 - Replace SDK memcpy
This is the 3rd part of "Fast ARMv6M memcpy". See the other parts you may have missed:
- Part 1 - ASM code
- Part 2 - Options & Results
- Part 3 - Replace SDK memcpy
- Part 4 - Automated case generation
- Part 5 - Test function
- Part 6 - Benchmark function
Full code is available on GitHub (test project, only fast ARMv6M memcpy).
Replacing ROM memcpy
To use this faster implementation of "memcpy" without having to change each call point, a way to replace the ROM implementation is needed.
SDK memcpy wrapper
The RP2040 SDK uses a wrapper function to replace the C function "memcpy".
This is done using the linker's "--wrap" option, which gcc passes through to ld:
--wrap=symbol
Use a wrapper function for symbol. Any undefined reference to symbol will be resolved to "__wrap_symbol". Any undefined reference to "__real_symbol" will be resolved to symbol.
...
If you link other code with this file using --wrap malloc, then all calls to "malloc" will call the function "__wrap_malloc" instead. The call to "__real_malloc" in "__wrap_malloc" will call the real "malloc" function.
In the SDK multiple functions are replaced this way in: rp2_common/pico_mem_ops/CMakeLists.txt
...
pico_wrap_function(pico_mem_ops_pico memcpy)
...
Using the "pico_wrap_function" function defined in: CMakeLists.txt
function(pico_wrap_function TARGET FUNCNAME)
    target_link_options(${TARGET} INTERFACE "LINKER:--wrap=${FUNCNAME}")
endfunction()
So the real function being called at each "memcpy" call is defined in: mem_ops_aeabi.S
wrapper_func memcpy
ldr r3, =aeabi_mem_funcs
ldr r3, [r3, #MEMCPY]
bx r3
It just branches to an address read from an array of function pointers initialized at startup ("aeabi_mem_funcs").
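To make that dispatch concrete, here is a minimal host-side C sketch of the same idea: a wrapper that forwards to whatever pointer currently sits in a table slot. All names in it ("mem_funcs", "MEMCPY_SLOT", "wrapped_memcpy", "counting_memcpy") are hypothetical stand-ins, not SDK symbols, and slot index 1 assumes 4-byte pointers as on the RP2040 (byte offset 4).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the SDK table; slot 1 plays the role of MEMCPY. */
typedef void *(*memcpy_fn)(void *, const void *, size_t);
enum { MEMCPY_SLOT = 1, SLOT_COUNT = 4 };
static memcpy_fn mem_funcs[SLOT_COUNT];

static unsigned calls_to_replacement; /* how often the replacement ran */

/* Hypothetical replacement: copies byte by byte and counts its calls. */
static void *counting_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    calls_to_replacement++;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Plays the role of __wrap_memcpy: load the pointer from the table, call it. */
static void *wrapped_memcpy(void *dst, const void *src, size_t n)
{
    return mem_funcs[MEMCPY_SLOT](dst, src, n);
}

/* Returns the number of calls that reached the replacement. */
unsigned dispatch_demo(void)
{
    char a[4] = "abc", b[4] = {0};

    calls_to_replacement = 0;
    mem_funcs[MEMCPY_SLOT] = memcpy;          /* "startup": default implementation */
    wrapped_memcpy(b, a, sizeof a);           /* goes to the default */
    assert(strcmp(b, "abc") == 0);

    mem_funcs[MEMCPY_SLOT] = counting_memcpy; /* swap the pointer */
    wrapped_memcpy(b, "xyz", 4);              /* same call site, new target */
    assert(strcmp(b, "xyz") == 0);
    return calls_to_replacement;
}
```

The call site never changes; only the table entry does. That is the property the rest of this part relies on.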
"wrapper_func" macro is defined in: asm_helper.S
.macro wrapper_func x
regular_func WRAPPER_FUNC_NAME(\x)
.endm
Depending on SDK version, WRAPPER_FUNC_NAME is defined in: platform.h or asm_helper.S
#define WRAPPER_FUNC_NAME(x) __wrap_##x
The array initialization is done in: mem_ops_aeabi.S
regular_func __aeabi_mem_init
ldr r0, =aeabi_mem_funcs
movs r1, #MEM_FUNC_COUNT
ldr r3, =rom_funcs_lookup
bx r3
It tail-calls the function "rom_funcs_lookup", passing the array and its entry count, to fill the array with function pointers.
The net result is that there is an array called "aeabi_mem_funcs" that has been filled at startup with ROM function addresses.
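The array starts out holding 2-character ROM codes (the table contents are shown further down in this part), and the lookup overwrites each code in place with an address. As a rough host-side illustration of that in-place conversion, assuming a made-up lookup table: "fake_rom_table", "fake_funcs_lookup" and the stub functions are hypothetical, and the real bootrom lookup differs in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CODE(c1, c2) ((uintptr_t)((c1) | ((c2) << 8)))

/* Hypothetical stubs standing in for two ROM routines. */
static int stub_memset(void) { return 1; }
static int stub_memcpy(void) { return 2; }

/* Hypothetical "ROM table": maps a 2-character code to a function. */
struct rom_entry { uintptr_t code; int (*fn)(void); };
static const struct rom_entry fake_rom_table[] = {
    { CODE('M', 'S'), stub_memset },
    { CODE('M', 'C'), stub_memcpy },
};

/* Each slot starts as a code and is overwritten in place by an address. */
static void fake_funcs_lookup(uintptr_t *table, size_t count)
{
    const size_t entries = sizeof fake_rom_table / sizeof fake_rom_table[0];

    for (size_t i = 0; i < count; i++) {
        uintptr_t found = 0; /* assumption of this sketch: unknown codes become 0 */
        for (size_t j = 0; j < entries; j++)
            if (fake_rom_table[j].code == table[i])
                found = (uintptr_t)fake_rom_table[j].fn;
        table[i] = found;
    }
}

/* Returns the value produced by calling the function found for 'M','C'. */
int lookup_demo(void)
{
    uintptr_t funcs[2] = { CODE('M', 'S'), CODE('M', 'C') };

    fake_funcs_lookup(funcs, 2);
    assert(funcs[0] != 0 && funcs[1] != 0); /* both codes were resolved */
    return ((int (*)(void))funcs[1])();     /* slot 1 is now callable */
}
```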
Procedure to change the address of SDK function pointer
So, to "replace" the ROM function, the function pointer for ROM "memcpy" in "aeabi_mem_funcs" must be changed.
Luckily, the "aeabi_mem_funcs" symbol is global and can be used from C.
But the SDK defines the offset for "memcpy" function pointer in the same file with an "equ":
.equ MEMCPY, 4
So the offset is not accessible from C code.
An alternative is to compare each pointer in the array against the address of the ROM "memcpy" function.
To get that address, there is a function in the SDK to get the address of a ROM function using its "code":
void* rom_func_lookup (uint32_t code);
The "code" is a 2-byte value defined in: mem_ops_aeabi.S
The exact implementation details are different depending on the SDK version.
Old SDK:
aeabi_mem_funcs:
.word rom_table_code('M','S')
.word rom_table_code('M','C')
.word rom_table_code('S','4')
.word rom_table_code('C','4')
Macro "rom_table_code" from: asm_helper.S
#define rom_table_code(c1, c2) ((c1) | ((c2) << 8))
New SDK:
aeabi_mem_funcs:
.word ROM_FUNC_MEMSET
.word ROM_FUNC_MEMCPY
.word ROM_FUNC_MEMSET4
.word ROM_FUNC_MEMCPY44
Using macros from: bootrom.h
#define ROM_FUNC_MEMSET ROM_TABLE_CODE('M', 'S')
#define ROM_FUNC_MEMSET4 ROM_TABLE_CODE('S', '4')
#define ROM_FUNC_MEMCPY ROM_TABLE_CODE('M', 'C')
#define ROM_FUNC_MEMCPY44 ROM_TABLE_CODE('C', '4')
...
#define ROM_TABLE_CODE(c1, c2) ((c1) | ((c2) << 8))
So, to be compatible with new and old SDK versions, I decided to copy the macro from the new SDK and use it in my code.
This can be considered safe: the codes are fixed in the ROM, so a future SDK cannot change them.
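As a quick sanity check of what the macro produces, here is a small sketch that evaluates it for the four codes used in the table. The helper name "rom_code" is hypothetical, and the values assume an ASCII character set.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the SDK macro quoted above. */
#define ROM_TABLE_CODE(c1, c2) ((c1) | ((c2) << 8))

/* The first character ends up in the low byte, the second in the high byte. */
static_assert(ROM_TABLE_CODE('M', 'C') == 0x434D, "'M' = 0x4D, 'C' = 0x43");

/* Hypothetical helper: returns the 16-bit code for a pair of characters. */
uint32_t rom_code(char c1, char c2)
{
    return (uint32_t)ROM_TABLE_CODE(c1, c2);
}
```

With ASCII, "rom_code('M', 'S')" is 0x534D, "rom_code('S', '4')" is 0x3453 and "rom_code('C', '4')" is 0x3443.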
Implementation
In a header named "memops_opt.h":
void * memcpy_armv6m(void *dst, const void *src, size_t length);
void memcpy_wrapper_replace(void * (*function)(void *, const void *, size_t));
void memcpy_wrapper_set_to_rom(void);
The new optimized asm function "memcpy_armv6m" is declared so that it is available to C code.
The function "memcpy_wrapper_replace" will be used to replace the pointer in the array.
The function "memcpy_wrapper_set_to_rom" can be used to restore the original ROM function.
In the implementation, "memops_opt.c" defines the macros for the function "code":
#ifndef ROM_TABLE_CODE
#define ROM_TABLE_CODE(c1, c2) ((c1) | ((c2) << 8))
#endif
#ifndef ROM_FUNC_MEMCPY
#define ROM_FUNC_MEMCPY ROM_TABLE_CODE('M', 'C')
#endif
And declares the external array:
#define AEABI_MEM_FUNCS_COUNT 4
// Array of function pointers where memcpy pointer is stored by SDK:
extern void * aeabi_mem_funcs[AEABI_MEM_FUNCS_COUNT];
Finally, the "memcpy_wrapper_replace" gets the ROM function pointer and searches the array for a match to be replaced:
void memcpy_wrapper_replace(void * (*function)(void *, const void *, size_t))
{
    static int8_t posInFunctionArray = -1;
    if (function == NULL) // By default replace it by the optimized version.
        function = &memcpy_armv6m;
    if (posInFunctionArray < 0) // Array position is unknown, execute search.
    {
        void * fn = rom_func_lookup(ROM_FUNC_MEMCPY); // Get pointer to ROM memcpy
        for (int8_t i = 0; i < AEABI_MEM_FUNCS_COUNT; i++)
        {
            if (aeabi_mem_funcs[i] == fn)
            {
                aeabi_mem_funcs[i] = (void*) function;
                posInFunctionArray = i;
                break;
            }
        }
    }
    else // Array position is known, just replace the function pointer in the array.
        aeabi_mem_funcs[posInFunctionArray] = (void*) function;
}
When first called (and a match is found), it replaces the pointer with the provided function and stores the array position in a static variable for subsequent calls.
If called with "NULL", it replaces the pointer with the default optimized function: "memcpy_armv6m".
For completeness, this is the implementation of the function to restore the original ROM function:
void memcpy_wrapper_set_to_rom(void)
{
    memcpy_wrapper_replace((void * (*)(void *, const void *, size_t))rom_func_lookup(ROM_FUNC_MEMCPY));
}
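The search-then-cache logic can also be exercised off-target. The sketch below is one possible way to do that on a host machine, not part of the project: the table, the "ROM" function and the "fast" function are all hypothetical stand-ins ("fake_mem_funcs", "fake_rom_memcpy", "fake_fast_memcpy"), and the table is typed as function pointers rather than "void *" to keep the host build free of casts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Host-side stand-ins (assumptions): on the target these come from the SDK. */
#define FAKE_FUNCS_COUNT 4
static void *fake_rom_memcpy(void *d, const void *s, size_t n)  { return memcpy(d, s, n); }
static void *fake_fast_memcpy(void *d, const void *s, size_t n) { return memmove(d, s, n); }
static memcpy_fn fake_mem_funcs[FAKE_FUNCS_COUNT];

/* Same search-then-cache idea as memcpy_wrapper_replace, against the fake table. */
static void fake_wrapper_replace(memcpy_fn function)
{
    static int pos = -1; /* slot index, unknown until the first successful search */

    if (pos < 0) {
        for (int i = 0; i < FAKE_FUNCS_COUNT; i++) {
            if (fake_mem_funcs[i] == fake_rom_memcpy) {
                pos = i;
                break;
            }
        }
    }
    if (pos >= 0)
        fake_mem_funcs[pos] = function;
}

/* Returns 0 when replace and restore both hit the expected slot. */
int replace_demo(void)
{
    fake_mem_funcs[1] = fake_rom_memcpy;    /* "startup" state: ROM pointer in slot 1 */

    fake_wrapper_replace(fake_fast_memcpy); /* first call: search, cache, replace */
    assert(fake_mem_funcs[1] == fake_fast_memcpy);

    fake_wrapper_replace(fake_rom_memcpy);  /* later call: cached slot, restore */
    assert(fake_mem_funcs[1] == fake_rom_memcpy);
    return 0;
}
```

The second call is the interesting one: by then the ROM pointer is no longer in the table, so a fresh search would fail. The cached position is what makes restoring possible.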