From 00ccbce2c4ac741287484bca0b3aa84aceafaf56 Mon Sep 17 00:00:00 2001 From: Luca Burelli Date: Thu, 16 Jan 2025 12:48:12 +0100 Subject: [PATCH] doc: llext: add extension debugging guide Add a new section to the llext documentation that explains how to debug extensions and how to address the issues that may arise when doing so. Signed-off-by: Luca Burelli --- doc/services/llext/debug.rst | 359 +++++++++++++++++++++++++++++++++++ doc/services/llext/index.rst | 1 + doc/services/llext/load.rst | 8 +- 3 files changed, 361 insertions(+), 7 deletions(-) create mode 100644 doc/services/llext/debug.rst diff --git a/doc/services/llext/debug.rst b/doc/services/llext/debug.rst new file mode 100644 index 00000000000..2389c078dc9 --- /dev/null +++ b/doc/services/llext/debug.rst @@ -0,0 +1,359 @@ +.. _llext_debug: + +Debugging extensions +#################### + +Debugging extensions is a complex task. Since the extension code is by +definition not built with the Zephyr application, the final Zephyr ELF file +does not contain the symbols for extension code. Furthermore, the extension is +dynamically relocated by :c:func:`llext_load` at runtime, so even if the +symbols were available, it would be impossible for the debugger to know the +final locations of the symbols in the extension code. + +Setting up the debugger session properly in this case requires a few manual +steps. The following sections will provide some tips on how to do it with the +Zephyr SDK and the debug features provided by ``west``, but the instructions +can be adapted to any GDB-based debugging environment. + +Extension debugging process +=========================== + +1. Make sure the project is set up to display the verbose LLEXT debug output + (:kconfig:option:`CONFIG_LOG` and :kconfig:option:`CONFIG_LLEXT_LOG_LEVEL_DBG` + are set). + +2. Build the Zephyr application and the extensions. + + For each target ``name`` included in the current build, two files will be + generated into the ``llext`` subdirectory of the build root: + + ``name_ext_debug.elf`` + + An intermediate ELF file with full debugging information. + + ``name.llext`` + + The final extension binary, stripped to the essential data required for + loading into the Zephyr application. + + Other files may be present, depending on the target architecture and the + build configuration. + +3. Start a debugging session of the main Zephyr application. This is described + in the :ref:`Debugging ` section of the documentation; on + supported boards it is as easy as running ``west debug``, perhaps with some + additional arguments. + +4. Set a breakpoint just after the :c:func:`llext_load` function in your code + and let it run. This will load the extension into memory and relocate it. + The output logs will contain a line with ``gdb add-symbol-file flags:``, + followed by lines all starting with ``-s``. + +5. Type this command in the GDB console to load this extension's symbols: + + .. code-block:: + + add-symbol-file + + where ```` is the full path of the ELF file with debug + information identified in step 2, and ```` is a space + separated list of all the ``-s`` lines collected from the log in the + previous step. + +6. The extension symbols are now available to the debugger. You can set + breakpoints, inspect variables, and step through the code as usual. + +Steps 4-6 can be repeated for every extension that is loaded by the +application, if there are several. + +Symbol lookup issues +==================== + +.. warning:: + + It is almost certain that the loaded symbols will be shadowed by others in + the main application; for example, they may be located inside the memory + area of the ELF buffer or the LLEXT heap. + + In this case GDB chooses the first known symbol and therefore associates the + addresses to some ``elf_buffer+0x123`` instead of an expected ``ext_fn``. + This further confuses its high-level operations like source stepping or + inspecting locals, since they are meaningless in that context. + +Two possible solutions to this problem are discussed in the following +paragraphs. + +Discard all Zephyr symbols +-------------------------- + +The simplest option is to drop all the Zephyr application symbols from GDB by +invoking ``add-symbol-file`` with no arguments, before step 5. This will +however focus the debugging session to the llext only, as all information about +the Zephyr application will be lost. For example, the debugger may not be able to +properly follow stack traces outside the extension code. + +It is possible to use the same technique multiple times in the same session to +switch between the main and extension symbol tables as required, but it rapidly +becomes cumbersome. + +Edit the ELF file +----------------- + +This alternative is more complex but allows for a more seamless debugging +experience. The idea is to edit the main Zephyr ELF file to remove information +about the symbols that overlap with the extension that is to be debugged, so +that when the extension symbols are loaded, GDB will not have any ambiguity. +This can be done by using ``objcopy`` with the ``-N `` option. + +Identifying the offending symbols is however an iterative trial-and-error +procedure, as there can be many different layers; for example, the ELF buffer +may be itself contained in a symbol for the data segment. Fortunately, this +knowledge can then be used several times as the list is unlikely to change for +a given project. + +Example debugging session +========================= + +This example demonstrates how to debug the ``detached_fn`` extension in the +``tests/subsys/llext`` project (specifically, the ``writable`` case), on an +emulated ``mps2/an385`` board which is based on an ARM Cortex-M3. + +.. note:: + + The logs below have been obtained using Zephyr version 4.1 and the Zephyr + SDK version 0.17.0. However, the exact addresses may still vary between + runs even when using the same versions. Adjust the commands below to + match the results of your own session. + +The following command will build the project and start the emulator in +debugging mode: + +.. code-block:: + :caption: Terminal 1 (build, QEMU emulator, GDB server) + + zephyr$ west build -p -b mps2/an385 tests/subsys/llext/ -T llext.writable -t debugserver_qemu + -- west build: generating a build system + [...] + -- west build: running target debugserver_qemu + [...] + [186/187] To exit from QEMU enter: 'CTRL+a, x'[QEMU] CPU: cortex-m3 + +On a separate terminal, set ``ZEPHYR_SDK_INSTALL_DIR`` to the directory for the +Zephyr SDK on your installation, then start the GDB client for the target: + +.. code-block:: + :caption: Terminal 2 (GDB client) + + zephyr$ export LLEXT_SDK_INSTALL_DIR=/opt/zephyr-sdk-0.17.0 + zephyr$ ${LLEXT_SDK_INSTALL_DIR}/arm-zephyr-eabi/bin/arm-zephyr-eabi-gdb build/zephyr/zephyr.elf + GNU gdb (Zephyr SDK 0.17.0) 12.1 + [...] + Reading symbols from build/zephyr/zephyr.elf... + (gdb) + +Connect, set a breakpoint on the ``llext_load`` function and run until it +finishes: + +.. code-block:: + :caption: Terminal 2 (GDB client) + + (gdb) target extended-remote :1234 + Remote debugging using :1234 + z_arm_reset () at zephyr/arch/arm/core/cortex_m/reset.S:124 + 124 movs.n r0, #_EXC_IRQ_DEFAULT_PRIO + (gdb) break llext_load + Breakpoint 1 at 0x236c: file zephyr/subsys/llext/llext.c, line 168. + (gdb) continue + Continuing. + + Breakpoint 1, llext_load (ldr=ldr@entry=0x2000bef0 , + name=name@entry=0x9d98 "test_detached", + ext=ext@entry=0x2000abb8 , + ldr_parm=ldr_parm@entry=0x2000bee8 ) + at zephyr/subsys/llext/llext.c:168 + 168 *ext = llext_by_name(name); + (gdb) finish + Run till exit from #0 llext_load ([...]) + at zephyr/subsys/llext/llext.c:168 + llext_test_detached () at zephyr/tests/subsys/llext/src/test_llext.c:481 + 481 zassert_ok(res, "load should succeed"); + +The first terminal will have printed lots of debugging information related to +the extension loading. Find the section with the addresses: + +.. code-block:: + :caption: Terminal 1 (build, QEMU emulator, GDB server) + + [...] + D: Allocate and copy regions... + [...] + D: gdb add-symbol-file flags: + D: -s .text 0x20000034 + D: -s .data 0x200000b4 + D: -s .bss 0x2000c2e0 + D: -s .rodata 0x200000b8 + D: -s .detach 0x200001d0 + D: Counting exported symbols... + [...] + +Use these addresses to load the symbols into GDB: + +.. code-block:: + :caption: Terminal 2 (GDB client) + + (gdb) add-symbol-file build/llext/detached_fn_ext_debug.elf -s .text 0x20000034 -s .data 0x200000b4 -s .bss 0x2000c2e0 -s .rodata 0x200000b8 -s .detach 0x200001d0 + add symbol table from file "build/llext/detached_fn_ext_debug.elf" at + .text_addr = 0x20000034 + .data_addr = 0x200000b4 + .bss_addr = 0x2000c2e0 + .rodata_addr = 0x200000b8 + .detach_addr = 0x200001d0 + (y or n) y + Reading symbols from build/llext/detached_fn_ext_debug.elf... + (gdb) break detached_entry + Breakpoint 2 at 0x200001d0 (2 locations) + (gdb) continue + Continuing. + + Breakpoint 2, 0x200001d0 in test_detached_ext () + (gdb) backtrace + #0 0x200001d0 in test_detached_ext () + #1 0x200000ac in test_detached_ext () + #2 0x00000706 in llext_test_detached () at zephyr/tests/subsys/llext/src/test_llext.c:496 + #3 0x00001a36 in run_test_functions (suite=0x92bc , data=0x0 , test=0x92d8 ) at zephyr/subsys/testsuite/ztest/src/ztest.c:328 + #4 test_cb (a=0x92bc , b=0x92d8 , c=0x0 ) at zephyr/subsys/testsuite/ztest/src/ztest.c:662 + #5 0x00000e96 in z_thread_entry (entry=0x1a05 , p1=0x92bc , p2=0x92d8 , p3=0x0 ) at zephyr/lib/os/thread_entry.c:48 + #6 0x00000000 in ?? () + +The symbol associated with the breakpoint location and the last stack frames +mistakenly reference the ELF buffer in the Zephyr application instead of the +extension symbols. Note that GDB however knows both: + +.. code-block:: + :caption: Terminal 2 (GDB client) + + (gdb) info sym 0x200001d0 + test_detached_ext + 464 in section datas of zephyr/build/zephyr/zephyr.elf + detached_entry in section .detach of zephyr/build/llext/detached_fn_ext_debug.elf + (gdb) info sym 0x200000ac + test_detached_ext + 172 in section datas of zephyr/build/zephyr/zephyr.elf + test_entry + 8 in section .text of zephyr/build/llext/detached_fn_ext_debug.elf + +It is also impossible to inspect the variables in the extension or step through +code properly: + +.. code-block:: + :caption: Terminal 2 (GDB client) + + (gdb) print bss_cnt + No symbol "bss_cnt" in current context. + (gdb) print data_cnt + No symbol "data_cnt" in current context. + (gdb) next + Single stepping until exit from function test_detached_ext, + which has no line number information. + + Breakpoint 2, 0x200001ea in test_detached_ext () + (gdb) + +Discarding symbols +------------------ + +Discarding the Zephyr symbols and only focusing on the extension restores full +debugging functionality at the cost of losing the global context (note the +backtrace stops outside the extension): + +.. code-block:: + :caption: Terminal 2 (GDB client) + + (gdb) symbol-file + Discard symbol table from `zephyr/build/zephyr/zephyr.elf'? (y or n) y + Error in re-setting breakpoint 1: No symbol table is loaded. Use the "file" command. + No symbol file now. + (gdb) add-symbol-file build/llext/detached_fn_ext_debug.elf -s .text 0x20000034 -s .data 0x200000b4 -s .bss 0x2000c2e0 -s .rodata 0x200000b8 -s .detach 0x200001d0 + add symbol table from file "build/llext/detached_fn_ext_debug.elf" at + .text_addr = 0x20000034 + .data_addr = 0x200000b4 + .bss_addr = 0x2000c2e0 + .rodata_addr = 0x200000b8 + .detach_addr = 0x200001d0 + (y or n) y + Reading symbols from build/llext/detached_fn_ext_debug.elf... + (gdb) backtrace + #0 detached_entry () at zephyr/tests/subsys/llext/src/detached_fn_ext.c:18 + #1 0x200000ac in test_entry () at zephyr/tests/subsys/llext/src/detached_fn_ext.c:26 + #2 0x00000706 in ?? () + Backtrace stopped: previous frame identical to this frame (corrupt stack?) + (gdb) next + 19 zassert_true(data_cnt < 0); + (gdb) print bss_cnt + $1 = 1 + (gdb) print data_cnt + $2 = -2 + (gdb) + + +Editing the ELF file +-------------------- + +In this alternative approach, the patches to the Zephyr ELF file must be +performed after building the Zephyr binary and starting the emulator on +Terminal 1, but before starting the GDB client on Terminal 2. + +The above debugging session already identified ``test_detached_ext``, the char +array that holds the ELF file, as an offending symbol, so that will be removed +in a first pass. Performing the same steps multiple times, ``__data_start`` and +``__data_region_start`` can also be found to overlap the memory area of +interest. + +The following commands will remove all of these from the Zephyr ELF file, then +start a debugging session on the modified file: + +.. code-block:: + :caption: Terminal 2 (GDB client) + + zephyr$ export LLEXT_SDK_INSTALL_DIR=/opt/zephyr-sdk-0.17.0 + zephyr$ ${LLEXT_SDK_INSTALL_DIR}/arm-zephyr-eabi/bin/arm-zephyr-eabi-objcopy -N test_detached_ext -N __data_start -N __data_region_start build/zephyr/zephyr.elf build/zephyr/zephyr-edit.elf + zephyr$ ${LLEXT_SDK_INSTALL_DIR}/arm-zephyr-eabi/bin/arm-zephyr-eabi-gdb build/zephyr/zephyr-edit.elf + GNU gdb (Zephyr SDK 0.17.0) 12.1 + [...] + Reading symbols from build/zephyr/zephyr-edit.elf... + (gdb) + +The same steps used in the previous run can be performed again to attach to the +GDB server and load both the extension and its debug symbols. This time, however, +the result is rather different: + + * the ``break`` command includes line number information; + + * the output from ``backtrace`` contains functions from both the extension and + the Zephyr application; + + * the local variables can be properly inspected. + +.. code-block:: + :caption: Terminal 2 (GDB client) + + (gdb) add-symbol-file build/llext/detached_fn_ext_debug.elf [...] + [...] + Reading symbols from build/llext/detached_fn_ext_debug.elf... + (gdb) break detached_entry + Breakpoint 2 at 0x200001d6: file zephyr/tests/subsys/llext/src/detached_fn_ext.c, line 17. + (gdb) continue + Continuing. + + Breakpoint 2, detached_entry () at zephyr/tests/subsys/llext/src/detached_fn_ext.c:17 + 17 printk("bss %u @ %p\n", bss_cnt++, &bss_cnt); + (gdb) backtrace + #0 detached_entry () at zephyr/tests/subsys/llext/src/detached_fn_ext.c:17 + #1 0x200000ac in test_entry () at zephyr/tests/subsys/llext/src/detached_fn_ext.c:26 + #2 0x00000706 in llext_test_detached () at zephyr/tests/subsys/llext/src/test_llext.c:496 + #3 0x00001a36 in run_test_functions (suite=0x92bc , data=0x0 , test=0x92d8 ) at zephyr/subsys/testsuite/ztest/src/ztest.c:328 + #4 test_cb (a=0x92bc , b=0x92d8 , c=0x0 ) at zephyr/subsys/testsuite/ztest/src/ztest.c:662 + #5 0x00000e96 in z_thread_entry (entry=0x1a05 , p1=0x92bc , p2=0x92d8 , p3=0x0 ) at zephyr/lib/os/thread_entry.c:48 + #6 0x00000000 in ?? () + (gdb) print bss_cnt + $1 = 0 + (gdb) print data_cnt + $2 = -3 + (gdb) diff --git a/doc/services/llext/index.rst b/doc/services/llext/index.rst index ea63738ba3c..13bfe78316b 100644 --- a/doc/services/llext/index.rst +++ b/doc/services/llext/index.rst @@ -16,6 +16,7 @@ and introspected to some degree, as well as unloaded when no longer needed. config build load + debug api .. note:: diff --git a/doc/services/llext/load.rst b/doc/services/llext/load.rst index 78fc897b87d..a72913c0bde 100644 --- a/doc/services/llext/load.rst +++ b/doc/services/llext/load.rst @@ -94,13 +94,7 @@ If any of this happens, the following tips may help understand the issue: the issue. * Use a debugger to inspect the memory and registers to try to understand what - is happening. - - .. note:: - When using GDB, the ``add_symbol_file`` command may be used to load the - debugging information and symbols from the ELF file. Make sure to specify - the proper offset (usually the start of the ``.text`` section, reported - as ``region 0`` in the debug logs.) + is happening. See :ref:`Debugging extensions ` for more details. If the issue persists, please open an issue in the GitHub repository, including all the above information.