The Xtensa implementation of arch_irq_offload() required that the user
select the correct interrupt manually, and would race with itself if
invoked from separate CPUs (it was saved here by the main
irq_offload() function which has a semaphore to serialize access).
Use the new gen_zsr.py script to automatically detect the highest
available software interrupt, and keep a per-CPU set of
callback/parameter pointers.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Some XCC toolchains do not provide atexit() which results
in undefined reference error. So add a weak dummy atexit()
for this siutation.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Turns out that xt-xcc will bail when faced with a real core-isa.h (it
wants you to rely on the builtins in the compiler). Undefine __XCC__
to force it to actually parse and emit declarations for its own
header.
(Also adds a newline to the generated one-line C file to silence a warning)
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
We had a similar sequence for interrupt return, where we were
selecting (actually only for the benefit of qemu) the highest priority
EPCn/EPSn registers for our RFI instruction. That works much better
in python the preprocessor.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The kernel coherence cache flush code was using a scratch register to
mark the top of the stack. Likewise a good candidate for ZSR use.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This is actually Cadence-authored code, but its use of EXCSAVE1 as a
sideband input to the exception handler is very much in the same
family of tricks. Use ZSR assignments here too.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Zephyr likes to use the various Xtensa scratch registers for its own
purposes in several places. Unfortunately, owing to the
configurability of the architecture, we have to use different
registers for different platforms. This has been done so far with a
collection of different tricks, some... less elegant than others.
Put it all in one place. This is a python script that emites a
"zsr.h" header with register assignments for all the existing users.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
For functions returning nothing, there is no need to document
with @return, as Doxgen complains about "documented empty
return type of ...".
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Startup on these devices was sort of a mess, with multiple variants of
Xtensa and platform initialization code from multiple ancestries being
invoked at different places for different purposes. Just use one code
path for everyone.
Bootloader entry starts with a minimal assembly stub that simply sets
WINDOW{START,BASE}, PS and a stack pointer and then jumps to C code.
That then uses the cpu_early_init() implementation from cAVS 2.5's
secondary cores to finish Xtensa initialization, and then flows
directly into the pre-existing bootloader C code to initialize cache
and memory and copy the HP-SRAM image, then it invokes Zephyr via a
simple C function call to z_cstart().
Likewise, remove the "reset vector" from Zephyr. This was never a
reset vector, reset on these devices goes to a fixed address in a ROM.
CPU initialization is handled explicitly and completely in the
bootloader now, in a way that can be unified between the main and
secondary cores. Entry from the bootloader now goes directly into
z_cstart() via a C call (via a single jump instruction placed at the
entry point address -- that's going away soon too once we're using a
unified link).
Now that vector table initialization happens in a uniform way, there's
no need to copy the VECBASE value during arch_start_cpu().
Finally note that this also reverts the
CONFIG_RESET_VECTOR_IN_BOOTLOADER kconfig variable added for these
platforms, because it's no longer a tunable and true always.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Adds Xtensa as supported architecture for coredump. Fixes
a few typos in documentation, Kconfig and a C file. Dumps
minimal set of registers shown by 'info registers' in GDB
for the sample_controller and ESP32 SOCs. Updates tests.
Signed-off-by: Lauren Murphy <lauren.murphy@intel.com>
This adds basic support for GDB stub on Xtensa. Note that
this only provides the common bits on the architecture side.
SoC support is also required to fully enable GDB stub on
each Xtensa SoC.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Call into z_thread_usage_stop() before ISR entry to avoid including
interrupt handling totals in thread usage stats.
Note that this hook is after the register save and stack swap has
happened, so it still incldues some overhead. But calling out from
the interrupted stack on Xtensa gets really, really hairy due to the
weird intermediate states we leverage (once we've saved enough context
to make a C call safely, we've lost the ability to use register
windows per the C ABI!).
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This reverts commit 67d290540e.
The script is actually used to generate the _soc_inthandlers.h
file when introducing a new Xtensa SoC. So restore it.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Some Xtensa SoCs may not have that many levels of interrupts.
So limit the call to DEF_INT_C_HANDLER() to only supported
levels to avoid calling non-existent functions.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
A simple WAITI isn't sufficient in all cases. The cAVS 2.5 hardware
uses WAITI as the entry state for per-core power gating, which is very
difficult to debug. Provide a fallback that simply spins in the idle
loop waiting for interrupts to provide a stable system while this
feature stabilizes.
Also, the SOF code for those platforms references a known bug with the
Xtensa LX6 core IP (or at least some versions), and will prefix the
WAIT instruction with 128 NOP.N's followed by an ISYNC and EXTW. This
bug hasn't been seen under Zephyr yet, and details are sketchy. But
the code is simply enough to import and works correctly.
Place both workaround under new kconfig variables and select them both
(even though they're actually mutually exclusive -- if you select both
CPU_IDLE_SPIN overrides) for cavs_v25.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
On CPU startup, When we reach the cache flush code in arch_switch(),
the outgoing thread is a dummy. The behavior of the existing code was
to leave the existing value in the SR unchanged (probably NULL at
startup). Then the context switch would walk from that address up to
the top of the outgoing stack, flushing everything in between. That's
wrong, because the outgoing stack is a real pointer (generally the
interrupt stack of the current CPU), and we're flushing everything in
memory underneath it.
This also reverts commit 29abc8adc0 ("xtensa: fix booting secondary
cores on the dummy thread"), which appears to have been an early
attempt to address this issue. It worked (modulo all the extra and
potentially incorrect flushing) on cavs v1.5/1.8 because of the way
the entry code worked there. But on 2.5 we now hit the first context
switch in a case where those extra lines are in address space already
marked unwritable by the CPU, so the flush explodes.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Both operands of an operator in which the usual arithmetic
conversions are performed shall have the same essential
type category.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
When we reach this code in interrupt context, our upper GPRs contain a
cross-stack call that may still include some registers from the
interrupted thread. Those need to go out to memory before we can do
our cache coherence dance here.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Both new thread creation and context switch had the same mistake in
cache management: the bottom of the stack (the "unused" region between
the lower memory bound and the live stack pointer) needs to be
invalidated before we switch, because otherwise any dirty lines we
might have left over can get flushed out on top of the same thread on
another CPU that is putting live data there.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The Xtensa L1 cache layer has straightforward semantics accessible via
single-instructions that operate on cache lines via physical
addresses. These are very amenable to inlining.
Unfortunately the Xtensa HAL layer requires function calls to do this,
leading to significant code waste at the calling site, an extra frame
on the stack and needless runtime instructions for situations where
the call is over a constant region that could elide the loop. This is
made even worse because the HAL library is not built with
-ffunction-sections, so pulling in even one of these tiny cache
functions has the effect of importing a 1500-byte object file into the
link!
Add our own tiny cache layer to include/arch/xtensa/cache.h and use
that instead.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Back when I started work on this stuff, I had a set of notes on
register windows that slowly evolved into something that looks like
formal documentation. There really isn't any overview-style
documentation of this stuff on the public internet, so it couldn't
hurt to commit it here for posterity.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Instead of passing the crt1 _start function as the entry code for
auxiliary CPUs, use a tiny assembly stub instead which can avoid the
runtime testing needed to skip the work in _start. All the crt1 code
was doing was clearing BSS (which must not happen on a second CPU) and
setting the stack pointer (which is wrong on the second CPU).
This allows us to clean out the SMP code in crt1.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The kernel passes the CPU's interrupt stack expected that it will
start on that, so do it. Pass the initial stack pointer from the SOC
layer in the variable "z_mp_stack_top" and set it in the assembly
startup before calling z_mp_entry().
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The xtensa atomics layer was written with hand-coded assembly that had
to be called as functions. That's needlessly slow, given that the low
level primitives are a two-instruction sequence. Ideally the compiler
should see this as an inline to permit it to better optimize around
the needed barriers.
There was also a bug with the atomic_cas function, which had a loop
internally instead of returning the old value synchronously on a
failed swap. That's benign right now because our existing spin lock
does nothing but retry it in a tight loop anyway, but it's incorrect
per spec and would have caused a contention hang with more elaborate
algorithms (for example a spinlock with backoff semantics).
Remove the old implementation and replace with a much smaller inline C
one based on just two assembly primitives.
This patch also contains a little bit of refactoring to address the
scheme has been split out into a separate header for each, and the
ATOMIC_OPERATIONS_CUSTOM kconfig has been renamed to
ATOMIC_OPERATIONS_ARCH to better capture what it means.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
There was a bunch of dead historical cruft floating around in the
arch/xtensa tree, left over from older code versions. It's time to do
a cleanup pass. This is entirely refactoring and size optimization,
no behavior changes on any in-tree devices should be present.
Among the more notable changes:
+ xtensa_context.h offered an elaborate API to deal with a stack frame
and context layout that we no longer use.
+ xtensa_rtos.h was entirely dead code
+ xtensa_timer.h was a parallel abstraction layer implementing in the
architecture layer what we're already doing in our timer driver.
+ The architecture thread structs (_callee_saved and _thread_arch)
aren't used by current code, and had dead fields that were removed.
Unfortunately for standards compliance and C++ compatibility it's
not possible to leave an empty struct here, so they have a single
byte field.
+ xtensa_api.h was really just some interrupt management inlines used
by irq.h, so fold that code into the outer header.
+ Remove the stale assembly offsets. This architecture doesn't use
that facility.
All told, more than a thousand lines have been removed. Not bad.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
While fixing license headers, identified this script as orphan and not
being used anywhere, so remove.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
XCC doesn't like the "rsr.<reg name>" style assembly
so fix that to the other style.
Also, XCC doesn't like _CONCAT() with the EPC/EPS
registers so need to spell out all of them.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
There is a hard-coded value of PS_INTLEVEL(15) to set the PS
register. The correct way is actually to use XCHAL_EXCM_LEVEL
with PS_INTLEVEL() to setup the register. So fix it.
Fixes#31858
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This change uses stack frame to print backtrace once exception occurs
Printing backtrace helps to identify the cause of exception
Signed-off-by: Shubham Kulkarni <shubham.kulkarni@espressif.com>
Currently Zephyr links reset-vector.S twice in xtensa builds:
into the bootloader and the main image. It is run at the end
of the boot loader execution and immediately after that again
in the beginning of the main code. This patch adds a
configuration option to select whether to link the file to the
bootloader or to the application. The default is to the
application, as needed e.g. for QEMU, SOF links it to the
bootloader like in native builds.
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
There may be Xtensa SoCs which don't have high enough interrupt
levels for EPC6/EPS6 to exist in _restore_context. So changes
these to those which should be available according to the ISA
config file.
Fixes#30126
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Most of kernel files where declaring os module without providing
log level. Because of that default log level was used instead of
CONFIG_KERNEL_LOG_LEVEL.
Signed-off-by: Krzysztof Chruscinski <krzysztof.chruscinski@nordicsemi.no>
Since the tracing of thread being switched in/out has the same
instrumentation points, we can roll the tracing function calls
into the one for thread stats gathering functions.
This avoids duplicating code to call another function.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Adds the necessary bits to initialize TLS in the stack
area and sets up CPU registers during context switch.
Note that this does not enable TLS for all Xtensa SoC.
This is because Xtensa SoCs are highly configurable
so that each SoC can be considered a whole architecture.
So TLS needs to be enabled on the SoC level, instead of
at the arch level.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Implement the kernel "coherence" API on top of the linker
cached/uncached mapping work.
Add Xtensa handling for the stack coherence API.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
It's legal to have CONFIG_MP_NUM_CPUS > 1 and !CONFIG_SMP. The
tests/kernel/mp test does this as a unit test of the multiprocessor
facilities. Test the right tunable when deciding whether to blow away
static data or not.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This code had one purpose only, feed timing information into a test and
was not used by anything else. The custom trace points unfortunatly were
not accurate and this test was delivering informatin that conflicted
with other tests we have due to placement of such trace points in the
architecture and kernel code.
For such measurements we are planning to use the tracing functionality
in a special mode that would be used for metrics without polluting the
architecture and kernel code with additional tracing and timing code.
Furthermore, much of the assembly code used had issues.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Move tracing switched_in and switched_out to the architecture code and
remove duplications. This changes swap tracing for x86, xtensa.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
The core kernel computes the initial stack pointer
for a thread, properly aligning it and subtracting out
any random offsets or thread-local storage areas.
arch_new_thread() no longer needs to make any calculations,
an initial stack frame may be placed at the bounds of
the new 'stack_ptr' parameter passed in. This parameter
replaces 'stack_size'.
thread->stack_info is now set before arch_new_thread()
is invoked, z_new_thread_init() has been removed.
The values populated may need to be adjusted on arches
which carve-out MPU guard space from the actual stack
buffer.
thread->stack_info now has a new member 'delta' which
indicates any offset applied for TLS or random offset.
It's used so the calculations don't need to be repeated
if the thread later drops to user mode.
CONFIG_INIT_STACKS logic is now performed inside
z_setup_new_thread(), before arch_new_thread() is called.
thread->stack_info is now defined as the canonical
user-accessible area within the stack object, including
random offsets and TLS. It will never include any
carved-out memory for MPU guards and must be updated at
runtime if guards are removed.
Available stack space is now optimized. Some arches may
need to significantly round up the buffer size to account
for page-level granularity or MPU power-of-two requirements.
This space is now accounted for and used by virtue of
the Z_THREAD_STACK_SIZE_ADJUST() call in z_setup_new_thread.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
MISRA-C wants the parameter names in a function implementaion
to match the names used by the header prototype.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
arch_new_thread() passes along the thread priority and option
flags, but these are already initialized in thread->base and
can be accessed there if needed.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
The core kernel z_setup_new_thread() calls into arch_new_thread(),
which calls back into the core kernel via z_new_thread_init().
Move everything that doesn't have to be in z_new_thread_init() to
z_setup_new_thread() and convert to an inline function.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Under multi-processing, only the first CPU#0 needs to go through
setting up the kernel structs and clearing out BSS (among others).
There is no need for other CPUs to do those tasks. Since each
Xtensa core starts using the same boot vector, CPUs other than #0
need to skip all the startup tasks by not calling to z_cstart().
So provide another entry point for those CPUs. Note that Xtensa
arch is highly configurable. So the implementation of the entry
point is up to each individual SoC config.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Under SMP, the main BSS section only needs to be zero-ed on CPU #0.
Other CPUs should not zero out BSS, or else it may cause CPU #0 to
crash on invalid data.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>