This is trick (mapping RAM twice so you can use alternate Region
Protection Option addresses to control cacheability) is something any
Xtensa hardware designer might productively choose to do. And as it
works really well, we should encourage that by making this a generic
architecture feature for Zephyr.
Now everything works by setting two kconfig values at the soc level
defining the cached and uncached regions. As long as these are
correct, you can then use the new arch_xtensa_un/cached_ptr() APIs to
convert between them and a ARCH_XTENSA_SET_RPO_TLB() macro that
provides much smaller initialization code (in C!) than the HAL
assembly macros. The conversion routines have been generalized to
support conversion between any two regions.
Note that full KERNEL_COHERENCE still requires support from the
platform linker script, that can't be made generic given the way
Zephyr does linkage.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This file started using ALWAYS_INLINE from <toolchain.h> but didn't
include it. Transitive inclusions were hiding the problem most
places, but at least one test case exposes it.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
These are tiny functions always declared as "inline" per C99, but
that's just a hint. In practice, they tend to be (c.f. intel_asdp)
called from very early boot circumstances where main application
symbols aren't yet available. That obviously doesn't work, or even
link.
Make them ALWAYS_INLINE. In practice they're so small that we don't
want them called anyway just for stack space reasons.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
XCC doesn't like that a for loop which declares the loop
variable inline. So extract the declaration of the loop
variable outside the for loop.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The Xtensa L1 cache layer has straightforward semantics accessible via
single-instructions that operate on cache lines via physical
addresses. These are very amenable to inlining.
Unfortunately the Xtensa HAL layer requires function calls to do this,
leading to significant code waste at the calling site, an extra frame
on the stack and needless runtime instructions for situations where
the call is over a constant region that could elide the loop. This is
made even worse because the HAL library is not built with
-ffunction-sections, so pulling in even one of these tiny cache
functions has the effect of importing a 1500-byte object file into the
link!
Add our own tiny cache layer to include/arch/xtensa/cache.h and use
that instead.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>