Nicolas Pitre points out that since these thread structs are just
dummies for the context switching, they can be presumed to be "write
only" and thus there's no point in having one per CPU; everyone can
share the same one.
The only gotcha is that we never really documented (nor really have a
place to document) that rule, so it's not theoretically impossible for
an architecture to read back what it might have written underneath
arch_switch(). Leave this in a separate commit for bisection
purposes, but the risk seems very low.
Signed-off-by: Andy Ross <andyross@google.com>
After a k_thread_abort(), the resulting thread struct is documented as
unused/free memory that may be re-used (for example, to respawn a new
thread).
But in the special case of aborting the current thread from within an
ISR, that wasn't quite happening. The scheduler cleanup would
complete, but the architecture layer would still try to context switch
away from the aborted thread on exit, and that can include writes to
the now-reused thread struct! The specifics will depend on
architecture (some do a full context save on entry, most don't), but
in the case of USE_SWITCH=y it will at the very least write the
switch_handle field.
Fix this simply, with a per-cpu "switch dummy" thread struct for use
as a target for context switches like this. There is some non-trivial
memory cost to that; thread structs on many architectures are large.
Pleasingly, this also addresses a known deadlock on SMP: because the
"spin in ISR" step now happens as the very last stage of
k_thread_abort() handling, the existing scheduler lock works to
serialize calls such that it's impossible for a cycle of threads to
independently decide to spin on each other: at least one will see
itself as "already aborting" and break the cycle.
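A minimal sketch of the idea, with illustrative names rather than the
actual kernel symbols: each CPU gets a dummy thread struct, and the
abort path installs it as the outgoing thread so any late writes from
the architecture layer (e.g. to switch_handle) land in the dummy rather
than in the freed struct.

    /* Sketch only; helper names are hypothetical. */
    #include <zephyr/kernel.h>

    static struct k_thread switch_dummy[CONFIG_MP_MAX_NUM_CPUS];

    static void retarget_aborted_current(void)
    {
            /* The architecture exit path may still write to the
             * "outgoing" thread, so make that the per-CPU dummy; the
             * aborted thread struct is then truly free for reuse.
             */
            _current_cpu->current = &switch_dummy[_current_cpu->id];
    }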
Fixes #64646
Signed-off-by: Andy Ross <andyross@google.com>
This reverts commit 61c70626a5.
This PR introduced 2 regressions in main CI:
71977 & 71978
Let's revert it for now to get main's CI passing again.
Signed-off-by: Alberto Escolar Piedras <alberto.escolar.piedras@nordicsemi.no>
This reverts commit 20611f13ca.
This PR introduced 2 regressions in main CI:
71977 & 71978
Let's revert it for now to get main's CI passing again.
Signed-off-by: Alberto Escolar Piedras <alberto.escolar.piedras@nordicsemi.no>
Nicolas Pitre points out that since these thread structs are just
dummies for the context switching, they can be presumed to be "write
only" and thus there's no point in having one per CPU; everyone can
share the same one.
The only gotcha is that we never really documented (nor really have a
place to document) that rule, so it's not theoretically impossible for
an architecture to read back what it might have written underneath
arch_switch(). Leave this in a separate commit for bisection
purposes, but the risk seems very low.
Signed-off-by: Andy Ross <andyross@google.com>
After a k_thread_abort(), the resulting thread struct is documented as
unused/free memory that may be re-used (for example, to respawn a new
thread).
But in the special case of aborting the current thread from within an
ISR, that wasn't quite happening. The scheduler cleanup would
complete, but the architecture layer would still try to context switch
away from the aborted thread on exit, and that can include writes to
the now-reused thread struct! The specifics will depend on
architecture (some do a full context save on entry, most don't), but
in the case of USE_SWITCH=y it will at the very least write the
switch_handle field.
Fix this simply, with a per-cpu "switch dummy" thread struct for use
as a target for context switches like this. There is some non-trivial
memory cost to that; thread structs on many architectures are large.
Pleasingly, this also addresses a known deadlock on SMP: because the
"spin in ISR" step now happens as the very last stage of
k_thread_abort() handling, the existing scheduler lock works to
serialize calls such that it's impossible for a cycle of threads to
independently decide to spin on each other: at least one will see
itself as "already aborting" and break the cycle.
Fixes #64646
Signed-off-by: Andy Ross <andyross@google.com>
After the move to C files we got some drop in the performance when
running latency_measure. This patch declares the priority queue
functions as static inlines with minor optimizations.
The result for one metric (on qemu):
3.6 and before anything was changed:
Get data from LIFO (w/ ctx switch): 13087 ns
after original change (46484da502):
Get data from LIFO (w/ ctx switch): 13663 ns
with this change:
Get data from LIFO (w/ ctx switch): 12543 ns
So overall, a net gain of ~ 500ns that can be seen across the board on many
of the metrics.
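As a rough illustration of the kind of change involved (not the exact
code): a hot-path queue operation moves from a .c file into a header as
a static inline, so the call overhead disappears from the scheduler's
context-switch path.

    /* Sketch with illustrative names, assuming a dlist-based queue. */
    static inline struct k_thread *prio_q_best(sys_dlist_t *pq)
    {
            sys_dnode_t *n = sys_dlist_peek_head(pq);

            return (n == NULL) ? NULL :
                    CONTAINER_OF(n, struct k_thread, base.qnode_dlist);
    }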
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
This adds the mechanism to do cleanup after k_thread_abort()
is called with the current thread. This is mainly used for
cleaning up things when the thread cannot be running, e.g.,
cleaning up the thread stack.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Do not build threading support when CONFIG_MULTITHREADING=n is set, and
move the calls that are still needed into a new file with the required
changes, instead of continuing the ifdef party in sched.c.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Add a closing comment to each endif naming the configuration option to
which the endif belongs. This makes the code clearer when the configs
need adaptations.
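For example, a bare #endif gains a comment naming the option it closes:

    #ifdef CONFIG_SCHED_CPU_MASK
            /* ... */
    #endif /* CONFIG_SCHED_CPU_MASK */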
Signed-off-by: Simon Hein <Shein@baumer.com>
arch_interface.h is for architecture and should not be
under sys/. So move it under include/zephyr/arch/.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Change the bitmask to a bitmask array, which removes the multilevel
queue's 32-bit priority limit. The bitmask array is scanned to find
which queue has a ready thread. Only the number of queues is needed,
because the priority is checked on create_thread.
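A rough sketch of the scan (illustrative, not the exact kernel code):
instead of one 32-bit word, an array of words is scanned for the lowest
set bit, so the priority count is no longer capped at 32.

    #include <stdint.h>

    /* Sketch: find the best-priority non-empty queue. */
    static int first_ready_queue(const uint32_t *bitmask, int nwords)
    {
            for (int w = 0; w < nwords; w++) {
                    if (bitmask[w] != 0U) {
                            /* lowest set bit == highest priority */
                            return (w * 32) + __builtin_ctz(bitmask[w]);
                    }
            }
            return -1; /* no ready thread */
    }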
Signed-off-by: TaiJu Wu <tjwu1217@gmail.com>
Rename private function to make it clear what priority we are setting
and to be consistent across the code.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Move thread monitor related functions, which are not enabled in most
cases, out of thread.c and clean up headers.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
This function is only being used by a test, so instead of reimplementing
a syscall in the test, provide a Kconfig option that makes this
functionality available to tests only, and remove some of the
duplication and extra code.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Move out of thread and put directly in init.c where it is being used.
Also remove definition from kernel.h, this is an internal function and
should not be in a public header.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
The functions to manipulate the essential flag indeed operate on
threads, but they are misplaced in the thread implementation file. Put
them alongside other routines that set thread flags, and clean up
headers a bit.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Clean up headers under include/ and move handling of the priority queue
to its own file/header.
No need for the header include/zephyr/kernel/internal/sched_priq.h
anymore. Move the relevant structures where they are being used in
kernel_structs.h.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Instead of rounding up both __tdata_size and __tbss_size at runtime,
perform the calculation when the image is built.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
In many cases, suspending or resuming a device is limited to
just a few register writes. The current solution assumes that those
operations may be blocking, asynchronous, and take a lot of time.
Due to this assumption the runtime PM API cannot be used effectively
from interrupt context. Zephyr has a few driver APIs which
can be used from an interrupt context, and the use of runtime PM
is currently limited in those cases.
This patch introduces a new type of PM device: synchronous PM. If a
device is specified as capable of synchronous PM operations, then
device runtime get and put are executed in a critical
section. In that case, the runtime API can be used from an interrupt
context. Additionally, this approach reduces the RAM needed per
PM device (104 -> 20 bytes of RAM on ARM Cortex-M).
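A hedged sketch of what this enables (device declaration details and
exact flag names omitted): with a device marked for synchronous PM, the
runtime reference can be taken and dropped directly from an ISR, since
the transition runs in a critical section instead of being deferred.

    #include <zephyr/pm/device_runtime.h>

    /* Sketch: dev is assumed to be declared with synchronous/ISR-safe
     * runtime PM support.
     */
    static void my_irq_handler(const struct device *dev)
    {
            (void)pm_device_runtime_get(dev);   /* resume synchronously */

            /* ... service the interrupt using the device ... */

            (void)pm_device_runtime_put(dev);   /* release the reference */
    }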
Signed-off-by: Krzysztof Chruściński <krzysztof.chruscinski@nordicsemi.no>
It is possible that address + size will overflow the available
address space and the pointer wraps around back to zero. Some
of these have been fixed in previous commits. This fixes
the remaining ones with regard to Z_PHYS_RAM_START/_END,
and Z_VIRT_RAM_START/_END.
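The general shape of such a fix, sketched here rather than quoted from
the diff: compare against the remaining room instead of computing
address + size, so a wrap past the top of the address space cannot
produce a false "in range" result.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Sketch: overflow-safe "is [addr, addr + size) inside RAM". */
    static bool range_in_ram(uintptr_t addr, size_t size,
                             uintptr_t ram_start, uintptr_t ram_end)
    {
            if (addr < ram_start || addr >= ram_end) {
                    return false;
            }
            /* avoid addr + size, which can wrap around to zero */
            return size <= (size_t)(ram_end - addr);
    }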
Fixes #65542
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
There are several subsystems and boards which require a relatively large
system heap (used by k_malloc()) to function properly. This became even
more notable with the recent introduction of the ACPICA library, which
causes ACPI-using boards to require a system heap of up to several
megabytes in size.
Until now, subsystems and boards have tried to solve this by having
Kconfig overlays which modify the default value of HEAP_MEM_POOL_SIZE.
This works ok, except when applications start explicitly setting values
in their prj.conf files:
$ git grep CONFIG_HEAP_MEM_POOL_SIZE= tests samples|wc -l
157
The vast majority of values set by current sample or test applications
are much too small for subsystems like ACPI, which results in the
application not being able to run on such boards.
To solve this situation, we introduce support for subsystems to specify
their own custom system heap size requirement. Subsystems do
this by defining Kconfig options with the prefix HEAP_MEM_POOL_ADD_SIZE_.
The final value of the system heap is the sum of the custom
minimum requirements, or the value of the existing HEAP_MEM_POOL_SIZE
option, whichever is greater.
We also introduce a new HEAP_MEM_POOL_IGNORE_MIN Kconfig option which
applications can use to force a lower value than what subsystems have
specified; however, this behavior is disabled by default.
Whenever the minimum is greater than the requested value a CMake warning
will be issued in the build output.
This patch ends up modifying several places outside of kernel code,
since the presence of the system heap is no longer detected using a
non-zero CONFIG_HEAP_MEM_POOL_SIZE value; rather, it's now detected
using a new K_HEAP_MEM_POOL_SIZE value that's evaluated at build time.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
z_free_page_count is only used in one file, so there is
no need to expose it, even to other parts of the kernel.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The function _Cstart has already been renamed to z_cstart,
so change the remaining references of it in various docs.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This moves the k_* memory management functions from sys/ into
kernel/ includes, as these are kernel public APIs. The z_*
functions are further separated into the kernel internal
header directory.
Also made a quick change to doxygen to group sys_mem_* into
the OS Memory Management group so they will appear in the docs.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Rename z_early_boot_rand_get to z_early_rand_get to be consistent
with other early functions.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
The wording on deprecating arch_kernel_init() in favor of prep_c()
never materialized. Various architectures are using it to
perform initialization. So remove the wording.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Integrates object core statistics framework into the following
kernel objects:
sys_mem_blocks, k_mem_slab
threads, _cpu, z_kernel
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
This internal kernel API is misplaced in a public kernel header. Just
make it available to the code using it in the kernel.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
The _EXPIRED macro is no longer necessary. It is a relic of an older
timeout processing algorithm from several years ago.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
This is a private kernel header with private kernel APIs, it should not
be exposed in the public zephyr include directory.
One sample remains to be fixed (metairq_dispatch); it currently uses
private APIs from that header, which should not be the case.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
This header does not expose any public APIs, so move it under
kernel/include and change files including it.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
While the LOCKED pattern is universally useful, it can be misused. This
change therefore exposes the LOCKED pattern with extensive usage
documentation to reduce the risk of abuse or unintended deadlock.
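The pattern being documented is, in essence, a spinlock key scoped to a
short block and released on every exit path, roughly:

    #include <zephyr/spinlock.h>

    static struct k_spinlock lock;

    /* Sketch of the underlying pattern the macro packages up. */
    void update_shared(int *shared, int value)
    {
            k_spinlock_key_t key = k_spin_lock(&lock);

            *shared = value;            /* short critical section */

            k_spin_unlock(&lock, key);
    }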
Signed-off-by: Florian Grandel <fgrandel@code-for-humans.de>
Device dependencies are not always required, so make them optional via
CONFIG_DEVICE_DEPS. When enabled, the gen_device_deps script will run so
that dependencies are collected and part of the final image. Related
APIs will be also made available. Since device dependencies are used in
just a few places (power domains), disable the feature by default. When
not enabled, a second linking pass will not be required.
Signed-off-by: Gerard Marull-Paretas <gerard.marull@nordicsemi.no>
Rename struct device `handles` member to `deps`, in line with previous
renamings in the device API.
Signed-off-by: Gerard Marull-Paretas <gerard.marull@nordicsemi.no>
The switch_handle field in the thread struct is used as an atomic flag
between CPUs in SMP, and has been known for a long time to technically
require memory barriers for correct operation. We have an API for
that now, so put them in:
* The code immediately before arch_switch() needs a write barrier to
ensure that thread state written by the scheduler is seen to happen
before the outgoing thread is flagged with a valid switch handle.
* The loop in z_sched_switch_spin() needs a read barrier at the end,
to make sure the calling context doesn't load state from before the
other CPU stored the switch handle.
Also, that same spot in switch_spin was spinning with interrupts
masked, which means it needs a call to arch_spin_relax() to avoid an
FPU state deadlock on some architectures.
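In outline (illustrative helpers, not the exact code), the barriers
land like this:

    #include <zephyr/kernel.h>
    #include <zephyr/sys/barrier.h>

    /* Outgoing side, just before arch_switch(): scheduler writes to the
     * old thread must be visible before its handle is published.
     */
    static void publish_switch_handle(struct k_thread *old, void *handle)
    {
            barrier_dmem_fence_full();
            old->switch_handle = handle;
    }

    /* z_sched_switch_spin(): after the handle appears, don't let loads
     * of the other CPU's state be satisfied from before it was stored.
     */
    static void wait_for_switch_handle(struct k_thread *thread)
    {
            while (thread->switch_handle == NULL) {
                    arch_spin_relax();  /* avoid FPU-state deadlock */
            }
            barrier_dmem_fence_full();
    }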
Signed-off-by: Andy Ross <andyross@google.com>
This trick turns out also to be needed by the abort/join code.
Promote it to a more formal-looking internal API and clean up the
documentation to (hopefully) clarify the exact behavior and better
explain the need.
This is one of the more... enchanted bits of the scheduler, and while
the trick is IMHO pretty clean, it remains a big SMP footgun.
Signed-off-by: Andy Ross <andyross@google.com>
z_page_frame can't be packed on Xtensa due to memory alignment
constraints. When this struct is packed it is 5 bytes long, which
will cause a memory alignment problem on Xtensa.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
Removes unused absolute symbols that are defined via the
GEN_ABSOLUTE_SYM() macro in the kernel directory.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Rework the fragile and ad-hoc computation of timeslice expirations
into per-CPU struct _timeout objects with regular callbacks. The
expiration callbacks themselves simply set a per-cpu flag (they might
run on any CPU), which gets checked at the end of the timer ISR on
every CPU.
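A sketch of the shape (names illustrative): each CPU owns a struct
_timeout whose callback only sets a flag; the flag is then acted upon
at the end of the timer ISR on the CPU it belongs to.

    #include <stdbool.h>
    #include <zephyr/kernel.h>

    /* Sketch only. */
    struct slice_state {
            struct _timeout timeout;    /* per-CPU slice timeout */
            bool expired;               /* set by callback, any CPU */
    };

    static struct slice_state slice[CONFIG_MP_MAX_NUM_CPUS];

    static void slice_expired(struct _timeout *t)
    {
            CONTAINER_OF(t, struct slice_state, timeout)->expired = true;
    }

    /* Called at the end of the timer ISR on every CPU. */
    static void slice_check(int cpu)
    {
            if (slice[cpu].expired) {
                    slice[cpu].expired = false;
                    /* ... move _current to the back of its queue ... */
            }
    }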
This simplifies logic and removes a bunch of code. It also fixes at
least three bugs:
1. As @npitre discovered: On SMP, the number of ticks announced on any
given CPU is going to be a subset of all expired ticks. This broke
the accounting of timeslice ticks, and effectively meant that
timeslicing only worked on SMP on systems where one CPU could hog all
the announcements, and only on that CPU.
2. The bootstrap path to arm the timer driver after setting the first
timeout in an empty list couldn't take into account
sys_clock_elapsed() ticks, as it didn't know whether it was being
called underneath an existing announce loop. Now this code is no
longer responsible for knowing anything about time slicing at all.
3. Also on SMP, there was a case where two CPUs timeslicing
simultaneously could stomp on each others' timeouts in
z_set_timeout_expiry(), as neither had a way of knowing what the
other's state was. CPUs could miss their own expiration and have to
wait for the slice expiration on the other CPU. Now, timeouts are
global objects with simple expiration times, and there's no need for
that function at all.
Signed-off-by: Andy Ross <andyross@google.com>
Some of the offset symbols that are derived from the macro
GEN_OFFSET_SYM() are not used anywhere in the Zephyr codebase.
Remove them as part of a cleanup effort.
Instances of an associated GEN_OFFSET_SYM() have also been
removed when the resulting macro is no longer referenced.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Some of the offset symbols generated via the macro GEN_OFFSET_SYM()
are not used anywhere in the Zephyr codebase. Remove them as part of
a cleanup effort.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Adds a routine to safely walk a specified wait queue and invoke a
custom callback function on each waiting thread.
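Its shape is roughly the following (signature and locking illustrative):
the callback gets each waiting thread plus a user cookie, and a nonzero
return stops the walk early.

    /* Sketch with illustrative names; runs under the scheduler lock. */
    int waitq_walk(_wait_q_t *wait_q,
                   int (*func)(struct k_thread *thread, void *data),
                   void *data)
    {
            struct k_thread *thread;
            int status = 0;

            _WAIT_Q_FOR_EACH(wait_q, thread) {
                    status = func(thread, data);
                    if (status != 0) {
                            break;      /* callback requested early stop */
                    }
            }
            return status;
    }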
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
The interrupt stack is used as the system stack during kernel
initialization while IRQs are not yet enabled. The sp register is
set to z_interrupt_stacks + CONFIG_ISR_STACK_SIZE.
CONFIG_ISR_STACK_SIZE only represents the desired usable stack size.
This does not take into account the added guard area. The result is a stack
whose pointer is much closer to the trigger zone than expected when
CONFIG_PMP_STACK_GUARD=y, and the SMP configuration in particular pushes
it over the edge during many CI test cases.
Worse: during early init we're not quite ready to handle exceptions
yet and complete havoc ensues with no meaningful debugging output.
Make sure the early assembly code locates the actual top of the stack
by generating a constant with its true size.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Fixes #46324
Set dummy_thread->base.slice_ticks to 0 when
CONFIG_TIMESLICE_PER_THREAD is set, to avoid
_current_cpu->slice_ticks being a big number.
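The fix is essentially a one-line initialization in the dummy-thread
setup, roughly:

    #ifdef CONFIG_TIMESLICE_PER_THREAD
            dummy_thread->base.slice_ticks = 0;
    #endif /* CONFIG_TIMESLICE_PER_THREAD */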
Signed-off-by: Hu Zhenyu <zhenyu.hu@intel.com>
MISRA C:2012 Rule 14.4 (The controlling expression of an if statement
and the controlling expression of an iteration-statement shall have
essentially Boolean type.)
Use `bool' instead of `int' to represent Boolean values.
Use `do { ... } while (false)' instead of `do { ... } while (0)'.
Use comparisons with zero instead of implicitly testing integers.
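For instance, this kind of rewrite:

    /* before: implicit integer test */
    if (count) {
            do_work();
    }

    /* after: essentially Boolean controlling expression */
    if (count != 0) {
            do_work();
    }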
This commit is a subset of the original commit:
5d02614e34a86b549c7707d3d9f0984bc3a5f22a
Signed-off-by: Simon Hein <SHein@baumer.com>
This commit updates all deprecated `K_KERNEL_PINNED_STACK_ARRAY_EXTERN`
macro usages to use the `K_KERNEL_PINNED_STACK_ARRAY_DECLARE` macro
instead.
Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
In order to bring consistency in-tree, migrate all kernel code to the
new prefix <zephyr/...>. Note that the conversion has been scripted,
refer to zephyrproject-rtos#45388 for more details.
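That is, kernel includes change along these lines:

    /* before */
    #include <kernel.h>

    /* after */
    #include <zephyr/kernel.h>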
Signed-off-by: Gerard Marull-Paretas <gerard.marull@nordicsemi.no>
This adds lazy floating point context switching. On svc/irq entrance,
the VFP is disabled and a pointer to the exception stack frame is saved
away. If the esf pointer is still valid on exception exit, then no
other context used the VFP so the context is still valid and nothing
needs to be restored. If the esf pointer is NULL on exception exit,
then some other context used the VFP and the floating point context is
restored from the esf.
The undefined instruction handler is responsible for saving away the
floating point context if needed. If the handler is in the first
irq/svc context and the current thread uses the VFP, then the float
context needs to be saved. Also, if the handler is in a nested context
and the previous context was using the VFP, save the float context.
Signed-off-by: Bradley Bolen <bbolen@lexmark.com>
Instead of resizing all device handles, we just resize devices that are
power domains. This means that a power domain has to be declared as
compatible with "power-domain" in its devicetree node.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
Zephyr's timeslice implementation has always been somewhat primitive.
You get a global timeslice that applies broadly to the whole bottom of
the priority space, with no ability (beyond that one priority
threshold) to tune it to work on certain threads, etc...
This adds an (optionally configurable) API that allows timeslicing to
be controlled on a per-thread basis: any thread at any priority can be
set to timeslice, for a configurable per-thread slice time, and at the
end of its slice a callback can be provided that can take action.
This allows the application to implement things like responsiveness
heuristics, "fair" scheduling algorithms, etc... without requiring
that facility in the core kernel.
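A usage sketch, assuming an interface along these lines (the exact
signature may differ): the thread gets its own slice length and an
expiry callback.

    #include <zephyr/kernel.h>

    /* Sketch: callback invoked when the thread's slice expires. */
    static void slice_expired(struct k_thread *thread, void *data)
    {
            ARG_UNUSED(thread);
            ARG_UNUSED(data);
            /* e.g. adjust priority, record stats, ... */
    }

    void enable_slicing(struct k_thread *worker)
    {
            /* give this one thread a 2 ms slice and a callback */
            k_thread_time_slice_set(worker, k_ms_to_ticks_ceil32(2),
                                    slice_expired, NULL);
    }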
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Zeroing the BSS and copying data to RAM with regular memset/memcpy may
cause problems when those functions are assuming a fully initialized
system for their optimizations to work e.g. some instructions require
an active MMU, but turning the MMU on needs the .bss section to be
cleared first, etc.
Commit c5b898743a ("aarch64: Fix alignment fault on z_bss_zero()")
provides a detailed explanation of such a case.
Replacing z_bss_zero() with an architecture-specific one is problematic
as the former may see new sections added to it that would be missed by
the latter. The same reasoning goes for z_data_copy().
Let's make maintenance much easier by providing weak versions of
memset/memcpy that can be overridden by architecture-specific safe
versions when needed.
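In sketch form, using an illustrative name: the kernel provides a plain
weak default that an architecture can override with a version that is
safe to run before the MMU/caches are set up.

    #include <stddef.h>
    #include <zephyr/toolchain.h>

    /* Weak default; an architecture may provide its own safe version. */
    __weak void early_memset(void *dst, int c, size_t n)
    {
            unsigned char *d = dst;

            while (n > 0) {
                    *d++ = (unsigned char)c;
                    n--;
            }
    }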
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Extract the stack usage calculation from k_thread_stack_space_get into
z_stack_space_get so it can also be used for the interrupt stack.
Signed-off-by: Krzysztof Chruscinski <krzysztof.chruscinski@nordicsemi.no>
There is no need to use conditional compilation for the function
prototypes in the kernel architecture header file. So remove it.
An added bonus is that these functions can appear in documentation
without being explicitly enabled via pre-defines during the doc build.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Extends the CPU usage runtime stats to track current, total, peak
and average usage (as bounded by the scheduling of the idle thread).
This permits a developer to obtain more system information if desired
to tune the system.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
This commit does two things to the z_sched_thread_usage(). First,
it updates the API so that it accepts a pointer to the runtime
stats instead of simply returning the usage cycles. This gives it
the flexibility to retrieve additional statistics in the future.
Second, the runtime stats are only updated if the specified thread
is the current thread running on the current core.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
The resource pool of the short-lived dummy thread "stub" may be
inherited by other threads created during system initialization. This
commit initializes this resource pool to NULL or the system pool to
ensure that a well-defined resource pool propagates to other threads
that inherit it from the dummy thread.
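The fix amounts to giving the dummy thread a well-defined pool during
its setup, roughly:

    /* Sketch of the dummy-thread initialization described above. */
    #if (CONFIG_HEAP_MEM_POOL_SIZE > 0)
            k_thread_system_pool_assign(dummy_thread);
    #else
            dummy_thread->resource_pool = NULL;
    #endif /* CONFIG_HEAP_MEM_POOL_SIZE */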
Fixes #41482.
Signed-off-by: Berend Ozceri <berend@recogni.com>
Storing the state where this is the first GDB break can be done
in the main GDB stub code. There is no need to store the state
in the architecture layer.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Remove the LOG_MINIMAL Kconfig option, which was confusing
since LOG_MODE_MINIMAL existed. LOG_MINIMAL was used to
force minimal mode, but because of invalid dependencies
it was leading to issues.
Refactor the code to use LOG_MODE_MINIMAL everywhere and
rename LOG_MINIMAL to LOG_DEFAULT_MINIMAL, which affects
the default logging mode (which can still be changed later
in a conf file or in menuconfig).
Signed-off-by: Krzysztof Chruscinski <krzysztof.chruscinski@nordicsemi.no>
Clean up RUNTIME_STATS to separate the API from the individual data
backends. Use the SCHED_THREAD_USAGE tracking instead of the original
for execution_cycles. Move the kconfig for that into the runtime
stats menu, since it's part of the family now.
Also remove a lot of needless #if's around the declarations. Unused
structs and uncalled functions don't need to be explicitly hidden. An
attempt to access a non-existent field (e.g. "execution_cycles" if
that isn't configured) provides all the build time validation we need.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This is an alternate backend that does what THREAD_RUNTIME_STATS is
doing currently, but with a few advantages:
* Correctly synchronized: you can't race against a running thread
(potentially on another CPU!) while querying its usage.
* Realtime results: you get the right answer always, up to timer
precision, even if a thread has been running for a while
uninterrupted and hasn't updated its total.
* Portable, no need for per-architecture code at all for the simple
case. (It leverages the USE_SWITCH layer to do this, so won't work
on older architectures)
* Faster/smaller: minimizes use of 64 bit math; lower overhead in
thread struct (keeps the scratch "started" time in the CPU struct
instead). One 64 bit counter per thread and a 32 bit scratch
register in the CPU struct.
* Standalone. It's a core (but optional) scheduler feature, no
dependence on para-kernel configuration like the tracing
infrastructure.
* More precise: allows architectures to optionally call a trivial
zero-argument/no-result cdecl function out of interrupt entry to
avoid accounting for ISR runtime in thread totals. No configuration
needed here, if it's called then you get proper ISR accounting, and
if not you don't.
For right now, pending unification, it's added side-by-side with the
older API and left as a z_*() internal symbol.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Instead of returning PM_STATE_ACTIVE when the CPU didn't enter a
low power state, and a different state when it entered one but has
already left it and is active again, this changes
pm_system_suspend to return true when the CPU has entered a low power
state and false otherwise.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
There was a brief (but seen in practice on real apps on real
hardware!) race with the switch-based z_swap() implementation. The
thread return value was being initialized to -EAGAIN after the
enclosing lock had been released. But that lock is supposed to be
atomic with the thread suspend.
This opened a window for another racing thread to come by and "wake
up" our pending thread (which is fine on its own), set its return
value (e.g. to 0 for success) and then have that value clobbered by
the thread continuing to suspend itself outside the lock.
Melodramatic aside: I continue to hate this
arch_thread_return_value_set() API; it needs to die. At best it's a
mild optimization on a handful of architectures (e.g. x86 implements
it by writing to the EAX register save slot in the context block).
Asynchronous APIs are almost always worse than synchronous ones, and
in this case it's an async operation that races against literal
context switch code that can't use traditional locking strategies.
Fixes#39575
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Some SMP applications have threading designs where every thread
created is always assigned to a specific CPU, and never want to
schedule them symmetrically across CPUs under any circumstance.
In this situation, it's possible to optimize the run queue design a
bit to put a separate queue in each CPU struct instead of having a
single global one. This is probably good for a few cycles per
scheduling event (maybe a bit more on architectures where cache
locality can be exploited) in circumstances where there is more than
one runnable thread. It's a mild optimization, but a basically simple
one.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Some architectures already return -ENOTSUP when these functions
are called. So add this return value to the API doc.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Add a SOC API to allow for application control over deep idle power
states. Note that the hardware idle entry happens out of the WAITI
instruction, so the application has to be responsible for ensuring
that the CPU to be halted actually reaches idle deterministically. Lots of
warnings in the docs to this effect.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
During boot process, the boot sections need to be pinned in
memory to prevent them from being paged out (to avoid
pages being paged out and immediately paged in again).
Once the boot process is completed (just before calling main()),
the boot sections can be unpinned so the memory can be
used for demand paging for paging in data pages.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
z_smp_init() is only available if CONFIG_SMP is defined;
smp_timer_init() also depends on two Kconfig parameters. Also make it
conditional in cavs_timer.c, and clarify some SMP-related comments
there.
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
The z_interrupt_stacks was declared extern in the kernel internal
header file using the same macro which defines the same stack
array but with an added "extern" in front. This macro adds
alignment and section attribute which are actually not the same
as the actual stack array defined in kernel/init.c. The section
name used in the section attribute contains the file name where
the stack array is defined or extern declared. So the same
symbol, in this case z_interrupt_stacks, has different
attributes in two places, and GCC 11 starts to complain about
this. So use the newly introduced macro to extern declare
the stack array without adding/replacing any symbol attributes.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
These functions are those that need to be implemented by the backing
store outside the kernel. Promote them from z_* so they can be
included in documentation.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
These functions and data structures are those that need
to be implemented by the eviction algorithm and application
outside the kernel. Promote them from z_* so they can be
included in documentation.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The scheduler has historically had an API where an application can
inform the kernel that it will never create a thread that can be
preempted, and the kernel and architecture layer would use that as an
optimization hint to eliminate some code paths.
Those optimizations have dwindled to almost nothing at this point, and
they're now objectively a smaller impact than the special casing that
was required to handle the idle thread (which, obviously, must always
be preemptible).
Fix this by eliminating the idea of "cooperative only" and ensuring
that there will always be at least one preemptible priority with value
>=0. CONFIG_NUM_PREEMPT_PRIORITIES now specifies the number of
user-accessible priorities other than the idle thread.
The only remaining workaround is that some older architectures (and
also SPARC) use the CONFIG_PREEMPT_ENABLED=n state as a hint to skip
thread switching on interrupt exit. So detect exactly those platforms
and implement a minimal workaround in the idle loop (basically "just
call swap()") instead, with a big explanation.
Note that this also fixes a bug in one of the philosophers samples,
where it would ask for 6 cooperative priorities but then use values -7
through -2. It was assuming the kernel would magically create a
cooperative priority for its idle thread, which wasn't correct even
before.
Fixes #34584
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Our z_swap() API takes a key returned from arch_irq_lock() and
releases it atomically with the context switch. Make sure that the
action of the unlocking is to unmask interrupts globally. If
interrupts would still be masked then that means there is an OUTER
interrupt lock still held, and the code that locked it surely doesn't
expect the thread to be suspended and interrupts unmasked while it's
held!
Unfortunately, this kind of mistake is very easy to make. We should
catch that with a simple assertion. This is essentially a crude
Zephyr equivalent of the extremely common "BUG: scheduling while
atomic" error in Linux drivers (just google it).
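Conceptually the check is a one-line assertion at the top of the swap
path, roughly (exact condition aside):

    /* key comes from arch_irq_lock(); if unlocking it would not unmask
     * interrupts globally, an outer lock is still held.
     */
    __ASSERT(arch_irq_unlocked(key) ||
             (_current->base.thread_state & _THREAD_DEAD) != 0U,
             "Context switching while holding lock!");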
The one exception made is the circumstance where a thread has already
aborted itself. At that stage, whatever upthread lock state might
have existed will have already been messed up, so there's no value in
our asserting here. We can't catch all bugs, and this can actually
happen in error handling and/or test frameworks.
Fixes #33319
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This adds the necessary bits for linker scripts and source code
to specify which symbols need to be pinned in memory. This is
needed for demand paging as some functions and data must reside
in memory all the time and cannot be paged out (e.g. paging,
scheduler, and interrupt routines for functionality).
This is up to the arch/SoC/board to define the sections in
their linker scripts as the pinned section may need special
alignment which cannot be done in common script snippets.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This adds the necessary bits for linker scripts and source code
to specify which symbols are needed for boot process so they
can be grouped together.
One use of this is to group boot related code and data so these
won't be intermingled with other kernel and application code and data,
for better caching.
This is a must for demand paging as some functions and data
must be available during the boot process and before the memory
manager is initialized. During this time, paging cannot be used
so symbols linked in virtual memory space are unavailable.
This is up to the arch/SoC/board to define the sections in
their linker scripts as the section may need special alignment
which cannot be done in common script snippets.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>