Commit Graph

339 Commits

James Harris
2cd0f66515 kernel: sched: change to 3-way thread priority comparison
`z_is_t1_higher_prio_than_t2` was being called twice in both the
context-switch fastpath and in `z_priq_rb_lessthan`, just to
deal with priority ties. In addition, the API was error-prone
(and too much in the fastpath to be able to assert its invariants)
- see also #32710 for a previous example of this API breaking
and returning a>b but also b>a.

Replacing this with a direct 3-way comparison `z_cmp_t1_prio_with_t2`
sidesteps most of these issues. There is still a concern that
`sgn(z_cmp_t1_prio_with_t2(a,b)) != -sgn(z_cmp_t1_prio_with_t2(b,a))`
but I don't see any way to alleviate this aside from adding an
assert to the fastpath.
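
A minimal sketch of what a 3-way comparison looks like, assuming Zephyr's
convention that a smaller numeric priority value means a higher scheduling
priority (the exact name and tie-breaking of the in-tree
`z_cmp_t1_prio_with_t2` may differ):

```c
#include <kernel.h>

/* Returns >0 when t1 should run before t2, <0 for the reverse, and 0 on a
 * tie.  A smaller numeric priority value means a higher scheduling
 * priority, hence the reversed subtraction.
 */
static inline int32_t cmp_t1_prio_with_t2(const struct k_thread *t1,
					   const struct k_thread *t2)
{
	return (int32_t)t2->base.prio - (int32_t)t1->base.prio;
}
```

With this shape the call sites test `> 0`, `< 0`, or `== 0` directly instead
of invoking a boolean "is higher" helper twice.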

Signed-off-by: James Harris <james.harris@intel.com>
2021-03-02 14:27:14 -05:00
Andy Ross
6fb6d3cfbe kernel: Add new k_thread_abort()/k_thread_join()
Add a newer, much smaller and simpler implementation of abort and
join.  No need to involve the idle thread.  No need for a special code
path for self-abort.  Joining a thread and waiting for an aborting one
to terminate elsewhere share an implementation.  All work in both
calls happens under a single locked path with no unexpected
synchronization points.

This fixes a bug with the current implementation where the action of
z_sched_single_abort() was nonatomic, releasing the lock internally at
a point where the thread to be aborted could self-abort and confuse
the state such that it failed to abort at all.

Note that the arm32 and native_posix architectures, which have their
own thread abort implementations, now see a much simplified
"z_thread_abort()" internal API.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2021-02-24 16:39:15 -05:00
Andy Ross
6b84ab3830 kernel/sched: Adjust locking in z_swap()
Swap was originally written to use the scheduler lock just to select a
new thread, but it would be nice to be able to rely on scheduler
atomicity later in the process (in particular it would be nice if the
assignment to cpu.current could be seen atomically).  Rework the code
a bit so that swap takes the lock itself and holds it until just
before the call to arch_switch().

Note that the local interrupt mask has always been required to be held
across the swap, so extending the lock here has no effect on latency
at all on uniprocessor setups, and even on SMP only affects average
latency and not worst case.
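
Schematically, the reordered flow looks like this (approximate names,
ignoring the SMP switch-handle details; interrupts are assumed masked across
the whole sequence, as noted above):

```c
#include <kernel.h>

extern struct k_spinlock sched_spinlock;	/* kernel-internal scheduler lock */
extern struct k_thread *next_up(void);		/* hypothetical thread selector */

static void swap_sketch(struct k_thread *old_thread)
{
	k_spinlock_key_t key = k_spin_lock(&sched_spinlock);
	struct k_thread *new_thread = next_up();

	_current_cpu->current = new_thread;	/* visible atomically under the lock */

	k_spin_unlock(&sched_spinlock, key);	/* dropped just before switching */
	arch_switch(new_thread->switch_handle, &old_thread->switch_handle);
}
```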

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2021-02-24 16:39:15 -05:00
Andrei Emeltchenko
377456c5af kernel: Move LOCKED() macro to kernel_internal.h
Remove duplication in the code by moving the LOCKED() macro to the correct
kernel_internal.h header.
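
For reference, the LOCKED() idiom is essentially a one-iteration for loop
that takes a spinlock on entry and releases it on exit, so a block of code
can be written directly under the lock (shown here as a sketch that may
differ in detail from the header):

```c
#include <kernel.h>

/* Take the lock on entry, run the body once, release on exit. */
#define LOCKED(lck) \
	for (k_spinlock_key_t __i = {}, __key = k_spin_lock(lck); \
	     !__i.key; \
	     k_spin_unlock(lck, __key), __i.key = 1)

static struct k_spinlock my_lock;
static int counter;

void bump_counter(void)
{
	LOCKED(&my_lock) {
		counter++;	/* runs exactly once with my_lock held */
	}
}
```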

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
2021-02-22 14:56:37 -05:00
Daniel Leung
ece9cad858 kernel: add CONFIG_SRAM_OFFSET
This adds a new kconfig, CONFIG_SRAM_OFFSET, to specify the offset
from the beginning of SRAM where the kernel begins. On x86 and
PC compatible platforms, the first 1MB of RAM is reserved and
Zephyr should not link anything there. However, this 1MB still
needs to be mapped by the MMU to access various platform related
information. CONFIG_SRAM_OFFSET serves a similar function to
CONFIG_KERNEL_VM_OFFSET and is needed for proper phys/virt
address translations.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2021-02-22 14:55:28 -05:00
Daniel Leung
ec21c0b92f kernel: mmu: fix boot address translation macros
The Z_BOOT_VIRT_TO_PHYS() and Z_BOOT_PHYS_TO_VIRT() address
translation macros are flipped in their calculations.
The calculation is supposed to be:

  virt = phys + ((KERNEL_VM_BASE + KERNEL_VM_OFFSET) -
                 SRAM_BASE_ADDRESS)

So fix them.
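
Spelled out from the formula above (an illustrative reconstruction rather
than the literal in-tree definitions), the corrected pair is the same delta
applied in opposite directions:

```c
/* Using the names from the formula above; the in-tree macros may spell the
 * configuration symbols slightly differently.
 */
#define Z_BOOT_PHYS_TO_VIRT(phys) ((phys) + \
	((KERNEL_VM_BASE + KERNEL_VM_OFFSET) - SRAM_BASE_ADDRESS))
#define Z_BOOT_VIRT_TO_PHYS(virt) ((virt) - \
	((KERNEL_VM_BASE + KERNEL_VM_OFFSET) - SRAM_BASE_ADDRESS))
```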

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2021-02-22 14:55:28 -05:00
Peter Bigot
d554d34137 device: add post-process of elf file to manage device handles
Following the idiom used for system calls, add script support to read
the initial application binary to identify which devices are defined,
and to use their offset in the device array as their unique handle
rather than the externally-defined ordinal from devicetree.  The
device dependency arrays are updated to use these handles.

Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
2021-02-19 15:46:16 -05:00
Peter Bigot
1cadd8b305 device: perform dynamic device initialization during system startup
Initialize all device objects in a batch before invoking any code that
might try to reference data in them.  This eliminates a race condition
enabled by the ability to resolve a device structure at build time,
and reference it from one device's initialization routine before the
device itself has been initialized.

While the devices are pulled from the sys_init records rather than
from the static device list, all in-tree init_entry records that are
with devices are produced via Z_DEVICE_DEFINE(), so there should be no
static devices that would be missed by instead iterating over the
device records.
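
A rough picture of the pre-pass, with a hypothetical device_state_init()
helper: before any init function at a level runs, walk that level's init
entries and put each associated device object into a defined state.

```c
#include <device.h>
#include <init.h>

/* Hypothetical helper: establish one device object's runtime state. */
extern void device_state_init(const struct device *dev);

static void prepare_level_devices(const struct init_entry *first,
				  const struct init_entry *last)
{
	for (const struct init_entry *entry = first; entry < last; entry++) {
		/* Only entries produced by Z_DEVICE_DEFINE() carry a device */
		if (entry->dev != NULL) {
			device_state_init(entry->dev);
		}
	}
}
```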

Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
2021-02-19 10:11:20 -05:00
Andy Ross
c7d0cb6641 include/kernel_arch_interface.h: Redocument arch_switch()
Some recent changes exposed some common "arch_switch() anti-patterns"
in various architectures.  The documentation technically described
this all correctly, but probably wasn't as clear as it should have
been.  Rewrite, making clear exactly what needs to happen and how the
fields should be interpreted.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2021-02-14 16:22:45 -05:00
Andy Ross
4ff457113e kernel/sched: Fix rare SMP deadlock
It was possible with pathological timing (see below) for the scheduler
to pick a cycle of threads on each CPU and enter the context switch
path on all of them simultaneously.

Example:
   * CPU0 is idle, CPU1 is running thread A
   * CPU1 makes high priority thread B runnable
   * CPU1 reaches a schedule point (or returns from an interrupt) and
     decides to run thread B instead
   * CPU0 simultaneously takes its IPI and returns, selecting thread A

Now both CPUs enter wait_for_switch() to spin, waiting for the context
switch code on the other thread to finish and mark the thread
runnable.  So we have a deadlock: each CPU is spinning, waiting for the
other!

Actually, in practice this seems not to happen on existing hardware
platforms; it's only exercisable in emulation.  The reason is that the
hardware IPI time is much faster than the software paths required to
reach a schedule point or interrupt exit, so CPU1 always selects the
newly scheduled thread and no deadlock appears.  I tried for a bit to
make this happen with a cycle of three threads, but it's complicated
to get right and I still couldn't get the timing to hit correctly.  In
qemu, though, the IPI is implemented as a Unix signal sent to the
thread running the other CPU, which is far slower and opens the window
to see this happen.

The solution is simple enough: don't store the _current thread in the
run queue until we are on the tail end of the context switch path,
after wait_for_switch(), when we are guaranteed to reach the end.

Note that this requires changing a little logic to handle the yield
case: because we can no longer rely on _current's position in the run
queue to suppress it, we need to do the priority comparison directly
based on the existing "swap_ok" flag (which has always meant
"yielded", and maybe should be renamed).

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2021-02-14 16:22:45 -05:00
Andy Ross
91946ef21c kernel/sched: Refactor, unify management of QUEUED state
The QUEUED state flag was managed separately from the run queue
insertion/deletion, and the logic (while AFAICT perfectly correct) was
tangled in a few places trying to keep them in sync.  Put the
management of both behind a queue_thread()/dequeue_thread() API for
clarity.  The ALWAYS_INLINE usage seems to be working to get the
compiler to condense the resulting multiple assignments.  No behavior
change.
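
The helpers plausibly look something like this (a sketch, not the exact
in-tree code): the flag and the run-queue membership always change together,
and ALWAYS_INLINE keeps the extra call layer free.

```c
#include <kernel.h>

/* Kernel-internal sketch: _THREAD_QUEUED and the _priq_run_* helpers come
 * from the scheduler's private headers.
 */
static ALWAYS_INLINE void queue_thread(void *pq, struct k_thread *thread)
{
	thread->base.thread_state |= _THREAD_QUEUED;
	_priq_run_add(pq, thread);
}

static ALWAYS_INLINE void dequeue_thread(void *pq, struct k_thread *thread)
{
	_priq_run_remove(pq, thread);
	thread->base.thread_state &= ~_THREAD_QUEUED;
}
```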

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2021-02-14 16:22:45 -05:00
Andy Ross
dd43221540 kernel/sched: Fix race with switch handle
The "null out the switch handle and put it back" code in the swap
implementation is a holdover from some defensive coding (not wanting
to break the case where we picked our current thread), but it hides a
subtle SMP race: when that field goes NULL, another CPU that may have
selected that thread (which is to say, our current thread) as its next
to run will be spinning on that to detect when the field goes
non-NULL.  So it will get the signal to move on when we revert the
value, when clearly we are still running on the stack!

In practice this was found on x86 which poisons the switch context
such that it crashes instantly.

Instead, be firm about state and always set the switch handle of a
currently running thread to NULL immediately before it starts running:
right before entering arch_switch() and symmetrically on the interrupt
exit path.
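
The invariant, as a sketch with approximate names: the handle of the thread
that is about to run is cleared exactly once, right before it starts
executing, and is never temporarily reverted while another CPU might be
spinning on it.

```c
#include <kernel.h>

/* Return the handle to switch to, marking the incoming thread as running by
 * clearing its switch handle.  The outgoing thread's handle is published
 * (written back non-NULL) only once its context has actually been saved.
 */
static void *next_switch_handle_sketch(struct k_thread *new_thread)
{
	void *ret = new_thread->switch_handle;

	new_thread->switch_handle = NULL;
	return ret;
}
```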

Fixes #28105

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2021-02-14 16:22:45 -05:00
Andy Ross
1d51e888d8 kernel/z_swap: Remove on-stack dummy spinlock
The z_swap_unlocked() function used a dummy spinlock for simplicity.
But this runs afoul of the check for stack-resident spinlocks
(forbidden when KERNEL_COHERENCE is set).  And it's executing needless
code to release the lock anyway.  Replace with a compile time NULL,
which will improve performance, correctness and code size.
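
Roughly, the unlocked swap wrapper ends up as a call with a compile-time
NULL lock (internal signatures sketched from memory, so treat this as
illustrative only):

```c
#include <kernel.h>

/* Internal z_swap() from kswap.h is assumed: a NULL lock pointer at compile
 * time means no stack-resident spinlock and no unlock code at all.
 */
static inline void swap_unlocked_sketch(void)
{
	(void) z_swap((struct k_spinlock *)NULL,
		      (k_spinlock_key_t) { .key = arch_irq_lock() });
}
```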

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2021-02-11 14:47:40 -05:00
Andy Ross
604f0f44b6 kernel/sched: Add missing lock around waitq unpend calls
The two calls to unpend a thread from a wait queue were inexplicably*
unsynchronized, as James Harris discovered.  Rework them to call the
lowest-level primitives so we can wrap the process inside the scheduler
lock.

Fixes #32136

* I took a brief look.  What seems to have happened here is that these
  were originally synchronized via an implicit lock taken by an outer caller
  (remember the original uniprocessor irq_lock() API is a recursive
  lock), and they were mostly implemented in terms of middle-level
  calls that were themselves locked.  So those got ported over to the
  newer spinlock but the outer wrapper layer got forgotten.
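
What "wrap the process inside the scheduler lock" amounts to, as a sketch
(internal helper names approximate):

```c
#include <kernel.h>

extern struct k_spinlock sched_spinlock;	/* kernel-internal scheduler lock */

/* unpend_thread_no_timeout() and ready_thread() stand in for the low-level
 * scheduler primitives; the point is that both steps sit under one lock.
 */
static void unpend_and_ready(struct k_thread *thread)
{
	k_spinlock_key_t key = k_spin_lock(&sched_spinlock);

	unpend_thread_no_timeout(thread);	/* remove from the wait queue */
	ready_thread(thread);			/* insert into the run queue */
	k_spin_unlock(&sched_spinlock, key);
}
```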

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2021-02-10 07:43:18 -05:00
Daniel Leung
371752bce3 kernel: tls: align tdata/tbss sections in stack
This lets the linker tell us what kind of alignment is required
for both tdata and tbss data when copying them into the stack.
If they are not aligned as expected by the toolchain, the generated
code would access the wrong locations for thread variables.

Fixes #32015

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2021-02-07 23:28:43 -05:00
Nicolas Pitre
f9461d1ac4 mmu: fix ARM64 compilation by removing z_mapped_size usage
The linker script defines `z_mapped_size` as follows:

```
	z_mapped_size = z_mapped_end - z_mapped_start;
```

This is done with the belief that precomputed values at link time will
make the code smaller and faster.

On Aarch64, symbol values are relocated and loaded relative to the PC
as those are normally meant to be memory addresses.

Now if you have e.g. `CONFIG_SRAM_BASE_ADDRESS=0x2000000000` then
`z_mapped_size` might still have a reasonable value, say 0x59334.
But, when interpreted as an address, that's very very far from the PC
whose value is in the neighborhood of 0x2000000000. That overflows the
4GB relocation range:

```
kernel/libkernel.a(mmu.c.obj): in function `z_mem_manage_init':
kernel/mmu.c:527:(.text.z_mem_manage_init+0x1c):
relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21
```

The solution is to define `Z_KERNEL_VIRT_SIZE` in terms of
`z_mapped_end - z_mapped_start` at the source code level. Given this
is used within loops that already start with `z_mapped_start` anyway,
the compiler is smart enough to combine the two occurrences and
dispense with a size counter, making the code effectively
slightly better for all while avoiding the Aarch64 relocation
overflow:

```
   text    data     bss     dec     hex filename
   1216       8  294936  296160   484e0 mmu.c.obj.arm64.before
   1212       8  294936  296156   484dc mmu.c.obj.arm64.after
   1110       8    9244   10362    287a mmu.c.obj.x86-64.before
   1106       8    9244   10358    2876 mmu.c.obj.x86-64.after
```
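
The source-level definition sketched from the description above (the exact
macro spelling may differ from the header):

```c
#include <stddef.h>
#include <stdint.h>

/* Linker-provided bounds of the mapped kernel image. */
extern char z_mapped_start[];
extern char z_mapped_end[];

#define Z_KERNEL_VIRT_START ((uint8_t *)&z_mapped_start[0])
#define Z_KERNEL_VIRT_END   ((uint8_t *)&z_mapped_end[0])
#define Z_KERNEL_VIRT_SIZE  ((size_t)(Z_KERNEL_VIRT_END - Z_KERNEL_VIRT_START))
```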

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
2021-02-05 17:19:56 -05:00
Andrew Boie
14c5d1f1f7 kernel: add CONFIG_ARCH_MAPS_ALL_RAM
Some arches like x86 need all memory mapped so that they can
fetch information placed arbitrarily by firmware, like ACPI
tables.

Ensure that if this is the case, the kernel won't accidentally
clobber it by thinking the relevant virtual memory is unused.
Otherwise this has no effect on page frame management.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-23 19:47:23 -05:00
Andrew Boie
c7be5dddda mmu: backing stores reserve page fault room
If we evict enough pages to completely fill the backing store,
through APIs like k_mem_map(), z_page_frame_evict(), or
z_mem_page_out(), this will produce a crash the next time we
try to handle a page fault.

The backing store now always reserves a free storage location
for actual page faults.
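
One way to picture the reservation (hypothetical names and bookkeeping):
ordinary eviction may consume all but one backing-store slot, and the final
slot is handed out only when resolving an actual page fault.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t free_slots;			/* hypothetical free-location count */
extern uintptr_t next_free_location(void);	/* hypothetical */

int backing_store_get_slot(uintptr_t *location, bool page_fault)
{
	if (free_slots == 0U || (!page_fault && free_slots == 1U)) {
		return -ENOMEM;	/* keep the last slot for a real page fault */
	}
	free_slots--;
	*location = next_free_location();
	return 0;
}
```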

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-23 19:47:23 -05:00
Andrew Boie
60d306642e kernel: add z_num_pagefaults_get()
Simple counter of the number of page faults successfully handled by
the core kernel.
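
A counter like this is small enough to sketch in full (names illustrative;
the real accessor is z_num_pagefaults_get()):

```c
#include <kernel.h>

static struct k_spinlock pf_lock;
static unsigned long pagefault_count;

/* Called by the core kernel after a page fault has been handled. */
void pagefault_handled(void)
{
	k_spinlock_key_t key = k_spin_lock(&pf_lock);

	pagefault_count++;
	k_spin_unlock(&pf_lock, key);
}

unsigned long num_pagefaults_get(void)
{
	k_spinlock_key_t key = k_spin_lock(&pf_lock);
	unsigned long ret = pagefault_count;

	k_spin_unlock(&pf_lock, key);
	return ret;
}
```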

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-23 19:47:23 -05:00
Andrew Boie
431b7c0fe5 kernel: add demand paging internal interfaces
APIs used by backing store and eviction algorithms.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-23 19:47:23 -05:00
Andrew Boie
a6eca9fab6 kernel: add demand paging arch interfaces
Architecture layer hooks for demand paging. See
doxygen for these API definitions for more details.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-23 19:47:23 -05:00
Andrew Boie
ecb25fec51 mmu: ensure gperf data is mapped
Page tables created at build time may not include the
gperf data at the very end of RAM. Ensure this is mapped
properly at runtime to work around this.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-23 19:47:23 -05:00
Andrew Boie
299a2cf62e mmu: arch_mem_map() may no longer fail
Pre-allocation of paging structures is now required, such that
no allocations are ever needed when mapping memory.

Instantiation of new memory domains may still require allocations
unless a common page table is used.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-23 19:47:23 -05:00
Andrew Boie
e35f179db3 kernel: add page frame management
Initialize the page frame ontology at boot and update it
when we do memory mappings.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-23 19:47:23 -05:00
Andrew Boie
73a3e05e40 kernel: add CONFIG_ARCH_HAS_RESERVED_PAGE_FRAMES
We will need this to run on x86 with PC-like hardware.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-23 19:47:23 -05:00
Peter Bigot
affa7a1c7e Revert "device: add post-process of elf file to manage device handles"
This reverts commit 40d3653758.

Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
2021-01-23 18:01:03 -05:00
Anas Nashif
db0732f11d Revert "kernel: add CONFIG_ARCH_HAS_RESERVED_PAGE_FRAMES"
This reverts commit 9d2ebfff58.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2021-01-22 08:39:45 -05:00
Anas Nashif
8e84eaf73e Revert "kernel: add page frame management"
This reverts commit 2ca5fb7e06.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2021-01-22 08:39:45 -05:00
Anas Nashif
a2ec139bf7 Revert "mmu: arch_mem_map() may no longer fail"
This reverts commit db56722729.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2021-01-22 08:39:45 -05:00
Anas Nashif
d887e078f9 Revert "mmu: ensure gperf data is mapped"
This reverts commit e9bfd64110.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2021-01-22 08:39:45 -05:00
Anas Nashif
65122b776a Revert "kernel: add demand paging arch interfaces"
This reverts commit b8ae437967.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2021-01-22 08:39:45 -05:00
Anas Nashif
cd0beca292 Revert "kernel: add demand paging internal interfaces"
This reverts commit 3e51a7a775.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2021-01-22 08:39:45 -05:00
Anas Nashif
c2c87c99c7 Revert "kernel: add z_num_pagefaults_get()"
This reverts commit d7e6bc3e84.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2021-01-22 08:39:45 -05:00
Anas Nashif
5e978d237c Revert "mmu: backing stores reserve page fault room"
This reverts commit 7a642f81ab.

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2021-01-22 08:39:45 -05:00
Andrew Boie
7a642f81ab mmu: backing stores reserve page fault room
If we evict enough pages to completely fill the backing store,
through APIs like k_mem_map(), z_page_frame_evict(), or
z_mem_page_out(), this will produce a crash the next time we
try to handle a page fault.

The backing store now always reserves a free storage location
for actual page faults.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-21 16:47:00 -05:00
Andrew Boie
d7e6bc3e84 kernel: add z_num_pagefaults_get()
Simple counter of the number of page faults successfully handled by
the core kernel.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-21 16:47:00 -05:00
Andrew Boie
3e51a7a775 kernel: add demand paging internal interfaces
APIs used by backing store and eviction algorithms.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-21 16:47:00 -05:00
Andrew Boie
b8ae437967 kernel: add demand paging arch interfaces
Architecture layer hooks for demand paging. See
doxygen for these API definitions for more details.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-21 16:47:00 -05:00
Andrew Boie
e9bfd64110 mmu: ensure gperf data is mapped
Page tables created at build time may not include the
gperf data at the very end of RAM. Ensure this is mapped
properly at runtime to work around this.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-21 16:47:00 -05:00
Andrew Boie
db56722729 mmu: arch_mem_map() may no longer fail
Pre-allocation of paging structures is now required, such that
no allocations are ever needed when mapping memory.

Instantiation of new memory domains may still require allocations
unless a common page table is used.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-21 16:47:00 -05:00
Andrew Boie
2ca5fb7e06 kernel: add page frame management
Initialize the page frame ontology at boot and update it
when we do memory mappings.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-21 16:47:00 -05:00
Andrew Boie
9d2ebfff58 kernel: add CONFIG_ARCH_HAS_RESERVED_PAGE_FRAMES
We will need this to run on x86 with PC-like hardware.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2021-01-21 16:47:00 -05:00
Peter Bigot
40d3653758 device: add post-process of elf file to manage device handles
Following the idiom used for system calls, add script support to read
the initial application binary to identify which devices are defined,
and to use their offset in the device array as their unique handle
rather than the externally-defined ordinal from devicetree.  The
device dependency arrays are updated to use these handles.

Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
2021-01-21 14:49:04 -06:00
Daniel Leung
0c9f9691c4 kernel: mempool: add z_thread_aligned_alloc
This adds a new z_thread_aligned_alloc() to do memory allocation
with the required alignment.
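
As a sketch of the idea (the pool is passed explicitly here; the real helper
resolves the calling thread's resource pool internally), an aligned
allocation from a k_heap-backed pool looks like:

```c
#include <kernel.h>

void *aligned_alloc_from_pool(struct k_heap *pool, size_t align, size_t size)
{
	if (pool == NULL) {
		return NULL;
	}
	/* Non-blocking aligned allocation from the given heap. */
	return k_heap_aligned_alloc(pool, align, size, K_NO_WAIT);
}
```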

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2021-01-13 09:43:55 -08:00
Andrew Boie
d2ad783a97 mmu: rename z_mem_map to z_phys_map
Renamed to make its semantics clearer; this function maps
*physical* memory addresses and is not equivalent to
posix mmap(), which might confuse people.

The mem_map test case keeps the same name, as other memory
mapping scenarios will be added in the fullness of time.

Parameter names to z_phys_map adjusted slightly to be more
consistent with names used in other memory mapping functions.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
2020-12-16 08:55:55 -05:00
Anas Nashif
dd931f93a2 power: standardize PM Kconfigs and cleanup
- Remove SYS_ prefix
- shorten POWER_MANAGEMENT to just PM
- DEVICE_POWER_MANAGEMENT -> PM_DEVICE

and use PM_ as the prefix for all PM related Kconfigs

Signed-off-by: Anas Nashif <anas.nashif@intel.com>
2020-12-09 15:18:29 -05:00
Carlo Caione
a7d94b003e aarch64: Use absolute symbols for the callee saved registers
Use the GEN_OFFSET_SYM macro to generate absolute symbols for the
_callee_saved struct and use these new symbols in the assembly code.

Signed-off-by: Carlo Caione <ccaione@baylibre.com>
2020-11-17 18:59:23 -05:00
Daniel Leung
11e6b43090 tracing: roll thread switch in/out into thread stats functions
Since the tracing of threads being switched in/out uses the same
instrumentation points, we can roll the tracing function calls
into the thread stats gathering functions.
This avoids duplicating code just to call another function.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2020-11-11 23:55:49 -05:00
Daniel Leung
fc577c4bd1 kernel: gather basic thread runtime statistics
This adds the bits to gather the first thread runtime statistic:
thread execution time. It provides a rough idea of how much time
a thread spends in active execution. Currently it is not being
used, pending the following commits where it is combined with the
trace points on context switch, as they instrument the same locations.
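
In essence the statistic is a cycle stamp taken at switch-in and a delta
accumulated at switch-out (field and function names here are hypothetical):

```c
#include <kernel.h>

/* Hypothetical per-thread bookkeeping for the statistic described above. */
struct thread_runtime_stats_sketch {
	uint32_t last_switched_in;	/* cycle counter value at switch-in */
	uint64_t execution_cycles;	/* accumulated execution time */
};

void stats_thread_switched_in(struct thread_runtime_stats_sketch *s)
{
	s->last_switched_in = k_cycle_get_32();
}

void stats_thread_switched_out(struct thread_runtime_stats_sketch *s)
{
	uint32_t now = k_cycle_get_32();

	s->execution_cycles += (uint32_t)(now - s->last_switched_in);
}
```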

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2020-11-11 23:55:49 -05:00
Daniel Leung
02b20351cd kernel: add common bits to support TLS
This adds the common struct fields and functions to support
the implementation of thread local storage in individual
architectures. This uses the thread stack to store TLS data.

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
2020-10-24 10:52:00 -07:00