This is intended for memory-constrained systems and will save
4K per thread, since we will no longer reserve room for or
activate a kernel stack guard page.
If CONFIG_USERSPACE is enabled, stack overflows will still be
caught in some situations:
1) User mode threads overflowing their stack, since they crash into the
kernel stack page
2) Supervisor mode threads overflowing their stack, since the kernel
stack page is marked non-present for non-user threads
Stack overflows will not be caught:
1) When handling a system call
2) When the interrupt stack overflows
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This is an introductory port for Zephyr to be run as a Jailhouse
hypervisor[1]'s "inmate cell", on x86 64-bit CPUs (running in 32-bit
mode). This was tested with their "tiny-demo" inmate demo cell
configuration, which takes one of the CPUs of the QEMU-VM root cell
config, along with some RAM and serial controller access (it will even
do nice things like reserving some L3 cache for it via Intel CAT), and
with the following Zephyr samples:
- hello_world
- philosophers
- synchronization
The final binary receives an additional boot sequence preamble that
conforms to Jailhouse's expectations (starts at 0x0 in real mode). It
will put the processor in 32-bit protected mode and then proceed to
Zephyr's __start function.
Testing it is just a matter of:
$ make -C samples/<sample_dir> BOARD=x86_jailhouse JAILHOUSE_QEMU_IMG_FILE=<path_to_image.qcow2> run
$ sudo insmod <path to jailhouse.ko>
$ sudo jailhouse enable <path to configs/qemu-x86.cell>
$ sudo jailhouse cell create <path to configs/tiny-demo.cell>
$ sudo mount -t 9p -o trans=virtio host /mnt
$ sudo jailhouse cell load tiny-demo /mnt/zephyr.bin
$ sudo jailhouse cell start tiny-demo
$ sudo jailhouse cell destroy tiny-demo
$ sudo jailhouse disable
$ sudo rmmod jailhouse
For the hello_world demo case, one should then get QEMU's serial port
output similar to:
"""
Created cell "tiny-demo"
Page pool usage after cell creation: mem 275/1480, remap 65607/131072
Cell "tiny-demo" can be loaded
CPU 3 received SIPI, vector 100
Started cell "tiny-demo"
***** BOOTING ZEPHYR OS v1.9.0 - BUILD: Sep 12 2017 20:03:22 *****
Hello World! x86
"""
Note that Jailhouse's root cell *has to be started in xAPIC
mode* (kernel command line argument 'nox2apic') in order for this to
work. x2APIC support, and the reasoning behind it, will come in a
separate commit.
As a reminder, the make run target introduced for the x86_jailhouse
board involves a root cell image with Jailhouse in it, to be launched
and then partitioned (with >= 2 64-bit CPUs in it).
Inmate cell configs without the JAILHOUSE_CELL_PASSIVE_COMMREG flag
set (e.g. the apic-demo one) would need extra code in Zephyr to deal
with cell shutdown command responses from the hypervisor.
You may want to fine-tune CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC for your
specific CPU, since Zephyr does not detect this automatically.
Other config differences from pristine QEMU defaults worth mentioning
are:
- there is no HPET when running as a Jailhouse guest; we use the LOAPIC
timer instead
- there is no PIC_DISABLE, because there is no 8259A PIC when running
as a Jailhouse guest
- XIP also makes no sense when running as a Jailhouse guest, so both
PHYS_RAM_ADDR and PHYS_LOAD_ADDR are set to zero, which is what the
tiny-demo cell config expects
This opens up new possibilities for Zephyr, so that use cases beyond
just MCUs come to the table. I see special demand coming from
functional-safety related use cases in industry, automotive, etc.
[1] https://github.com/siemens/jailhouse
Reference to Jailhouse's booting preamble code:
Origin: Jailhouse
License: BSD 2-Clause
URL: https://github.com/siemens/jailhouse
commit: 607251b44397666a3cbbf859d784dccf20aba016
Purpose: Dual-licensing of inmate lib code
Maintained-by: Zephyr
Signed-off-by: Gustavo Lima Chaves <gustavo.lima.chaves@intel.com>
We were unnecessarily including headers that resulted in kernel.h
being pulled in, which is undesirable since arch/cpu.h includes
these headers.
Added the integral type headers, since we do need those.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
kernel.h depends on arch.h, and reverse dependencies need to be
removed. Define k_tid_t as some opaque pointer type so that arch.h
doesn't have to pull in kernel.h.
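As a rough sketch of the idea (the exact definition may differ from
what this patch uses), arch.h only needs an opaque handle rather than
the full struct k_thread definition:

struct k_thread;                  /* forward declaration is enough here */
typedef struct k_thread *k_tid_t; /* opaque thread handle for arch code */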
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Created structures and unions needed to enable the software to
access these tables.
Also updated the helper macros to ease the usage of the MMU page
tables.
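As an illustrative sketch only (these field names are assumptions, not
necessarily the ones this patch adds), a 32-bit page table entry can be
overlaid with a bit-field to ease access:

union x86_pte_sketch {
        u32_t value;
        struct {
                u32_t present  : 1;   /* page is mapped          */
                u32_t rw       : 1;   /* writable if set         */
                u32_t us       : 1;   /* user-accessible if set  */
                u32_t pwt      : 1;   /* write-through caching   */
                u32_t pcd      : 1;   /* cache disable           */
                u32_t accessed : 1;
                u32_t dirty    : 1;
                u32_t pat      : 1;
                u32_t global   : 1;
                u32_t ignored  : 3;
                u32_t page     : 20;  /* physical frame number   */
        };
};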
JIRA: ZEP-2511
Signed-off-by: Adithya Baglody <adithya.nagaraj.baglody@intel.com>
arg6 is treated as a memory constraint. If that memory
address was expressed in the generated code as a 'mov' operand
offset from the stack pointer, then the 'push' instruction
immediately before it could end up passing in, as the 6th
argument, memory 4 bytes away from what was intended.
Add the ESP register to the clobber list to fix this issue.
Fixes issues observed with k_thread_create() passing in a
NULL argument list with CONFIG_DEBUG=y.
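A minimal sketch of the failure mode and the fix follows. This is not
Zephyr's actual syscall macro; the register assignments and the trap
vector are assumptions, and newer GCC releases reject stack pointer
clobbers outright:

static inline u32_t demo_syscall6(u32_t id, u32_t a1, u32_t a2,
                                  u32_t a3, u32_t a4, u32_t a5,
                                  u32_t a6)
{
        u32_t ret;

        /* If "m"(a6) were addressed relative to %esp, the push below
         * would make the mov read memory 4 bytes off.  Clobbering
         * "esp" keeps the compiler from using esp-relative addressing
         * for that operand.
         */
        __asm__ volatile("push %%ebp\n\t"
                         "mov %[a6], %%ebp\n\t"
                         "int $0x80\n\t"
                         "pop %%ebp\n\t"
                         : "=a" (ret)
                         : "0" (id), "b" (a1), "c" (a2), "d" (a3),
                           "S" (a4), "D" (a5), [a6] "m" (a6)
                         : "memory", "esp");
        return ret;
}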
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
We need to track permission on stack memory regions like we do
with other kernel objects. We want stacks to live in a memory
area that is outside the scope of memory domain permission
management. We need to be able track what stacks are in use,
and what stacks may be used by user threads trying to call
k_thread_create().
Some special handling is needed because thread stacks appear as
variously-sized arrays of struct _k_thread_stack_element, which is
just a char. We need the entire array to be considered an object,
but we also need to properly handle arrays of stacks.
Validation of stacks also requires that the bounds of the stack
are not exceeded. Various approaches were considered. Storing
the size in some header region of the stack itself would not allow
the stack to live in 'noinit'. Having a stack object be a data
structure that points to the stack buffer would confound our
current APIs for declaring stacks as arrays or struct members.
In the end, the struct _k_object was extended to store this size.
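A rough sketch of the resulting metadata entry (names and layout here
are illustrative, not the exact struct):

struct _k_object_sketch {
        void *name;     /* address of the kernel object (gperf key)    */
        u8_t perms[CONFIG_MAX_THREAD_BYTES]; /* per-thread access bits */
        u8_t type;      /* K_OBJ_* type of the object                  */
        u8_t flags;     /* e.g. whether the object was initialized     */
        u32_t data;     /* for stacks: usable size of the stack buffer */
};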
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
These needed "memory" clobbers; otherwise the compiler would do
unwanted optimizations for parameters passed in as pointer
values.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
The compiler was complaining about impossible constraints, since a
register constraint was provided but there were no general purpose
registers left available.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
A quick look at "man syscall" shows that in Linux, all architectures
support at least 6 argument system calls, with a few supporting 7. We
can at least do 6 in Zephyr.
x86 port modified to use EBP register to carry the 6th system call
argument.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
* Instead of a common system call entry function, we create a
table mapping system call ids to handler skeleton functions, which are
invoked directly by the architecture code that receives the system
call.
* A system call handler prototype is specified. All but the most trivial
system calls will implement one of these. They validate all the
arguments, including verifying kernel/device object pointers, ensuring
that the calling thread has appropriate access to any memory buffers
passed in, and performing other parameter checks that the base system
call implementation does not check, or only checks with __ASSERT().
It's only possible to install a system call implementation directly
inside this table if the implementation has a return value and requires
no validation of any of its arguments.
A sample handler implementation for k_mutex_unlock() might look like:
u32_t _syscall_k_mutex_unlock(u32_t mutex_arg, u32_t arg2, u32_t arg3,
                              u32_t arg4, u32_t arg5, void *ssf)
{
        struct k_mutex *mutex = (struct k_mutex *)mutex_arg;
        _SYSCALL_ARG1;
        _SYSCALL_IS_OBJ(mutex, K_OBJ_MUTEX, 0, ssf);
        _SYSCALL_VERIFY(mutex->lock_count > 0, ssf);
        _SYSCALL_VERIFY(mutex->owner == _current, ssf);
        k_mutex_unlock(mutex);
        return 0;
}
* The x86 port was modified to work with the system call table instead
of calling a common handler function. An issue where registers being
changed could confuse the compiler has been fixed: all registers, even
ones used for parameters, must be preserved across the system call.
* A new arch API for producing a kernel oops when validating system call
arguments was added. The debug information reported will be from the
system call site and not from inside the handler function.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
- _arch_user_mode_enter() implemented
- _arch_is_user_context() implemented
- _new_thread() will honor K_USER option if passed in
- System call triggering macros implemented
- _thread_entry_wrapper moved and now looks for the next function to
call in EDI
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
- There's no point in building up "validity" (declared volatile for some
strange reason); just exit with a false return value if any of the page
directory or page table checks don't come out as expected.
- The function was returning the opposite of what its documentation
stated (0 on success, -EPERM on failure). Documentation updated.
- This function will only be used to verify buffers from user space.
There's no need for a flags parameter; the only option that needs to
be passed in is whether write access to the buffer is required or not.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This was felt to be necessary at one point but actually isn't.
- When a thread is initialized to use a particular stack, calls will be
made to the MMU/MPU to restrict access to that stack to only that
thread. Once a stack is in use, it will not be generally readable even
if the buffer exists in application memory space.
- If a user thread wants to create a thread, we will need to have some
way to ensure that whatever stack buffer is passed in is unused and
appropriate. Since unused stacks in application memory will be generally
accessible, we can just check that the thread calling
k_thread_create() has access to the stack buffer passed in; it won't if
the stack is in use.
On ARM we had a linker definition for .stacks, but currently stacks are
just tagged with __noinit (which is fine).
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Previously, this was only done if an essential thread self-exited,
and it was a runtime check that generated a kernel panic.
Now, if k_thread_abort() is called on any thread that is essential
to system operation, this check is made. It is now
an assertion.
_NANO_ERR_INVALID_TASK_EXIT checks and printouts removed since this
is now an assertion.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
All system calls made from userspace which involve pointers to kernel
objects (including device drivers) will need to have those pointers
validated; userspace should never be able to crash the kernel by passing
it garbage.
The actual validation with _k_object_validate() will be in the system
call receiver code, which doesn't exist yet.
- CONFIG_USERSPACE introduced. We are somewhat far away from having an
end-to-end implementation, but at least need a Kconfig symbol to
guard the incoming code with. Formal documentation doesn't exist yet
either, but will appear later down the road once the implementation is
mostly finalized.
- In the memory region for RAM, the data section has been moved last,
past bss and noinit. This ensures that inserting generated tables
with addresses of kernel objects does not change the addresses of
those objects (which would make the table invalid)
- The DWARF debug information in the generated ELF binary is parsed to
fetch the locations of all kernel objects and pass this to gperf to
create a perfect hash table of their memory addresses.
- The generated gperf code doesn't know that we are exclusively working
with memory addresses and uses memory inefficiently. A post-processing
script, process_gperf.py, adjusts the generated code before it is
compiled so that it works with pointer values directly and not with
strings containing them.
- _k_object_init() calls inserted into the init functions for the set of
kernel object types we are going to support so far
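As a sketch of that last point (the semaphore internals are elided;
the real call sites are the existing init functions of the supported
object types):

void k_sem_init(struct k_sem *sem, unsigned int initial_count,
                unsigned int limit)
{
        /* ... existing semaphore initialization ... */

        /* Register the object so that future system call handlers can
         * validate user-supplied pointers to it with _k_object_validate().
         */
        _k_object_init(sem);
}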
Issue: ZEP-2187
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Fix misspellings in .h files missed during code reviews
and affecting generated API documentation
Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
Kernel data size shifts in between linker passes due to the addition
of the page tables. We would like application memory bounds to
remain fixed so that we can program the MMU permissions for it
at build time.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Page faults will additionally dump out some interesting
page directory and page table flags for the faulting
memory address.
Intended to help determine whether the page tables have been
configured incorrectly as we enable memory protection features.
This only happens if CONFIG_EXCEPTION_DEBUG is turned on.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Historically, stacks were just character buffers and could be treated
as such if the user wanted to look inside the stack data, and also
declared as an array of the desired stack size.
This is no longer the case. Certain architectures will create a memory
region much larger to account for MPU/MMU guard pages. Unfortunately,
the kernel interfaces treat both the declared stack, and the valid
stack buffer within it as the same char * data type, even though these
absolutely cannot be used interchangeably.
We introduce an opaque k_thread_stack_t which gets instantiated by
K_THREAD_STACK_DECLARE(). This is no longer treated by the compiler
as a character pointer, even though under the hood it really is.
To access the real stack buffer within, the result of
K_THREAD_STACK_BUFFER() can be used, which will return a char * type.
This should catch a bunch of programming mistakes at build time:
- Declaring a character array outside of K_THREAD_STACK_DECLARE() and
passing it to K_THREAD_CREATE
- Directly examining the stack created by K_THREAD_STACK_DECLARE()
which is not actually the memory desired and may trigger a CPU
exception
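A minimal usage sketch under this convention (macro names follow
mainline Zephyr; the stack size, priority and delay are arbitrary):

K_THREAD_STACK_DEFINE(worker_stack, 1024);  /* opaque k_thread_stack_t */
static struct k_thread worker_thread;

static void worker(void *p1, void *p2, void *p3)
{
        /* thread body */
}

void spawn_worker(void)
{
        k_thread_create(&worker_thread, worker_stack,
                        K_THREAD_STACK_SIZEOF(worker_stack),
                        worker, NULL, NULL, NULL, 5, 0, K_NO_WAIT);

        /* The usable buffer must be obtained explicitly: */
        char *buf = K_THREAD_STACK_BUFFER(worker_stack);
        ARG_UNUSED(buf);
}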
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Subsequent patches will set this guard page as unmapped,
triggering a page fault on access. If this is due to
stack overflow, a double fault will be triggered,
which we are now capable of handling with a switch to
a known good stack.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
We now create a special IA hardware task for handling
double faults. This has a known good stack so that if
the kernel tries to push stack data onto an unmapped page,
we don't triple-fault and reset the system.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
We will need this for stack memory protection scenarios
where a writable GDT with Task State Segment descriptors
will be used. The addresses of the TSS segments cannot be
put in the GDT via preprocessor magic due to architecture
requirements that the address be split up into different
fields in the segment descriptor.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This has one use-case: configuring the double-fault #DF
exception handler to do an IA task switch to a special
IA task with a known good stack, such that we can dump
diagnostic information and then panic.
Will be used for stack overflow detection in kernel mode,
as otherwise the CPU will triple-fault and reset.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
A user space buffer must be validated before the requested operation
can proceed. This API will check the current MMU
configuration to determine whether the buffer held by the user is valid.
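The shape of the API is roughly the following (treat the exact name
and signature here as a sketch):

/* Returns 0 if the whole [addr, addr + size) range is accessible to
 * user mode with the requested access (write != 0 means the buffer
 * must be writable), or a negative error code otherwise.
 */
int _arch_buffer_validate(void *addr, size_t size, int write);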
Jira: ZEP-2326
Signed-off-by: Adithya Baglody <adithya.nagaraj.baglody@intel.com>
Different areas of memory will need to have different access
policies programmed into the MMU. We introduce MMU page alignment
to the following areas:
- The boundaries of the image "ROM" area
- The beginning of RAM representing kernel data/bss/noinit
- The beginning of RAM representing app data/bss/noinit
Some old alignment directives that are no longer necessary have
been removed.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Implements CONFIG_APPLICATION_MEMORY for x86. Working in
XIP and non-XIP configurations.
This patch does *not* implement any alignment constraints
imposed by the x86 MMU; enabling those will be done later.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This patch adds the necessary files and modifies existing files to
add device support for the x86-based Intel Quark microcontroller.
Signed-off-by: Savinay Dharmappa <savinay.dharmappa@intel.com>
This is unmaintained and currently has no known users. It was
added to support a Wind River project. If in the future we need it
again, we should re-introduce it with an exception-based mechanism
for catching out-of-bounds memory queries from the debugger.
The mem_safe subsystem is also removed; it is only used by the
GDB server. If its functionality is needed in the future, it
should be replaced with an exception-based mechanism.
The _image_{ram, rom, text}_{start, end} linker variables have
been left in place; they will be re-purposed and expanded to
support memory protection.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
The macro is used to create a structure specifying the boot-time
page table configuration, which is needed by the gen_mmu.py script to
generate the actual page tables.
Linker script changes are needed for the following:
1. To place the MMU page tables at a 4 KByte boundary.
2. To keep the configuration structure created by
the macro (mentioned above).
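A sketch of the concept (the section name and field names are
assumptions): each macro invocation emits a descriptor into a dedicated
section that gen_mmu.py reads when producing the boot page tables.

struct mmu_region_sketch {
        u32_t address;  /* region start, 4 KByte aligned      */
        u32_t size;     /* region length in bytes             */
        u32_t flags;    /* access attributes, e.g. read/write */
};

#define MMU_BOOT_REGION_SKETCH(name, addr, region_size, perm_flags) \
        static const struct mmu_region_sketch name                  \
        __attribute__((section(".mmulist"), used)) = {              \
                .address = (addr),                                  \
                .size = (region_size),                              \
                .flags = (perm_flags),                              \
        }

/* example use, with made-up values */
MMU_BOOT_REGION_SKETCH(kernel_ram, 0x00100000, 0x00080000, 0x2);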
Signed-off-by: Adithya Baglody <adithya.nagaraj.baglody@intel.com>
Inserting the IDT results in any data after it being shifted.
We want the memory addresses in zephyr_prebuilt.elf
and zephyr.elf to be as close as possible. Insert some dummy
data in the linker script the same size as the gen_idt data
structures. Needed for forthcoming patches which generate MMU
page tables at build time.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
For various reasons it's often necessary to generate certain
complex data structures at build time by separate tools outside
of the C compiler. Data is fed to these tools by way of
special binary sections not intended to be included in the final
binary. We currently do this to generate interrupt tables; forthcoming
work will also use this to generate MMU page tables.
The way we have been doing this is to generate a "kernel_prebuilt.elf",
extract the metadata sections with objcopy, run the tool, and then
re-link the kernel with the extra data *and* use objcopy to pull
out the unwanted sections.
This doesn't scale well if multiple post-build steps are needed.
Now this is much simpler: in any Makefile, the filenames of generated
object files may be appended to a special GENERATED_KERNEL_OBJECT_FILES
variable, and these files will be generated by Make in the usual
fashion.
Instead of using objcopy to pull sections out, we now create a
linker-pass2.cmd which additionally defines LINKER_PASS2. The source
linker script can #ifdef on this to use the special /DISCARD/ section
target so that metadata sections are not included in the final binary.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Historically, space for struct k_thread was always carved out of the
thread's stack region. However, we want more control over where this data
will reside; in memory protection scenarios the stack may only be used
for actual stack data and nothing else.
On some platforms (particularly ARM), including kernel_arch_data.h from
the toplevel kernel.h exposes intractable circular dependency issues.
We create a new per-arch header "kernel_arch_thread.h" with very limited
scope; it only defines the three data structures necessary to instantiate
the arch-specific bits of a struct k_thread.
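A sketch of the header's limited scope (struct names follow the x86
convention; their contents are per-architecture and elided here):

#ifndef _KERNEL_ARCH_THREAD_H_
#define _KERNEL_ARCH_THREAD_H_

struct _caller_saved {
        /* registers saved automatically on interrupt/exception entry */
};

struct _callee_saved {
        /* registers preserved across a context switch */
};

struct _thread_arch {
        /* any additional arch-private per-thread state */
};

#endif /* _KERNEL_ARCH_THREAD_H_ */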
Change-Id: I3a55b4ed4270512e58cf671f327bb033ad7f4a4f
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
We reserve a specific vector in the IDT to trigger when we want to
enter a fatal exception state from software.
Disabled for drivers/build_all tests as we were up to the ROM limit
on Quark D2000.
Issue: ZEP-843
Change-Id: I4de7f025fba0691d07bcc3b3f0925973834496a0
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Unlike assertions, these APIs are active at all times. The kernel will
treat these errors in the same way as fatal CPU exceptions. Ultimately,
the policy of what to do with these errors is implemented in
_SysFatalErrorHandler.
If the architecture supports it, a real CPU exception can be triggered,
which will provide a complete register dump and PC value when the
problem occurs. This provides more helpful information than a fake
exception stack frame (_default_esf) passed to the arch-specific
exception handling code.
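Usage sketch, assuming these are the k_oops()/k_panic() style APIs
(the data structure and error condition are made up):

struct pkt_sketch { void *data; size_t len; };  /* made-up type */

void consume_packet(struct pkt_sketch *pkt)
{
        if (pkt == NULL) {
                /* Unrecoverable caller bug: treated like a fatal CPU
                 * exception and ultimately routed to
                 * _SysFatalErrorHandler().
                 */
                k_oops();
        }

        /* ... normal processing ... */
}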
Issue: ZEP-843
Change-Id: I8f136905c05bb84772e1c5ed53b8e920d24eb6fd
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Convert code to use u{8,16,32,64}_t and s{8,16,32,64}_t instead of C99
integer types. This handles the remaining includes and the kernel, plus
touches up various points that we skipped because of include
dependencies. We also convert the PRI printf formatters in the arch
code over to normal formatters.
Jira: ZEP-2051
Change-Id: Iecbb12601a3ee4ea936fd7ddea37788a645b08b0
Signed-off-by: Kumar Gala <kumar.gala@linaro.org>