Improves context-switch performance.

TLB invalidation and the nG bit are used conservatively. This could be improved in future work.

Tested with tests/benchmarks/sched_userspace:

BEFORE:
```
Swapping  2 threads: 161562583 cyc & 1000000 rounds -> 1615 ns per ctx
Swapping  8 threads: 161569289 cyc & 1000000 rounds -> 1615 ns per ctx
Swapping 16 threads: 161649163 cyc & 1000000 rounds -> 1616 ns per ctx
Swapping 32 threads: 163487880 cyc & 1000000 rounds -> 1634 ns per ctx
```

AFTER:
```
Swapping  2 threads: 18129207 cyc & 1000000 rounds -> 181 ns per ctx
Swapping  8 threads: 49702891 cyc & 1000000 rounds -> 497 ns per ctx
Swapping 16 threads: 55898650 cyc & 1000000 rounds -> 558 ns per ctx
Swapping 32 threads: 58059704 cyc & 1000000 rounds -> 580 ns per ctx
```

Signed-off-by: Henri Xavier <datacomos@huawei.com>

| | | |
|---|---|---|
| core | ||
| include | ||
| CMakeLists.txt | ||
| Kconfig | ||