SEGGER Ozone J-Trace Code Profile identified the iterations over the daint value as a hot path. The iterations show up at the very top of the code profile because a full iteration happens whenever there is any activity on an endpoint. Optimize the daint handling loops so that only set bits are iterated over. While this optimization depends on find_lsb_set() efficiency, it seems to be worth it solely on the basis that quite often only a few bits are set. After a bit deeper analysis, I was surprised that on ARM Cortex-M33 the find_lsb_set() approach is faster than naive iteration even if all bits are set (which is an extreme case, because USB applications are unlikely to use all 16 IN and 16 OUT endpoints simultaneously). This is due to the fact that there is only one conditional jump, CBNZ; find_lsb_set() - 1 translates to RBIT + CLZ, and clearing the bit then uses LSL.W + BIC.W. The naive iteration, by contrast, uses ADDS + CMP + BNE for the loop handling and also has LSR.W + LSLS + BPL (+ an ADD.W instruction on each iteration to add 16 for OUT endpoints) for the continue check. Therefore the optimized code on ARM Cortex-M33 is never worse than the naive iteration. Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no> |
||
|---|---|---|
| nrf_usbd_common | ||
| CMakeLists.txt | ||
| Kconfig | ||
| usb_dwc2_hw.h | ||
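
The two loop shapes compared in the commit message can be sketched roughly as follows. This is a minimal illustration, not the driver's actual code: `lsb_set()` is a stand-in for Zephyr's `find_lsb_set()` built on the GCC/Clang `__builtin_ffs()` builtin, and `handle_endpoint()`, `visit_count` and `visited_mask` are hypothetical names introduced only for the example. For simplicity the sketch walks all 32 bits in one loop instead of handling the IN and OUT halves separately.

```c
#include <stdint.h>

/* Stand-in for Zephyr's find_lsb_set() (assumption for this sketch):
 * returns the 1-based index of the least significant set bit, or 0 if
 * no bit is set. Relies on the GCC/Clang __builtin_ffs() builtin.
 */
static inline unsigned int lsb_set(uint32_t value)
{
	return (unsigned int)__builtin_ffs((int)value);
}

/* Hypothetical bookkeeping so the effect of each loop can be observed. */
static unsigned int visit_count;
static uint32_t visited_mask;

/* Hypothetical per-endpoint handler; only records which bit was visited. */
static void handle_endpoint(unsigned int bit)
{
	visit_count++;
	visited_mask |= UINT32_C(1) << bit;
}

/* Naive shape: always walks every bit position and skips the clear ones. */
static void iterate_naive(uint32_t daint)
{
	for (unsigned int n = 0; n < 32; n++) {
		if (!(daint & (UINT32_C(1) << n))) {
			continue;
		}
		handle_endpoint(n);
	}
}

/* Optimized shape: one loop condition, visits only the set bits. */
static void iterate_set_bits(uint32_t daint)
{
	while (daint) {
		unsigned int n = lsb_set(daint) - 1;

		/* Clear the bit being handled so the loop terminates. */
		daint &= ~(UINT32_C(1) << n);
		handle_endpoint(n);
	}
}
```

Both functions visit the same bits in the same ascending order; the second one simply performs as many iterations as there are set bits, which is what makes it cheap when only one or two endpoints are active.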