
Project

Towards Heterogeneous Multi-core Systems-on-Chip for Edge Machine Learning

This research focuses on the design of energy-efficient and flexible hardware architectures and hardware-software co-optimization strategies that enable early design-space exploration of hardware architectures for (extreme-)edge computing.

The research first looks into the design of a single, highly specialized hardware accelerator optimized for object detection on drones. Because the application and the model to be accelerated are fixed, the hardware is optimized for mapping only the convolutional and dense layers of a DL model in an object detection pipeline. Emerging DL applications deployed on (extreme-)edge devices, however, require multi-modal support, which demands, on the one hand, much more flexible hardware accelerators and, on the other hand, complete standalone systems with always-on and duty-cycled operation. Heterogeneity in hardware acceleration can enhance a system's flexibility and energy efficiency by combining several energy-efficient hardware accelerators that support multiple DL workloads on a single platform. With this motivation, the research presents a versatile all-digital heterogeneous multi-core system-on-chip with a highly flexible ML accelerator, a RISC-V core, non-volatile memory, and a power management unit. Next, a highly energy-efficient heterogeneous multi-core system-on-chip is presented that combines a digital and an analog in-memory-computing core controlled by a single RISC-V core.

Increasing the core count further can improve a system's performance, but data communication in multi-core platforms quickly becomes a bottleneck if the design is not optimized. Classical networks-on-chip (NoCs) have been used extensively in multi-core CPUs to address this bottleneck; however, they rely on serial packet-based protocols that suffer from significant protocol translation overheads toward the endpoints. In the final part of this research, an open-source, fully AXI-compliant NoC fabric is proposed to better address the specific needs of multi-core DL computing platforms, which require significant burst-based communication. The NoC enables scaling DNN platforms to multi-accelerator systems, thus paving the way toward high-performance heterogeneous multi-core systems.

Date: 4 Sep 2018 → 2 May 2023
Keywords: Embedded Deep Learning Processors, Deep learning accelerators, latency-critical, Reconfigurable
Disciplines: Sensors, biosensors and smart sensors, Other electrical and electronic engineering, Nanotechnology, Design theories and methods
Project type: PhD project