Breaking High-Resolution CNN Bandwidth Barriers With Enhanced Depth-First Execution
Journal Contribution - Journal Article
Convolutional neural networks (CNNs) are now reaching impressive performance on non-classification image processing tasks, such as denoising, demosaicing, super-resolution, and super slow motion. Consequently, CNNs are increasingly deployed on very high-resolution images. However, the resulting high-resolution feature maps place unprecedented demands on the memory system of neural network processors: on-chip memories are too small to store high-resolution feature maps, while off-chip memories are very costly in terms of I/O bandwidth and power. This paper first shows that classical layer-by-layer inference approaches are bounded in their tradeoff between external I/O bandwidth and on-chip memory, making it infeasible to scale up to very high resolutions at a reasonable cost. Next, we demonstrate how an alternative depth-first network computation can reduce I/O bandwidth requirements by more than 200× in the best case for a fixed on-chip memory size or, alternatively, reduce on-chip memory requirements by more than 10000× in the best case for a fixed I/O bandwidth limit. We further introduce an enhanced depth-first method, which exploits both line buffers and tiling, to improve the tradeoff between external I/O bandwidth and on-chip memory capacity, and we quantify its improvements beyond the current state of the art.
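The memory argument in the abstract can be illustrated with a rough back-of-the-envelope sketch. The model below is an assumption made for illustration, not the paper's cost model: it assumes a plain chain of stride-1 convolutions, counts feature-map elements only (no weights), and assumes a line-buffered depth-first schedule keeps just the `kernel - 1` most recent input rows per layer. The function names, layer widths, and image size are hypothetical.

```python
def layer_by_layer_elems(height, width, channels):
    """Peak on-chip elements if each layer's full input and output maps stay on chip.

    channels lists the channel count of every feature map in the chain,
    starting with the network input (assumed layout, for illustration only).
    """
    return max(
        height * width * (c_in + c_out)
        for c_in, c_out in zip(channels, channels[1:])
    )


def line_buffered_elems(width, channels, kernel=3):
    """On-chip elements if every layer buffers only (kernel - 1) rows of its input.

    This is a simplified line-buffer model; a real design may need a few
    extra pixels or a full extra row per layer.
    """
    return sum((kernel - 1) * width * c_in for c_in in channels[:-1])


if __name__ == "__main__":
    # Hypothetical 3-layer network on a 3840 x 2160 image.
    chans = [3, 64, 64, 3]
    full = layer_by_layer_elems(2160, 3840, chans)
    lines = line_buffered_elems(3840, chans)
    print(f"layer-by-layer: {full} elements")
    print(f"line-buffered:  {lines} elements")
    print(f"ratio:          {full / lines:.1f}x")
```

In this toy model the layer-by-layer footprint grows with image height times width, while the line-buffered footprint grows only with image width. That difference is the intuition behind the scaling gap the abstract describes; the specific ratio printed here is a property of the toy model and should not be read as a result from the paper.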
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems
Pages: 323 - 331
Number of pages: 9
Keywords: Electrical & electronic engineering