
Publication

Structured precision skipping: Accelerating convolutional neural networks with budget-aware dynamic precision selection

Journal contribution - Journal article

Despite the remarkable advances Convolutional Neural Networks have achieved on various intelligence tasks, their massive computation and storage consumption limits applications on resource-constrained devices. Existing works explore reducing the computation cost by leveraging input-dependent redundancy at runtime. The irregular distribution of dynamic sparsity, however, limits the real speedup of dynamic models deployed on traditional neural network accelerators. To solve this problem, we propose an algorithm-architecture co-design, named structured precision skipping (SPS), that exploits the dynamic precision redundancy in statically quantized models. SPS computes most neurons at a lower precision and only a small portion of important neurons at a higher precision to preserve performance. Specifically, we first propose the structured dynamic block to exploit dynamic sparsity in a structured manner. Building on this block, we then apply a budget-aware training method that introduces a budget regularization to learn precision skipping under a target resource constraint. Finally, we present an architecture design based on a bit-serial architecture with support for SPS models, which introduces only a prediction controller module with small overhead. Extensive evaluation results demonstrate that SPS achieves up to 1.5× speedup and 1.4× energy saving on various models and datasets with marginal accuracy loss.
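As a rough illustration of the idea (a minimal sketch, not the paper's implementation), the code below groups activations into fixed-size blocks, keeps the most salient fraction of blocks at a higher precision, and quantizes the rest at a lower precision, alongside one simple form a budget penalty could take. The function names, the mean-magnitude saliency, the 4-/8-bit choice, and the squared-error penalty are all illustrative assumptions.

```python
import numpy as np

def fake_quantize(x, bits):
    # Symmetric uniform quantization (per call); a simple stand-in for the
    # static quantizers assumed here, not the paper's quantization scheme.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    if scale == 0:
        return x.copy()
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def structured_precision_skip(activations, block_size=16, budget=0.25,
                              low_bits=4, high_bits=8):
    # Group neurons into contiguous blocks; the `budget` fraction of blocks
    # with the largest mean magnitude keeps the high precision, while the
    # remaining blocks are "skipped" down to the low precision.
    n = activations.size - activations.size % block_size
    blocks = activations[:n].reshape(-1, block_size)
    importance = np.abs(blocks).mean(axis=1)    # per-block saliency (assumed)
    k = max(1, int(budget * len(blocks)))       # blocks kept at high precision
    keep = np.argsort(importance)[-k:]

    out = fake_quantize(blocks, low_bits)       # default path: low precision
    out[keep] = fake_quantize(blocks[keep], high_bits)
    return out.reshape(-1)

def budget_regularizer(gate_probs, target_budget, weight=1.0):
    # Penalize the expected fraction of high-precision blocks for deviating
    # from the target budget; one plausible form of such a budget loss.
    return weight * (gate_probs.mean() - target_budget) ** 2
```

For example, with block_size=16 and budget=0.25, a layer of 1,024 activations is split into 64 blocks, of which 16 are computed at 8 bits and the rest at 4 bits.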
Journal: JOURNAL OF SYSTEMS ARCHITECTURE
ISSN: 1383-7621
Volume: 124
Pages: 102403
Year of publication: 2022
Keywords: Convolutional neural networks, Algorithm-architecture co-design, Model compression and acceleration, Dynamic quantization
Accessibility: Closed